[I debug.cpp:49] [c10d] The debug level is set to INFO.
[I socket.cpp:452] [c10d - debug] The server socket will attempt to listen on an IPv6 address.
[I socket.cpp:502] [c10d - debug] The server socket is attempting to listen on [::]:19532.
[I socket.cpp:576] [c10d] The server socket has started to listen on [::]:19532.
[I TCPStore.cpp:258] [c10d - debug] The server has started on port = 19532.
[I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532).
[I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:42228.
[I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532
[I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:42228.
[I debug.cpp:49] [c10d] The debug level is set to INFO.
[I debug.cpp:49] [c10d] The debug level is set to INFO.
[I debug.cpp:49] [c10d] The debug level is set to INFO.
[I debug.cpp:49] [c10d] The debug level is set to INFO.
[I debug.cpp:49] [c10d] The debug level is set to INFO.
[I debug.cpp:49] [c10d] The debug level is set to INFO.
[I debug.cpp:49] [c10d] The debug level is set to INFO.
[I debug.cpp:49] [c10d] The debug level is set to INFO.
[2024-12-05 12:30:20,921] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-12-05 12:30:20,994] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-12-05 12:30:21,352] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-12-05 12:30:21,355] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-12-05 12:30:21,409] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-12-05 12:30:21,420] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-12-05 12:30:21,427] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-12-05 12:30:21,490] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-12-05 12:30:21,637] [INFO] [comm.py:637:init_distributed] cdb=None [I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532). [I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:47306. [I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:47306. [I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532 [I ProcessGroupNCCL.cpp:1139] [Rank 6] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=200731664 [2024-12-05 12:30:21,643] [INFO] [comm.py:637:init_distributed] cdb=None [I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532). 
[I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:47308. [I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:47308. [I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532 [I ProcessGroupNCCL.cpp:1139] [Rank 4] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=200429024 [2024-12-05 12:30:22,532] [INFO] [comm.py:637:init_distributed] cdb=None [I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532). [I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:47320. [I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:47320. [I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532 [I ProcessGroupNCCL.cpp:1139] [Rank 2] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=215752224 [2024-12-05 12:30:22,874] [INFO] [comm.py:637:init_distributed] cdb=None [2024-12-05 12:30:22,874] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532). [2024-12-05 12:30:22,876] [INFO] [comm.py:637:init_distributed] cdb=None [I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532). [I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:47322. 
[I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:47326. [I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:47322. [I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532 [I ProcessGroupNCCL.cpp:1139] [Rank 0] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=199192736 [I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:47326. [I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532 [I ProcessGroupNCCL.cpp:1139] [Rank 1] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=226215808 [2024-12-05 12:30:22,901] [INFO] [comm.py:637:init_distributed] cdb=None [I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532). [2024-12-05 12:30:22,903] [INFO] [comm.py:637:init_distributed] cdb=None [I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532). [I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:47340. [I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:47348. [I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:47340. 
[I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532 [I ProcessGroupNCCL.cpp:1139] [Rank 5] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=202704640 [I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:47348. [I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532 [I ProcessGroupNCCL.cpp:1139] [Rank 3] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=225556928 [2024-12-05 12:30:22,909] [INFO] [comm.py:637:init_distributed] cdb=None [I socket.cpp:689] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 19532). [I socket.cpp:300] [c10d - debug] The server socket on [::]:19532 has accepted a connection from [localhost]:47362. [I socket.cpp:849] [c10d] The client socket has connected to [localhost]:19532 on [localhost]:47362. [I TCPStore.cpp:267] [c10d - debug] TCP client connected to host 127.0.0.1:19532 [I ProcessGroupNCCL.cpp:1139] [Rank 7] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=227440400 You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 2024-12-05 12:30:29.136 n124-105-156:494060:494060 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.136 n124-105-156:494060:494060 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.145 n124-105-156:494060:494060 [0] NCCL INFO Bootstrap : Using eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.145 n124-105-156:494060:494060 [0] NCCL INFO NET/Plugin: No plugin found (gcp-fastrak) 2024-12-05 12:30:29.145 n124-105-156:494060:494060 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net-gcp-fastrak.so) 2024-12-05 12:30:29.145 n124-105-156:494060:494060 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory : when loading gcp-fastrak 2024-12-05 12:30:29.145 n124-105-156:494060:494060 [0] NCCL INFO NET/Plugin: Using internal network plugin. 2024-12-05 12:30:29.145 n124-105-156:494060:494060 [0] NCCL INFO cudaDriverVersion 12020 2024-12-05 12:30:29.145 n124-105-156:494060:494060 [0] NCCL INFO NCCL LOG Enabled NCCL version 2.21.5-1+cuda12.1 You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 2024-12-05 12:30:29.172 n124-105-156:494064:494064 [4] NCCL INFO cudaDriverVersion 12020 You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. 
Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 2024-12-05 12:30:29.172 n124-105-156:494064:494064 [4] NCCL INFO NCCL LOG Enabled 2024-12-05 12:30:29.172 n124-105-156:494064:494064 [4] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.172 n124-105-156:494064:494064 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.188 n124-105-156:494066:494066 [6] NCCL INFO cudaDriverVersion 12020 2024-12-05 12:30:29.188 n124-105-156:494066:494066 [6] NCCL INFO NCCL LOG Enabled 2024-12-05 12:30:29.188 n124-105-156:494066:494066 [6] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.188 n124-105-156:494066:494066 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.192 n124-105-156:494064:494064 [4] NCCL INFO Bootstrap : Using eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.193 n124-105-156:494064:494064 [4] NCCL INFO NET/Plugin: No plugin found (gcp-fastrak) 2024-12-05 12:30:29.193 n124-105-156:494064:494064 [4] NCCL INFO NET/Plugin: No plugin found (libnccl-net-gcp-fastrak.so) 2024-12-05 12:30:29.193 n124-105-156:494064:494064 [4] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory : when loading gcp-fastrak 2024-12-05 12:30:29.193 n124-105-156:494064:494064 [4] NCCL INFO NET/Plugin: Using internal network plugin. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
2024-12-05 12:30:29.208 n124-105-156:494066:494066 [6] NCCL INFO Bootstrap : Using eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.209 n124-105-156:494066:494066 [6] NCCL INFO NET/Plugin: No plugin found (gcp-fastrak) 2024-12-05 12:30:29.209 n124-105-156:494066:494066 [6] NCCL INFO NET/Plugin: No plugin found (libnccl-net-gcp-fastrak.so) 2024-12-05 12:30:29.209 n124-105-156:494066:494066 [6] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory : when loading gcp-fastrak 2024-12-05 12:30:29.209 n124-105-156:494066:494066 [6] NCCL INFO NET/Plugin: Using internal network plugin. 2024-12-05 12:30:29.251 n124-105-156:494065:494065 [5] NCCL INFO cudaDriverVersion 12020 2024-12-05 12:30:29.251 n124-105-156:494065:494065 [5] NCCL INFO NCCL LOG Enabled 2024-12-05 12:30:29.251 n124-105-156:494065:494065 [5] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.251 n124-105-156:494065:494065 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.276 n124-105-156:494065:494065 [5] NCCL INFO Bootstrap : Using eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.276 n124-105-156:494065:494065 [5] NCCL INFO NET/Plugin: No plugin found (gcp-fastrak) 2024-12-05 12:30:29.276 n124-105-156:494065:494065 [5] NCCL INFO NET/Plugin: No plugin found (libnccl-net-gcp-fastrak.so) 2024-12-05 12:30:29.276 n124-105-156:494065:494065 [5] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory : when loading gcp-fastrak 2024-12-05 12:30:29.276 n124-105-156:494065:494065 [5] NCCL INFO NET/Plugin: Using internal network plugin. 
2024-12-05 12:30:29.279 n124-105-156:494061:494061 [1] NCCL INFO cudaDriverVersion 12020 2024-12-05 12:30:29.280 n124-105-156:494061:494061 [1] NCCL INFO NCCL LOG Enabled 2024-12-05 12:30:29.280 n124-105-156:494061:494061 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.280 n124-105-156:494061:494061 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.299 n124-105-156:494061:494061 [1] NCCL INFO Bootstrap : Using eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.299 n124-105-156:494061:494061 [1] NCCL INFO NET/Plugin: No plugin found (gcp-fastrak) 2024-12-05 12:30:29.299 n124-105-156:494061:494061 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net-gcp-fastrak.so) 2024-12-05 12:30:29.299 n124-105-156:494061:494061 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory : when loading gcp-fastrak 2024-12-05 12:30:29.299 n124-105-156:494061:494061 [1] NCCL INFO NET/Plugin: Using internal network plugin. 
2024-12-05 12:30:29.374 n124-105-156:494063:494063 [3] NCCL INFO cudaDriverVersion 12020 2024-12-05 12:30:29.374 n124-105-156:494063:494063 [3] NCCL INFO NCCL LOG Enabled 2024-12-05 12:30:29.374 n124-105-156:494063:494063 [3] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.374 n124-105-156:494063:494063 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.384 n124-105-156:494067:494067 [7] NCCL INFO cudaDriverVersion 12020 2024-12-05 12:30:29.384 n124-105-156:494067:494067 [7] NCCL INFO NCCL LOG Enabled 2024-12-05 12:30:29.385 n124-105-156:494067:494067 [7] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.385 n124-105-156:494067:494067 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.395 n124-105-156:494063:494063 [3] NCCL INFO Bootstrap : Using eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.395 n124-105-156:494063:494063 [3] NCCL INFO NET/Plugin: No plugin found (gcp-fastrak) 2024-12-05 12:30:29.395 n124-105-156:494063:494063 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net-gcp-fastrak.so) 2024-12-05 12:30:29.395 n124-105-156:494063:494063 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory : when loading gcp-fastrak 2024-12-05 12:30:29.395 n124-105-156:494063:494063 [3] NCCL INFO NET/Plugin: Using internal network plugin. 
2024-12-05 12:30:29.408 n124-105-156:494067:494067 [7] NCCL INFO Bootstrap : Using eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.409 n124-105-156:494067:494067 [7] NCCL INFO NET/Plugin: No plugin found (gcp-fastrak) 2024-12-05 12:30:29.409 n124-105-156:494067:494067 [7] NCCL INFO NET/Plugin: No plugin found (libnccl-net-gcp-fastrak.so) 2024-12-05 12:30:29.409 n124-105-156:494067:494067 [7] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory : when loading gcp-fastrak 2024-12-05 12:30:29.409 n124-105-156:494067:494067 [7] NCCL INFO NET/Plugin: Using internal network plugin. 2024-12-05 12:30:29.409 n124-105-156:494062:494062 [2] NCCL INFO cudaDriverVersion 12020 2024-12-05 12:30:29.409 n124-105-156:494062:494062 [2] NCCL INFO NCCL LOG Enabled 2024-12-05 12:30:29.409 n124-105-156:494062:494062 [2] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.409 n124-105-156:494062:494062 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.437 n124-105-156:494062:494062 [2] NCCL INFO Bootstrap : Using eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.437 n124-105-156:494062:494062 [2] NCCL INFO NET/Plugin: No plugin found (gcp-fastrak) 2024-12-05 12:30:29.437 n124-105-156:494062:494062 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net-gcp-fastrak.so) 2024-12-05 12:30:29.437 n124-105-156:494062:494062 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory : when loading gcp-fastrak 2024-12-05 12:30:29.437 n124-105-156:494062:494062 [2] NCCL INFO NET/Plugin: Using internal network plugin. 
2024-12-05 12:30:29.518 n124-105-156:494060:494507 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.518 n124-105-156:494060:494507 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.526 n124-105-156:494064:494508 [4] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.526 n124-105-156:494064:494508 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.526 n124-105-156:494066:494509 [6] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.526 n124-105-156:494066:494509 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.541 n124-105-156:494060:494507 [0] NCCL INFO NET/IB : No device found. 2024-12-05 12:30:29.541 n124-105-156:494060:494507 [0] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.541 n124-105-156:494060:494507 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.559 n124-105-156:494064:494508 [4] NCCL INFO NET/IB : No device found. 2024-12-05 12:30:29.559 n124-105-156:494064:494508 [4] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.559 n124-105-156:494064:494508 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.559 n124-105-156:494066:494509 [6] NCCL INFO NET/IB : No device found. 
2024-12-05 12:30:29.559 n124-105-156:494066:494509 [6] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.559 n124-105-156:494066:494509 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.573 n124-105-156:494060:494507 [0] NCCL INFO NET/Socket : Using [0]eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.573 n124-105-156:494060:494507 [0] NCCL INFO Using non-device net plugin version 0 2024-12-05 12:30:29.573 n124-105-156:494060:494507 [0] NCCL INFO Using network Socket 2024-12-05 12:30:29.589 n124-105-156:494064:494508 [4] NCCL INFO NET/Socket : Using [0]eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.589 n124-105-156:494064:494508 [4] NCCL INFO Using non-device net plugin version 0 2024-12-05 12:30:29.589 n124-105-156:494064:494508 [4] NCCL INFO Using network Socket 2024-12-05 12:30:29.589 n124-105-156:494066:494509 [6] NCCL INFO NET/Socket : Using [0]eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.589 n124-105-156:494066:494509 [6] NCCL INFO Using non-device net plugin version 0 2024-12-05 12:30:29.589 n124-105-156:494066:494509 [6] NCCL INFO Using network Socket 2024-12-05 12:30:29.770 n124-105-156:494065:494510 [5] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.770 n124-105-156:494065:494510 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.779 n124-105-156:494065:494510 [5] NCCL INFO NET/IB : No device found. 
2024-12-05 12:30:29.779 n124-105-156:494065:494510 [5] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:29.779 n124-105-156:494065:494510 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:29.786 n124-105-156:494065:494510 [5] NCCL INFO NET/Socket : Using [0]eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:29.786 n124-105-156:494065:494510 [5] NCCL INFO Using non-device net plugin version 0 2024-12-05 12:30:29.786 n124-105-156:494065:494510 [5] NCCL INFO Using network Socket 2024-12-05 12:30:30.271 n124-105-156:494061:494511 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:30.271 n124-105-156:494061:494511 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:30.271 n124-105-156:494067:494513 [7] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:30.271 n124-105-156:494067:494513 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:30.287 n124-105-156:494061:494511 [1] NCCL INFO NET/IB : No device found. 2024-12-05 12:30:30.287 n124-105-156:494061:494511 [1] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:30.287 n124-105-156:494061:494511 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:30.287 n124-105-156:494067:494513 [7] NCCL INFO NET/IB : No device found. 
2024-12-05 12:30:30.287 n124-105-156:494067:494513 [7] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:30.287 n124-105-156:494067:494513 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:30.291 n124-105-156:494063:494512 [3] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:30.291 n124-105-156:494063:494512 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:30.297 n124-105-156:494062:494514 [2] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:30.297 n124-105-156:494062:494514 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:30.303 n124-105-156:494061:494511 [1] NCCL INFO NET/Socket : Using [0]eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:30.303 n124-105-156:494067:494513 [7] NCCL INFO NET/Socket : Using [0]eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:30.303 n124-105-156:494061:494511 [1] NCCL INFO Using non-device net plugin version 0 2024-12-05 12:30:30.303 n124-105-156:494061:494511 [1] NCCL INFO Using network Socket 2024-12-05 12:30:30.303 n124-105-156:494067:494513 [7] NCCL INFO Using non-device net plugin version 0 2024-12-05 12:30:30.303 n124-105-156:494067:494513 [7] NCCL INFO Using network Socket 2024-12-05 12:30:30.314 n124-105-156:494063:494512 [3] NCCL INFO NET/IB : No device found. 2024-12-05 12:30:30.315 n124-105-156:494063:494512 [3] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:30.315 n124-105-156:494063:494512 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:30.319 n124-105-156:494062:494514 [2] NCCL INFO NET/IB : No device found. 
2024-12-05 12:30:30.319 n124-105-156:494062:494514 [2] NCCL INFO NCCL_SOCKET_FAMILY set by environment to AF_INET6 2024-12-05 12:30:30.319 n124-105-156:494062:494514 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-12-05 12:30:30.336 n124-105-156:494063:494512 [3] NCCL INFO NET/Socket : Using [0]eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:30.336 n124-105-156:494063:494512 [3] NCCL INFO Using non-device net plugin version 0 2024-12-05 12:30:30.336 n124-105-156:494063:494512 [3] NCCL INFO Using network Socket 2024-12-05 12:30:30.341 n124-105-156:494062:494514 [2] NCCL INFO NET/Socket : Using [0]eth0:fdbd:dccd:cdc2:12c8:0:1d8::<0> 2024-12-05 12:30:30.341 n124-105-156:494062:494514 [2] NCCL INFO Using non-device net plugin version 0 2024-12-05 12:30:30.341 n124-105-156:494062:494514 [2] NCCL INFO Using network Socket 2024-12-05 12:30:31.000 n124-105-156:494065:494510 [5] NCCL INFO ncclCommInitRank comm 0xe11f9c0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 85000 commId 0x6d6ed6148153dfe0 - Init START 2024-12-05 12:30:31.000 n124-105-156:494060:494507 [0] NCCL INFO ncclCommInitRank comm 0xdebf210 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 4000 commId 0x6d6ed6148153dfe0 - Init START 2024-12-05 12:30:31.000 n124-105-156:494067:494513 [7] NCCL INFO ncclCommInitRank comm 0xf8d09c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 8c000 commId 0x6d6ed6148153dfe0 - Init START 2024-12-05 12:30:31.000 n124-105-156:494063:494512 [3] NCCL INFO ncclCommInitRank comm 0xf68bd60 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId c000 commId 0x6d6ed6148153dfe0 - Init START 2024-12-05 12:30:31.000 n124-105-156:494061:494511 [1] NCCL INFO ncclCommInitRank comm 0xf829c20 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 5000 commId 0x6d6ed6148153dfe0 - Init START 2024-12-05 12:30:31.000 n124-105-156:494062:494514 [2] NCCL INFO ncclCommInitRank comm 0xee8a820 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId b000 commId 0x6d6ed6148153dfe0 - Init START 2024-12-05 12:30:31.000 
n124-105-156:494064:494508 [4] NCCL INFO ncclCommInitRank comm 0xdf02040 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 84000 commId 0x6d6ed6148153dfe0 - Init START 2024-12-05 12:30:31.000 n124-105-156:494066:494509 [6] NCCL INFO ncclCommInitRank comm 0xe035b70 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 8b000 commId 0x6d6ed6148153dfe0 - Init START 2024-12-05 12:30:31.565 n124-105-156:494067:494513 [7] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:31.565 n124-105-156:494063:494512 [3] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:31.565 n124-105-156:494064:494508 [4] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:31.566 n124-105-156:494060:494507 [0] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:31.566 n124-105-156:494065:494510 [5] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:31.566 n124-105-156:494066:494509 [6] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:31.566 n124-105-156:494062:494514 [2] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:31.566 n124-105-156:494061:494511 [1] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.114 n124-105-156:494067:494513 [7] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.115 n124-105-156:494063:494512 [3] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.115 n124-105-156:494064:494508 [4] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.118 n124-105-156:494060:494507 [0] NCCL INFO Could not find real path of 
/sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.118 n124-105-156:494065:494510 [5] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.119 n124-105-156:494066:494509 [6] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.121 n124-105-156:494062:494514 [2] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.122 n124-105-156:494061:494511 [1] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.651 n124-105-156:494066:494509 [6] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.713 n124-105-156:494065:494510 [5] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.731 n124-105-156:494060:494507 [0] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.736 n124-105-156:494067:494513 [7] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.737 n124-105-156:494063:494512 [3] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.738 n124-105-156:494064:494508 [4] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.740 n124-105-156:494062:494514 [2] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:32.748 n124-105-156:494061:494511 [1] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.214 n124-105-156:494060:494507 [0] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.250 n124-105-156:494066:494509 [6] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 
2024-12-05 12:30:33.351 n124-105-156:494065:494510 [5] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.356 n124-105-156:494061:494511 [1] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.365 n124-105-156:494064:494508 [4] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.371 n124-105-156:494067:494513 [7] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.377 n124-105-156:494062:494514 [2] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.395 n124-105-156:494063:494512 [3] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.794 n124-105-156:494060:494507 [0] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.885 n124-105-156:494065:494510 [5] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.908 n124-105-156:494066:494509 [6] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:33.988 n124-105-156:494064:494508 [4] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.022 n124-105-156:494061:494511 [1] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.025 n124-105-156:494067:494513 [7] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.033 n124-105-156:494063:494512 [3] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.036 n124-105-156:494062:494514 [2] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.401 n124-105-156:494060:494507 
[0] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.449 n124-105-156:494065:494510 [5] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.460 n124-105-156:494066:494509 [6] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.574 n124-105-156:494061:494511 [1] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.614 n124-105-156:494064:494508 [4] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.648 n124-105-156:494067:494513 [7] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.650 n124-105-156:494062:494514 [2] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.694 n124-105-156:494063:494512 [3] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:34.974 n124-105-156:494066:494509 [6] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.011 n124-105-156:494060:494507 [0] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.017 n124-105-156:494065:494510 [5] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.076 n124-105-156:494061:494511 [1] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.229 n124-105-156:494064:494508 [4] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.262 n124-105-156:494067:494513 [7] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.263 n124-105-156:494062:494514 [2] NCCL INFO Could not find real path of 
/sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.307 n124-105-156:494063:494512 [3] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.663 n124-105-156:494066:494509 [6] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.663 n124-105-156:494066:494509 [6] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_speed, ignoring 2024-12-05 12:30:35.663 n124-105-156:494066:494509 [6] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_speed, ignoring 2024-12-05 12:30:35.663 n124-105-156:494066:494509 [6] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_width, ignoring 2024-12-05 12:30:35.663 n124-105-156:494066:494509 [6] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_width, ignoring 2024-12-05 12:30:35.663 n124-105-156:494066:494509 [6] NCCL INFO KV Convert to int : could not find value of '' in dictionary, falling back to 60 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO === System : maxBw 370.8 totalBw 370.8 === 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO CPU/0-0 (1/1/2) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - PCI/0-2000 (10b5879610b58796) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - GPU/0-4000 (0) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - GPU/0-5000 (1) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - PCI/0-9000 (10b5879610b58796) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - GPU/0-b000 (2) 2024-12-05 
12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - GPU/0-c000 (3) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[12.0] - NIC/0-c0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + SYS[10.0] - CPU/1 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO CPU/0-1 (1/1/2) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - PCI/0-82000 (10b5879610b58796) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - GPU/0-84000 (4) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - GPU/0-85000 (5) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - PCI/0-89000 (10b5879610b58796) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - GPU/0-8b000 (6) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + PCI[24.0] - GPU/0-8c000 (7) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO + SYS[10.0] - CPU/0 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO ========================================== 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO GPU/4000 :GPU/0-4000 (0/5000.0/LOC) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 
2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO GPU/5000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (0/5000.0/LOC) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO GPU/B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (0/5000.0/LOC) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO GPU/C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (0/5000.0/LOC) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO GPU/84000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (0/5000.0/LOC) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO GPU/85000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (0/5000.0/LOC) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO GPU/8B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (0/5000.0/LOC) GPU/0-8c000 
(2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.664 n124-105-156:494066:494509 [6] NCCL INFO GPU/8C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (0/5000.0/LOC) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.665 n124-105-156:494066:494509 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ffffffff,f0000000,000000ff,ffffffff,fff00000,00000000 2024-12-05 12:30:35.665 n124-105-156:494066:494509 [6] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 
GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.674 n124-105-156:494066:494509 [6] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.700 n124-105-156:494060:494507 [0] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.700 n124-105-156:494060:494507 
[0] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_speed, ignoring 2024-12-05 12:30:35.700 n124-105-156:494060:494507 [0] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_speed, ignoring 2024-12-05 12:30:35.700 n124-105-156:494060:494507 [0] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_width, ignoring 2024-12-05 12:30:35.700 n124-105-156:494060:494507 [0] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_width, ignoring 2024-12-05 12:30:35.700 n124-105-156:494060:494507 [0] NCCL INFO KV Convert to int : could not find value of '' in dictionary, falling back to 60 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO === System : maxBw 370.8 totalBw 370.8 === 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO CPU/0-0 (1/1/2) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - PCI/0-2000 (10b5879610b58796) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - GPU/0-4000 (0) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - GPU/0-5000 (1) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - PCI/0-9000 (10b5879610b58796) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - GPU/0-b000 (2) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - GPU/0-c000 (3) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[12.0] - NIC/0-c0 2024-12-05 12:30:35.702 
n124-105-156:494060:494507 [0] NCCL INFO + SYS[10.0] - CPU/1 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO CPU/0-1 (1/1/2) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - PCI/0-82000 (10b5879610b58796) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - GPU/0-84000 (4) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - GPU/0-85000 (5) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - PCI/0-89000 (10b5879610b58796) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - GPU/0-8b000 (6) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + PCI[24.0] - GPU/0-8c000 (7) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO + SYS[10.0] - CPU/0 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO ========================================== 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO GPU/4000 :GPU/0-4000 (0/5000.0/LOC) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO GPU/5000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (0/5000.0/LOC) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.702 
n124-105-156:494060:494507 [0] NCCL INFO GPU/B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (0/5000.0/LOC) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO GPU/C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (0/5000.0/LOC) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO GPU/84000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (0/5000.0/LOC) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO GPU/85000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (0/5000.0/LOC) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO GPU/8B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (0/5000.0/LOC) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO GPU/8C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (0/5000.0/LOC) NVS/0-0 
(1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO Setting affinity for GPU 0 to 0fffffff,ffffff00,00000000,000fffff,ffffffff 2024-12-05 12:30:35.702 n124-105-156:494060:494507 [0] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. 2024-12-05 12:30:35.706 n124-105-156:494065:494510 [5] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.706 n124-105-156:494065:494510 [5] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_speed, ignoring 2024-12-05 12:30:35.706 n124-105-156:494065:494510 [5] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_speed, ignoring 2024-12-05 12:30:35.706 n124-105-156:494065:494510 [5] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_width, ignoring 2024-12-05 12:30:35.706 n124-105-156:494065:494510 [5] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_width, ignoring 2024-12-05 12:30:35.706 n124-105-156:494065:494510 [5] NCCL INFO KV Convert to int : could not find value of '' in dictionary, falling back to 60 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO === System : maxBw 370.8 totalBw 370.8 === 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO CPU/0-0 (1/1/2) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - PCI/0-2000 (10b5879610b58796) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - GPU/0-4000 (0) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - GPU/0-5000 (1) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - PCI/0-9000 
(10b5879610b58796) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - GPU/0-b000 (2) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - GPU/0-c000 (3) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[12.0] - NIC/0-c0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + SYS[10.0] - CPU/1 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO CPU/0-1 (1/1/2) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - PCI/0-82000 (10b5879610b58796) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - GPU/0-84000 (4) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - GPU/0-85000 (5) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - PCI/0-89000 (10b5879610b58796) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - GPU/0-8b000 (6) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + PCI[24.0] - GPU/0-8c000 (7) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO + SYS[10.0] - CPU/0 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO ========================================== 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO GPU/4000 :GPU/0-4000 (0/5000.0/LOC) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) 
GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO GPU/5000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (0/5000.0/LOC) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO GPU/B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (0/5000.0/LOC) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO GPU/C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (0/5000.0/LOC) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO GPU/84000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (0/5000.0/LOC) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO GPU/85000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (0/5000.0/LOC) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO GPU/8B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 
(2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (0/5000.0/LOC) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO GPU/8C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (0/5000.0/LOC) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ffffffff,f0000000,000000ff,ffffffff,fff00000,00000000 2024-12-05 12:30:35.707 n124-105-156:494065:494510 [5] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 
2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.709 n124-105-156:494060:494507 [0] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO Pattern 4, crossNic 
0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 
12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.714 n124-105-156:494065:494510 [5] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.739 n124-105-156:494061:494511 [1] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.739 n124-105-156:494061:494511 [1] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_speed, ignoring 2024-12-05 12:30:35.739 n124-105-156:494061:494511 [1] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_speed, ignoring 2024-12-05 12:30:35.739 n124-105-156:494061:494511 [1] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_width, ignoring 2024-12-05 12:30:35.739 n124-105-156:494061:494511 [1] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_width, ignoring 2024-12-05 12:30:35.739 n124-105-156:494061:494511 [1] NCCL INFO KV Convert to int : could not find value of '' in dictionary, falling back to 60 2024-12-05 
12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO === System : maxBw 370.8 totalBw 370.8 === 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO CPU/0-0 (1/1/2) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - PCI/0-2000 (10b5879610b58796) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - GPU/0-4000 (0) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - GPU/0-5000 (1) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - PCI/0-9000 (10b5879610b58796) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - GPU/0-b000 (2) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - GPU/0-c000 (3) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[12.0] - NIC/0-c0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + SYS[10.0] - CPU/1 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO CPU/0-1 (1/1/2) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - PCI/0-82000 (10b5879610b58796) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - GPU/0-84000 (4) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - GPU/0-85000 (5) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - PCI/0-89000 (10b5879610b58796) 2024-12-05 12:30:35.740 
n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - GPU/0-8b000 (6) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + PCI[24.0] - GPU/0-8c000 (7) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO + SYS[10.0] - CPU/0 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO ========================================== 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO GPU/4000 :GPU/0-4000 (0/5000.0/LOC) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO GPU/5000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (0/5000.0/LOC) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO GPU/B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (0/5000.0/LOC) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO GPU/C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (0/5000.0/LOC) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.740 n124-105-156:494061:494511 [1] NCCL INFO GPU/84000 :GPU/0-4000 
(2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (0/5000.0/LOC) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.741 n124-105-156:494061:494511 [1] NCCL INFO GPU/85000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (0/5000.0/LOC) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.741 n124-105-156:494061:494511 [1] NCCL INFO GPU/8B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (0/5000.0/LOC) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.741 n124-105-156:494061:494511 [1] NCCL INFO GPU/8C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (0/5000.0/LOC) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.741 n124-105-156:494061:494511 [1] NCCL INFO Setting affinity for GPU 1 to 0fffffff,ffffff00,00000000,000fffff,ffffffff 2024-12-05 12:30:35.741 n124-105-156:494061:494511 [1] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. 
2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 
n124-105-156:494061:494511 [1] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.748 n124-105-156:494061:494511 [1] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.810 n124-105-156:494064:494508 [4] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.810 n124-105-156:494064:494508 [4] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_speed, ignoring 2024-12-05 12:30:35.810 n124-105-156:494064:494508 [4] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_speed, ignoring 2024-12-05 12:30:35.810 n124-105-156:494064:494508 [4] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_width, ignoring 2024-12-05 12:30:35.810 n124-105-156:494064:494508 [4] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_width, ignoring 2024-12-05 12:30:35.810 n124-105-156:494064:494508 [4] 
NCCL INFO KV Convert to int : could not find value of '' in dictionary, falling back to 60 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO === System : maxBw 370.8 totalBw 370.8 === 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO CPU/0-0 (1/1/2) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - PCI/0-2000 (10b5879610b58796) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - GPU/0-4000 (0) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - GPU/0-5000 (1) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - PCI/0-9000 (10b5879610b58796) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - GPU/0-b000 (2) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - GPU/0-c000 (3) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[12.0] - NIC/0-c0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + SYS[10.0] - CPU/1 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO CPU/0-1 (1/1/2) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - PCI/0-82000 (10b5879610b58796) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - GPU/0-84000 (4) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - GPU/0-85000 (5) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] 
NCCL INFO + PCI[24.0] - PCI/0-89000 (10b5879610b58796) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - GPU/0-8b000 (6) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + PCI[24.0] - GPU/0-8c000 (7) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO + SYS[10.0] - CPU/0 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO ========================================== 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO GPU/4000 :GPU/0-4000 (0/5000.0/LOC) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO GPU/5000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (0/5000.0/LOC) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO GPU/B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (0/5000.0/LOC) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO GPU/C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (0/5000.0/LOC) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 
12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO GPU/84000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (0/5000.0/LOC) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO GPU/85000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (0/5000.0/LOC) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO GPU/8B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (0/5000.0/LOC) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO GPU/8C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (0/5000.0/LOC) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ffffffff,f0000000,000000ff,ffffffff,fff00000,00000000 2024-12-05 12:30:35.811 n124-105-156:494064:494508 [4] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. 
2024-12-05 12:30:35.823 n124-105-156:494067:494513 [7] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494067:494513 [7] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_speed, ignoring 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494067:494513 [7] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_speed, ignoring 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494067:494513 [7] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_width, ignoring 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494067:494513 [7] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_width, ignoring 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 
GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.823 n124-105-156:494067:494513 [7] NCCL INFO KV Convert to int : could not find value of '' in dictionary, falling back to 60 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494062:494514 [2] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 
n124-105-156:494064:494508 [4] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494064:494508 [4] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.823 n124-105-156:494062:494514 [2] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_speed, ignoring 2024-12-05 12:30:35.823 n124-105-156:494062:494514 [2] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_speed, ignoring 2024-12-05 12:30:35.823 n124-105-156:494062:494514 [2] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_width, ignoring 2024-12-05 12:30:35.823 n124-105-156:494062:494514 [2] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_width, ignoring 2024-12-05 12:30:35.823 n124-105-156:494062:494514 [2] NCCL INFO KV Convert to int : could not find value of '' in dictionary, falling back to 60 2024-12-05 12:30:35.824 n124-105-156:494067:494513 [7] NCCL INFO === System : maxBw 370.8 totalBw 370.8 === 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO CPU/0-0 (1/1/2) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - PCI/0-2000 (10b5879610b58796) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - GPU/0-4000 (0) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - GPU/0-5000 (1) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - PCI/0-9000 (10b5879610b58796) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - GPU/0-b000 (2) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 
n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - GPU/0-c000 (3) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[12.0] - NIC/0-c0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + SYS[10.0] - CPU/1 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO CPU/0-1 (1/1/2) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - PCI/0-82000 (10b5879610b58796) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - GPU/0-84000 (4) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - GPU/0-85000 (5) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO === System : maxBw 370.8 totalBw 370.8 === 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - PCI/0-89000 (10b5879610b58796) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - GPU/0-8b000 (6) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO CPU/0-0 (1/1/2) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + PCI[24.0] - GPU/0-8c000 (7) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - PCI/0-2000 (10b5879610b58796) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - GPU/0-4000 (0) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO + SYS[10.0] - CPU/0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO ========================================== 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL 
INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - GPU/0-5000 (1) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - PCI/0-9000 (10b5879610b58796) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - GPU/0-b000 (2) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO GPU/4000 :GPU/0-4000 (0/5000.0/LOC) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - GPU/0-c000 (3) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[12.0] - NIC/0-c0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO GPU/5000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (0/5000.0/LOC) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + SYS[10.0] - CPU/1 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO CPU/0-1 (1/1/2) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - PCI/0-82000 (10b5879610b58796) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO GPU/B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (0/5000.0/LOC) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 
(2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - GPU/0-84000 (4) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - GPU/0-85000 (5) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO GPU/C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (0/5000.0/LOC) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - PCI/0-89000 (10b5879610b58796) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - GPU/0-8b000 (6) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO GPU/84000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (0/5000.0/LOC) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + PCI[24.0] - GPU/0-8c000 (7) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO + SYS[10.0] - CPU/0 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO ========================================== 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO GPU/85000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) 
GPU/0-85000 (0/5000.0/LOC) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO GPU/4000 :GPU/0-4000 (0/5000.0/LOC) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO GPU/8B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (0/5000.0/LOC) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO GPU/5000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (0/5000.0/LOC) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO GPU/8C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (0/5000.0/LOC) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO GPU/B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (0/5000.0/LOC) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO GPU/C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) 
GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (0/5000.0/LOC) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ffffffff,f0000000,000000ff,ffffffff,fff00000,00000000 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO GPU/84000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (0/5000.0/LOC) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO GPU/85000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (0/5000.0/LOC) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO GPU/8B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (0/5000.0/LOC) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO GPU/8C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (0/5000.0/LOC) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO Setting affinity for GPU 2 to 0fffffff,ffffff00,00000000,000fffff,ffffffff 2024-12-05 12:30:35.825 n124-105-156:494062:494514 [2] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. 
2024-12-05 12:30:35.825 n124-105-156:494067:494513 [7] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. 2024-12-05 12:30:35.829 n124-105-156:494063:494512 [3] NCCL INFO Could not find real path of /sys/class/pci_bus/fffffff/../../fffffff:ff:f 2024-12-05 12:30:35.829 n124-105-156:494063:494512 [3] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_speed, ignoring 2024-12-05 12:30:35.829 n124-105-156:494063:494512 [3] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_speed, ignoring 2024-12-05 12:30:35.829 n124-105-156:494063:494512 [3] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/max_link_width, ignoring 2024-12-05 12:30:35.829 n124-105-156:494063:494512 [3] NCCL INFO Topology detection : could not read /sys/devices/pci0000:00/0000:00:0c.0/../max_link_width, ignoring 2024-12-05 12:30:35.829 n124-105-156:494063:494512 [3] NCCL INFO KV Convert to int : could not find value of '' in dictionary, falling back to 60 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO === System : maxBw 370.8 totalBw 370.8 === 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO CPU/0-0 (1/1/2) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - PCI/0-2000 (10b5879610b58796) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - GPU/0-4000 (0) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - GPU/0-5000 (1) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - PCI/0-9000 (10b5879610b58796) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - GPU/0-b000 (2) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + NVL[370.8] - NVS/0 
2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - GPU/0-c000 (3) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[12.0] - NIC/0-c0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + SYS[10.0] - CPU/1 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO CPU/0-1 (1/1/2) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - PCI/0-82000 (10b5879610b58796) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - GPU/0-84000 (4) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - GPU/0-85000 (5) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - PCI/0-89000 (10b5879610b58796) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - GPU/0-8b000 (6) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + PCI[24.0] - GPU/0-8c000 (7) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + NVL[370.8] - NVS/0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO + SYS[10.0] - CPU/0 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO ========================================== 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO GPU/4000 :GPU/0-4000 (0/5000.0/LOC) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO GPU/5000 
:GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (0/5000.0/LOC) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO GPU/B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (0/5000.0/LOC) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO GPU/C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (0/5000.0/LOC) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (2/24.0/PHB) CPU/0-1 (3/10.0/SYS) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO GPU/84000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (0/5000.0/LOC) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO GPU/85000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (0/5000.0/LOC) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO GPU/8B000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (0/5000.0/LOC) GPU/0-8c000 (2/370.8/NVL) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 
2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO GPU/8C000 :GPU/0-4000 (2/370.8/NVL) GPU/0-5000 (2/370.8/NVL) GPU/0-b000 (2/370.8/NVL) GPU/0-c000 (2/370.8/NVL) GPU/0-84000 (2/370.8/NVL) GPU/0-85000 (2/370.8/NVL) GPU/0-8b000 (2/370.8/NVL) GPU/0-8c000 (0/5000.0/LOC) NVS/0-0 (1/370.8/NVL) CPU/0-0 (3/10.0/SYS) CPU/0-1 (2/24.0/PHB) 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO Setting affinity for GPU 3 to 0fffffff,ffffff00,00000000,000fffff,ffffffff 2024-12-05 12:30:35.830 n124-105-156:494063:494512 [3] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0. 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 10 : GPU/0 GPU/1 
GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494062:494514 [2] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 
12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 
GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.832 n124-105-156:494067:494513 [7] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Pattern 4, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 8 : 
GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Pattern 1, crossNic 0, nChannels 12, bw 30.000000/30.000000, type NVL/PIX, sameChannels 1 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 0 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 1 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 2 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 3 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 4 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 5 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 6 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 7 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 8 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 9 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 10 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO 11 : GPU/0 GPU/1 GPU/2 GPU/3 GPU/4 GPU/5 GPU/6 GPU/7 2024-12-05 12:30:35.838 
n124-105-156:494063:494512 [3] NCCL INFO comm 0xf68bd60 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO comm 0xee8a820 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO comm 0xf829c20 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO comm 0xdf02040 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO comm 0xf8d09c0 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 0 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO comm 0xe11f9c0 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 00 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 00 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO comm 0xe035b70 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 12 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 01 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 01 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 1 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 02 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 02 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 
n124-105-156:494067:494513 [7] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 03 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 13 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 03 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO comm 0xdebf210 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 04 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 2 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 04 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 05 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 
2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 14 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 06 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 05 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 00 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 3 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 00 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 00 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 01 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 07 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 06 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 0 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 15 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 01 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 01 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 12 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 02 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 02 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 00 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 07 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 08 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 4 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 
n124-105-156:494064:494508 [4] NCCL INFO Ring 02 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 03 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 03 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 01 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 03 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 09 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 16 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 08 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 13 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 04 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 09 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 04 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 05 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 10 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 5 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 2 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 06 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 11 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 04 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 02 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 10 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 12 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 05 : 4 -> 5 -> 
6 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 17 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 03 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 14 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 07 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 6 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 05 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 08 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 11 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 18 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 13 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 06 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 06 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 7 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 04 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 3 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 09 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 12 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 14 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 07 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 19 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 07 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 15 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 
n124-105-156:494061:494511 [1] NCCL INFO Tree 8 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 15 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 05 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 10 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 08 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 13 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 08 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 16 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 14 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 09 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 20 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 17 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 4 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 06 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 9 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 11 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 09 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 15 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 10 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 16 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 18 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 07 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 21 
: 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 12 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 10 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 16 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 13 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 11 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 5 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 14 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 19 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 08 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 10 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 11 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 17 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 12 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 17 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 15 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 12 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 20 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 22 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 09 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 18 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 13 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 6 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 
n124-105-156:494067:494513 [7] NCCL INFO Ring 16 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 13 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 21 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 11 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 10 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 19 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 22 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Tree 23 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 14 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 18 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 17 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 15 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 14 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 11 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 20 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Ring 23 : 1 -> 2 -> 3 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 7 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 12 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 18 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 16 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 15 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 
2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 21 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 19 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 19 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 13 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 17 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 16 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 20 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 22 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 8 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO NCCL_BUFFSIZE set by environment to 4194304. 
2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 18 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 21 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 17 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 00 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 20 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 19 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO NCCL_P2P_NVL_CHUNKSIZE set by environment to 1048576. 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 14 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Ring 23 : 2 -> 3 -> 4 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Ring 22 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 18 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 9 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 01 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 20 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494062:494514 [2] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 15 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 2024-12-05 12:30:35.838 
n124-105-156:494067:494513 [7] NCCL INFO Ring 23 : 6 -> 7 -> 0 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 19 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 21 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 02 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 21 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO NCCL_BUFFSIZE set by environment to 4194304. 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 16 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 22 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 20 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 10 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 03 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO NCCL_P2P_NVL_CHUNKSIZE set by environment to 1048576. 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Ring 23 : 4 -> 5 -> 6 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 17 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO NCCL_BUFFSIZE set by environment to 4194304. 
2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 22 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 21 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 04 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494063:494512 [3] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 18 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 11 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO NCCL_P2P_NVL_CHUNKSIZE set by environment to 1048576. 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 22 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 05 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO NCCL_BUFFSIZE set by environment to 4194304. 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Tree 23 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:30:35.838 n124-105-156:494067:494513 [7] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 19 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Ring 23 : 3 -> 4 -> 5 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO NCCL_P2P_NVL_CHUNKSIZE set by environment to 1048576. 
2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 06 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494065:494510 [5] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 20 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 07 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO NCCL_BUFFSIZE set by environment to 4194304. 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 21 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 08 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 09 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 22 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO NCCL_P2P_NVL_CHUNKSIZE set by environment to 1048576. 
2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 10 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494064:494508 [4] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Ring 23 : 5 -> 6 -> 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 11 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 12 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 13 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO NCCL_BUFFSIZE set by environment to 4194304. 
2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 14 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 15 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO NCCL_P2P_NVL_CHUNKSIZE set by environment to 1048576. 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 16 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 17 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494066:494509 [6] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 18 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 19 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 20 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 21 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 22 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO Ring 23 : 0 -> 1 -> 2 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 
n124-105-156:494061:494511 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO NCCL_BUFFSIZE set by environment to 4194304. 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO NCCL_P2P_NVL_CHUNKSIZE set by environment to 1048576. 
2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494061:494511 [1] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.838 n124-105-156:494060:494507 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 00 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 01 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 02 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 03 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 04 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 05 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 06 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 07 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 08 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 09 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 10 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 11 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 12 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 13 : 7 -> 0 -> 1 
2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 14 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 15 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 16 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 17 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 18 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 19 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 20 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 21 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 22 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Ring 23 : 7 -> 0 -> 1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO NCCL_BUFFSIZE set by environment to 4194304. 2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO NCCL_P2P_NVL_CHUNKSIZE set by environment to 1048576. 
2024-12-05 12:30:35.839 n124-105-156:494060:494507 [0] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:30:35.906 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.909 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.909 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.910 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.910 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.910 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.911 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.911 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.911 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.911 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.912 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.912 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.912 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.913 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.913 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.913 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.913 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM 
2024-12-05 12:30:35.913 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.914 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.914 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.914 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.914 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.915 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.915 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.915 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.915 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.915 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.916 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.916 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.916 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.916 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.916 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.917 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.917 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.917 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 11/0 : 5[5] -> 
6[6] via P2P/CUMEM 2024-12-05 12:30:35.917 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.918 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.918 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.918 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.918 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.918 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.919 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.919 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.919 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.919 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.919 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.920 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.920 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.920 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.920 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.920 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.921 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.921 n124-105-156:494065:494510 [5] NCCL INFO Ring 
Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.921 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.921 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.921 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.922 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.922 n124-105-156:494061:494511 [1] NCCL INFO Ring Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:35.922 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.922 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.923 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.923 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.923 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.923 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.923 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.923 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.923 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.924 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.924 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.924 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.924 n124-105-156:494067:494513 
[7] NCCL INFO Ring Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.924 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.924 n124-105-156:494065:494510 [5] NCCL INFO Ring Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:35.924 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.924 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.925 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.925 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.925 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.925 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.925 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.925 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.926 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.926 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.926 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.926 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.926 n124-105-156:494067:494513 [7] NCCL INFO Ring Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:35.926 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.927 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.927 
n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.927 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.927 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.927 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.927 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.928 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM 
2024-12-05 12:30:35.929 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.929 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.930 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 17/0 : 4[4] -> 
5[5] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.931 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494060:494507 [0] NCCL INFO Ring 
Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.932 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494064:494508 [4] NCCL INFO Ring Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494066:494509 [6] NCCL INFO Ring Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494060:494507 
[0] NCCL INFO Ring Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.933 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.934 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.936 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.936 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.936 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.936 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.936 n124-105-156:494063:494512 [3] NCCL INFO Ring Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:35.936 
n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.939 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.941 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.941 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.942 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.945 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.948 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.949 n124-105-156:494060:494507 [0] NCCL INFO Ring Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:35.951 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.952 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.953 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.956 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.963 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.971 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:35.980 n124-105-156:494062:494514 [2] NCCL INFO Ring Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.084 n124-105-156:494061:494511 [1] NCCL INFO Connected all rings 2024-12-05 12:30:37.084 n124-105-156:494060:494507 [0] NCCL INFO Connected all rings 2024-12-05 12:30:37.090 n124-105-156:494064:494508 [4] NCCL INFO Connected all rings 2024-12-05 12:30:37.090 n124-105-156:494063:494512 [3] NCCL INFO Connected all rings 
2024-12-05 12:30:37.090 n124-105-156:494062:494514 [2] NCCL INFO Connected all rings 2024-12-05 12:30:37.104 n124-105-156:494067:494513 [7] NCCL INFO Connected all rings 2024-12-05 12:30:37.104 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.104 n124-105-156:494065:494510 [5] NCCL INFO Connected all rings 2024-12-05 12:30:37.104 n124-105-156:494066:494509 [6] NCCL INFO Connected all rings 2024-12-05 12:30:37.105 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.105 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.105 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.105 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.106 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.106 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.106 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.107 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.107 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.108 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.108 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.108 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.108 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.108 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 01/0 : 3[3] 
-> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.109 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.109 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.109 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.109 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.109 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.110 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.110 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.110 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.110 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.110 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.110 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.111 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.111 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.111 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.111 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.112 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.112 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.112 n124-105-156:494067:494513 [7] NCCL INFO Tree 
Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.112 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.112 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.112 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.112 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.113 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.113 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.113 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.113 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.113 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.113 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.113 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.113 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.114 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.114 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.114 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.114 n124-105-156:494067:494513 [7] NCCL INFO Tree Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:30:37.114 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.114 n124-105-156:494063:494512 
[3] NCCL INFO Tree Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.114 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.115 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.116 
n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.116 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM 
2024-12-05 12:30:37.117 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.117 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.118 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 08/0 : 6[6] -> 
5[5] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494063:494512 [3] NCCL INFO Tree Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:30:37.119 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494066:494509 [6] NCCL INFO Tree 
Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494064:494508 [4] NCCL INFO Tree Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:30:37.120 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494061:494511 
[1] NCCL INFO Tree Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.121 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494061:494511 [1] NCCL INFO Tree Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.122 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.123 
n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.123 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.123 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.123 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.123 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.123 n124-105-156:494065:494510 [5] NCCL INFO Tree Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:30:37.123 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.123 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.123 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:37.125 n124-105-156:494066:494509 [6] NCCL INFO Tree Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:30:37.125 n124-105-156:494062:494514 [2] NCCL INFO Tree Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:30:38.048 n124-105-156:494060:494507 [0] NCCL INFO Connected all trees 2024-12-05 12:30:38.049 n124-105-156:494060:494507 [0] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:30:38.049 n124-105-156:494060:494507 [0] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:30:38.049 n124-105-156:494060:494507 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:30:38.049 n124-105-156:494060:494507 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:30:38.130 n124-105-156:494062:494514 [2] NCCL INFO Connected all trees 2024-12-05 12:30:38.131 n124-105-156:494062:494514 [2] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:30:38.131 
n124-105-156:494062:494514 [2] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:30:38.131 n124-105-156:494062:494514 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:30:38.131 n124-105-156:494062:494514 [2] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:30:38.132 n124-105-156:494061:494511 [1] NCCL INFO Connected all trees 2024-12-05 12:30:38.132 n124-105-156:494061:494511 [1] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:30:38.132 n124-105-156:494061:494511 [1] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:30:38.132 n124-105-156:494061:494511 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:30:38.132 n124-105-156:494061:494511 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:30:38.154 n124-105-156:494063:494512 [3] NCCL INFO Connected all trees 2024-12-05 12:30:38.154 n124-105-156:494063:494512 [3] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:30:38.154 n124-105-156:494063:494512 [3] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:30:38.154 n124-105-156:494063:494512 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:30:38.154 n124-105-156:494063:494512 [3] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:30:38.172 n124-105-156:494067:494513 [7] NCCL INFO Connected all trees 2024-12-05 12:30:38.172 n124-105-156:494067:494513 [7] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:30:38.172 n124-105-156:494067:494513 [7] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:30:38.172 n124-105-156:494067:494513 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:30:38.172 n124-105-156:494067:494513 [7] NCCL INFO 24 coll channels, 24 
collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:30:38.191 n124-105-156:494064:494508 [4] NCCL INFO Connected all trees 2024-12-05 12:30:38.191 n124-105-156:494064:494508 [4] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:30:38.191 n124-105-156:494064:494508 [4] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:30:38.191 n124-105-156:494064:494508 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:30:38.191 n124-105-156:494064:494508 [4] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:30:38.191 n124-105-156:494065:494510 [5] NCCL INFO Connected all trees 2024-12-05 12:30:38.191 n124-105-156:494065:494510 [5] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:30:38.191 n124-105-156:494066:494509 [6] NCCL INFO Connected all trees 2024-12-05 12:30:38.191 n124-105-156:494065:494510 [5] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:30:38.191 n124-105-156:494065:494510 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:30:38.191 n124-105-156:494065:494510 [5] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:30:38.192 n124-105-156:494066:494509 [6] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:30:38.192 n124-105-156:494066:494509 [6] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:30:38.192 n124-105-156:494066:494509 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:30:38.192 n124-105-156:494066:494509 [6] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:30:38.216 n124-105-156:494066:494509 [6] NCCL INFO TUNER/Plugin: NCCL_TUNER_PLUGIN set to libnccl-tuner.so 2024-12-05 12:30:38.216 n124-105-156:494061:494511 [1] NCCL INFO TUNER/Plugin: 
NCCL_TUNER_PLUGIN set to libnccl-tuner.so 2024-12-05 12:30:38.216 n124-105-156:494065:494510 [5] NCCL INFO TUNER/Plugin: NCCL_TUNER_PLUGIN set to libnccl-tuner.so 2024-12-05 12:30:38.216 n124-105-156:494067:494513 [7] NCCL INFO TUNER/Plugin: NCCL_TUNER_PLUGIN set to libnccl-tuner.so 2024-12-05 12:30:38.216 n124-105-156:494062:494514 [2] NCCL INFO TUNER/Plugin: NCCL_TUNER_PLUGIN set to libnccl-tuner.so 2024-12-05 12:30:38.216 n124-105-156:494063:494512 [3] NCCL INFO TUNER/Plugin: NCCL_TUNER_PLUGIN set to libnccl-tuner.so 2024-12-05 12:30:38.216 n124-105-156:494060:494507 [0] NCCL INFO TUNER/Plugin: NCCL_TUNER_PLUGIN set to libnccl-tuner.so 2024-12-05 12:30:38.216 n124-105-156:494064:494508 [4] NCCL INFO TUNER/Plugin: NCCL_TUNER_PLUGIN set to libnccl-tuner.so 2024-12-05 12:30:38.216 n124-105-156:494066:494509 [6] NCCL INFO TUNER/Plugin: Most recent plugin load returned 11 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory. All attempts to load 'libnccl-tuner.so libnccl-tuner-libnccl-tuner.so.so gcp-fastrak' also failed. 2024-12-05 12:30:38.216 n124-105-156:494066:494509 [6] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2024-12-05 12:30:38.216 n124-105-156:494066:494509 [6] NCCL INFO ncclCommInitRank comm 0xe035b70 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 8b000 commId 0x6d6ed6148153dfe0 - Init COMPLETE 2024-12-05 12:30:38.216 n124-105-156:494065:494510 [5] NCCL INFO TUNER/Plugin: Most recent plugin load returned 11 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory. All attempts to load 'libnccl-tuner.so libnccl-tuner-libnccl-tuner.so.so gcp-fastrak' also failed. 2024-12-05 12:30:38.216 n124-105-156:494065:494510 [5] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 
2024-12-05 12:30:38.216 n124-105-156:494065:494510 [5] NCCL INFO ncclCommInitRank comm 0xe11f9c0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 85000 commId 0x6d6ed6148153dfe0 - Init COMPLETE 2024-12-05 12:30:38.216 n124-105-156:494064:494508 [4] NCCL INFO TUNER/Plugin: Most recent plugin load returned 11 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory. All attempts to load 'libnccl-tuner.so libnccl-tuner-libnccl-tuner.so.so gcp-fastrak' also failed. 2024-12-05 12:30:38.216 n124-105-156:494067:494513 [7] NCCL INFO TUNER/Plugin: Most recent plugin load returned 11 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory. All attempts to load 'libnccl-tuner.so libnccl-tuner-libnccl-tuner.so.so gcp-fastrak' also failed. 2024-12-05 12:30:38.217 n124-105-156:494064:494508 [4] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2024-12-05 12:30:38.217 n124-105-156:494067:494513 [7] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2024-12-05 12:30:38.217 n124-105-156:494064:494508 [4] NCCL INFO ncclCommInitRank comm 0xdf02040 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 84000 commId 0x6d6ed6148153dfe0 - Init COMPLETE 2024-12-05 12:30:38.217 n124-105-156:494067:494513 [7] NCCL INFO ncclCommInitRank comm 0xf8d09c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 8c000 commId 0x6d6ed6148153dfe0 - Init COMPLETE 2024-12-05 12:30:38.217 n124-105-156:494062:494514 [2] NCCL INFO TUNER/Plugin: Most recent plugin load returned 11 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory. All attempts to load 'libnccl-tuner.so libnccl-tuner-libnccl-tuner.so.so gcp-fastrak' also failed. 2024-12-05 12:30:38.217 n124-105-156:494063:494512 [3] NCCL INFO TUNER/Plugin: Most recent plugin load returned 11 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory. All attempts to load 'libnccl-tuner.so libnccl-tuner-libnccl-tuner.so.so gcp-fastrak' also failed. 
2024-12-05 12:30:38.217 n124-105-156:494063:494512 [3] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2024-12-05 12:30:38.217 n124-105-156:494063:494512 [3] NCCL INFO ncclCommInitRank comm 0xf68bd60 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId c000 commId 0x6d6ed6148153dfe0 - Init COMPLETE 2024-12-05 12:30:38.217 n124-105-156:494062:494514 [2] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2024-12-05 12:30:38.217 n124-105-156:494060:494507 [0] NCCL INFO TUNER/Plugin: Most recent plugin load returned 11 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory. All attempts to load 'libnccl-tuner.so libnccl-tuner-libnccl-tuner.so.so gcp-fastrak' also failed. 2024-12-05 12:30:38.217 n124-105-156:494062:494514 [2] NCCL INFO ncclCommInitRank comm 0xee8a820 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId b000 commId 0x6d6ed6148153dfe0 - Init COMPLETE 2024-12-05 12:30:38.217 n124-105-156:494061:494511 [1] NCCL INFO TUNER/Plugin: Most recent plugin load returned 11 : libnccl-net-gcp-fastrak.so: cannot open shared object file: No such file or directory. All attempts to load 'libnccl-tuner.so libnccl-tuner-libnccl-tuner.so.so gcp-fastrak' also failed. 2024-12-05 12:30:38.217 n124-105-156:494060:494507 [0] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2024-12-05 12:30:38.217 n124-105-156:494061:494511 [1] NCCL INFO TUNER/Plugin: Using internal tuner plugin. 2024-12-05 12:30:38.217 n124-105-156:494060:494507 [0] NCCL INFO ncclCommInitRank comm 0xdebf210 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 4000 commId 0x6d6ed6148153dfe0 - Init COMPLETE 2024-12-05 12:30:38.217 n124-105-156:494061:494511 [1] NCCL INFO ncclCommInitRank comm 0xf829c20 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 5000 commId 0x6d6ed6148153dfe0 - Init COMPLETE [I ProcessGroupNCCL.cpp:1805] NCCL_DEBUG: INFO /usr/local/lib/python3.9/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. 
Downloads always resume when possible. If you want to force a new download, use `force_download=True`. warnings.warn( [2024-12-05 12:30:40,847] [INFO] [partition_parameters.py:347:__exit__] finished initializing model - num_params = 687, num_elems = 8.35B Loading checkpoint shards: 0%| | 0/4 [00:00<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.<|eot_id|><|start_header_id|>user<|end_header_id|> <|end_of_text|> What precautions should be taken when handling the items on the counter?<|eot_id|><|start_header_id|>assistant<|end_header_id|> When handling the wine bottles and other items on the counter, several precautions should be taken to ensure safety and to prevent any unintentional damages. Firstly, it is essential to secure a firm and non-slip grip when picking up the wine bottles, as they can be slippery and shatter upon impact if accidentally dropped. One should also make sure the bottles are placed on a stable and flat surface to prevent them from rolling or falling. Secondly, considering there's a knife on the counter, always handle it carefully by gripping the handle rather than the blade to avoid injuries. Additionally, when placing the knife, it's best to position it with the sharp edge pointing away from the edge of the counter to minimize accidental cuts or injuries. When dealing with wine glasses (if any), handle them with care, as they may be delicate and can easily break if not managed properly. Lastly, pay attention to items such as corkscrews, as they have sharp components that can cause injuries if not handled appropriately.<|eot_id|> ====================seperator==================== <|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful language and vision assistant. 
You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.<|eot_id|><|start_header_id|>user<|end_header_id|> <|end_of_text|> Provide a one-sentence caption for the provided image. Reference OCR token: Consistent, low, price, Consistent, low, price, PerfectMeatsforBraisin, Braising, good, through, 2011!, good, through, 2011!, Pork, Shoulder, Pork, Tenderloin, low, 1.29, 2.99, prices!, t<|eot_id|><|start_header_id|>assistant<|end_header_id|> In this section of the grocery store one can purchase Pork Tenderloin at $2.99.<|eot_id|> ====================seperator==================== <|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.<|eot_id|><|start_header_id|>user<|end_header_id|> <|end_of_text|> What breed of cat is on the table? 
Answer the question using a single word or phrase.<|eot_id|><|start_header_id|>assistant<|end_header_id|> None<|eot_id|><|start_header_id|>user<|end_header_id|> How many people are in the picture?<|eot_id|><|start_header_id|>assistant<|end_header_id|> 4<|eot_id|><|start_header_id|>user<|end_header_id|> Where can you find this skatepark online?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Wwwlehangar-skateparkcom<|eot_id|> ====================seperator==================== [I ProcessGroupNCCL.cpp:1139] [Rank 0] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: INFO, NCCL_DEBUG: INFO, ID=565548736 Parameter Offload: Total persistent parameters: 603136 in 313 params wandb: ⭐️ View project at https://ml.byteintl.net/experiment/tracking/detail?Id=project_20241204_bc91a3c4 wandb: 🚀 View run at https://ml.byteintl.net/experiment/tracking/detail?Id=project_20241204_bc91a3c4&selectedTrial=run_20241205_7a25086f wandb: - Waiting for wandb.init()... wandb: Tracking run with wandb version 0.13.74 wandb: Run data is saved locally in /mnt/bn/liangkeg/ruohongz/LLaVA-Reasoner-DPO/llava_reasoner/wandb/run-20241205_123315-run_20241205_7a25086f wandb: Run `wandb offline` to turn off syncing. 
0%| | 0/12313 [00:00 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 12 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 1 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO comm 0x7fd2f00807b0 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 00 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 00 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 00 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 00 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 01 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO comm 0x7fca94080730 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 01 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 02 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO comm 0x7fae440807b0 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 13 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 02 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 03 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 01 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 01 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 2 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 03 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 02 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 
n124-105-156:494060:500859 [0] NCCL INFO Tree 14 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 04 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 03 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 02 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 00 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 0 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 04 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 3 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 00 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 04 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 05 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 03 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 01 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 05 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 06 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 15 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 06 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 12 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 4 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 01 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 05 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 02 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 04 
: 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 02 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 03 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 07 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 07 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 03 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 04 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 16 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 08 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 1 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 04 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 05 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 09 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 06 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 05 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 08 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 5 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 13 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 10 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 17 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 07 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 11 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 06 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 
n124-105-156:494063:500861 [3] NCCL INFO Ring 05 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 08 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 06 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 09 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 06 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 2 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 6 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 07 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 07 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 18 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 12 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 07 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 7 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 09 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 10 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 14 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 08 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 08 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 13 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 19 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 08 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 10 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 11 
: 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 09 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 3 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 09 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 8 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 14 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 09 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 11 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 12 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 10 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 10 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 12 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 15 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 11 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 15 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 20 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 10 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 12 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 9 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 13 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 11 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 13 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 4 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 
n124-105-156:494066:500860 [6] NCCL INFO Ring 16 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 11 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 21 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 13 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 14 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 12 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 14 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 15 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 13 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 15 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 14 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 17 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 16 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 10 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 18 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 12 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 16 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 14 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 16 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 15 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 22 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 17 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 19 : 5 -> 6 
-> 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 5 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 13 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 18 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 17 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 15 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 16 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 18 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 11 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 20 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 14 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 19 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 17 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 16 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 17 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Tree 23 : -1 -> 0 -> 1/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 19 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 17 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 21 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 15 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 20 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 6 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 22 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 
[0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 16 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 18 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 20 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 18 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 21 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Ring 23 : 5 -> 6 -> 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 18 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 22 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Ring 23 : 2 -> 3 -> 4 2024-12-05 12:33:31.962 n124-105-156:494066:500860 [6] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 2024-12-05 12:33:31.962 
n124-105-156:494062:500862 [2] NCCL INFO Ring 17 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 19 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 21 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 19 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 7 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 18 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 20 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494063:500861 [3] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 22 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 20 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 19 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 21 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 19 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO Ring 23 : 4 -> 5 -> 6 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 22 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 21 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 8 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 20 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 
n124-105-156:494065:500863 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Ring 23 : 3 -> 4 -> 5 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 22 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 20 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 21 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494065:500863 [5] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Ring 23 : 6 -> 7 -> 0 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 22 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 9 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494064:500866 [4] NCCL INFO P2P Chunksize set 
to 524288 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Ring 23 : 1 -> 2 -> 3 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 21 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494067:500865 [7] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 10 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 22 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494062:500862 [2] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 10/24 : 0 1 2 3 
4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 11 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Tree 23 : 0 -> 1 -> 2/-1/-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 00 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 01 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 02 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 03 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 04 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 00 : 7 -> 0 -> 1 2024-12-05 
12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 01 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 05 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 02 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 06 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 07 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 03 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 08 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 04 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 09 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 05 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 10 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 06 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 11 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 07 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 12 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 08 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 13 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 09 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 14 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 10 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 15 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 11 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 16 : 0 -> 1 -> 2 
2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 12 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 17 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 13 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 18 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 14 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 19 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 15 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 20 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 16 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 21 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 17 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 22 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 18 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Ring 23 : 0 -> 1 -> 2 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 19 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 20 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 21 : 7 -> 0 -> 1 
2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 22 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494061:500864 [1] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Ring 23 : 7 -> 0 -> 1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 2024-12-05 12:33:31.962 n124-105-156:494060:500859 [0] NCCL INFO P2P Chunksize set to 524288 2024-12-05 12:33:32.033 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.033 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.033 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.036 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.036 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.036 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.036 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.036 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.036 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.036 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 03/0 : 4[4] 
-> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.037 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.037 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.037 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.037 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.037 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.037 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.037 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.037 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.038 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.038 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.038 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.038 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.038 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.038 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.038 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.039 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.039 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.039 n124-105-156:494061:500864 [1] NCCL INFO Ring 
Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.039 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.039 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.040 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.041 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.041 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.041 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.041 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.041 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.041 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.041 n124-105-156:494064:500866 
[4] NCCL INFO Ring Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.042 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.042 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.042 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.042 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.042 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.042 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.042 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.043 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.043 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.043 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.043 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.043 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.043 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.043 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.043 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.044 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.044 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.044 
n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.044 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.044 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.044 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.044 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.045 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.045 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.045 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.045 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.045 n124-105-156:494064:500866 [4] NCCL INFO Ring Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:32.045 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.045 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.046 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.046 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.046 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.046 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.046 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.046 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM 
2024-12-05 12:33:32.046 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.046 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.046 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.047 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 11/0 : 0[0] -> 
1[1] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.048 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494062:500862 [2] NCCL INFO Ring 
Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.049 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.050 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494061:500864 
[1] NCCL INFO Ring Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494066:500860 [6] NCCL INFO Ring Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494061:500864 [1] NCCL INFO Ring Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.051 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494060:500859 [0] NCCL INFO Ring Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.052 
n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494067:500865 [7] NCCL INFO Ring Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.052 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.053 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.053 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.053 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.053 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.053 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.053 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.055 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.063 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.080 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.085 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.090 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.094 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.102 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM 
2024-12-05 12:33:32.106 n124-105-156:494062:500862 [2] NCCL INFO Ring Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:32.109 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.109 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.111 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.111 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.114 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.115 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.116 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.117 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.125 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.134 n124-105-156:494065:500863 [5] NCCL INFO Ring Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:32.134 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.141 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.142 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.144 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.146 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.161 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.163 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 19/0 : 3[3] -> 
4[4] via P2P/CUMEM 2024-12-05 12:33:32.169 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.177 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.182 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:32.184 n124-105-156:494063:500861 [3] NCCL INFO Ring Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.118 n124-105-156:494060:500859 [0] NCCL INFO Connected all rings 2024-12-05 12:33:33.141 n124-105-156:494061:500864 [1] NCCL INFO Connected all rings 2024-12-05 12:33:33.167 n124-105-156:494064:500866 [4] NCCL INFO Connected all rings 2024-12-05 12:33:33.167 n124-105-156:494063:500861 [3] NCCL INFO Connected all rings 2024-12-05 12:33:33.167 n124-105-156:494062:500862 [2] NCCL INFO Connected all rings 2024-12-05 12:33:33.177 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.178 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.178 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.178 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.181 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.181 n124-105-156:494067:500865 [7] NCCL INFO Connected all rings 2024-12-05 12:33:33.181 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.181 n124-105-156:494066:500860 [6] NCCL INFO Connected all rings 2024-12-05 12:33:33.181 n124-105-156:494065:500863 [5] NCCL INFO Connected all rings 2024-12-05 12:33:33.181 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.182 n124-105-156:494061:500864 [1] NCCL INFO 
Tree Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.182 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.183 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.183 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.183 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.186 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.187 
n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.187 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM 
2024-12-05 12:33:33.188 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.188 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.189 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 12/0 : 1[1] -> 
0[0] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.190 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.191 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494061:500864 [1] NCCL INFO Tree 
Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.192 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.193 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.193 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.193 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.193 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.193 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.193 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.193 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.193 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494062:500862 
[2] NCCL INFO Tree Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.194 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494061:500864 [1] NCCL INFO Tree Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494063:500861 [3] NCCL INFO Tree Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM 2024-12-05 12:33:33.195 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.196 n124-105-156:494064:500866 [4] NCCL INFO Tree Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM 2024-12-05 12:33:33.196 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.196 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.197 n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.197 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.197 
n124-105-156:494062:500862 [2] NCCL INFO Tree Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM 2024-12-05 12:33:33.198 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.198 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.199 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.199 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.199 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.199 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.199 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.199 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.202 n124-105-156:494067:500865 [7] NCCL INFO Tree Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM 2024-12-05 12:33:33.203 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.206 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.206 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.211 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.212 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.212 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.213 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.213 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM 
2024-12-05 12:33:33.213 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.213 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.216 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.216 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.219 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.221 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.223 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.227 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.228 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.228 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.231 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.232 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.232 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.238 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.242 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.243 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.244 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.244 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 11/0 : 5[5] -> 
4[4] via P2P/CUMEM 2024-12-05 12:33:33.246 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.246 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.247 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.248 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.249 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.250 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.250 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.250 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.251 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.261 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.262 n124-105-156:494066:500860 [6] NCCL INFO Tree Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM 2024-12-05 12:33:33.264 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.270 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.272 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.275 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.280 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.284 n124-105-156:494065:500863 [5] NCCL INFO Tree Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:33.289 n124-105-156:494065:500863 [5] NCCL INFO Tree 
Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM 2024-12-05 12:33:34.042 n124-105-156:494060:500859 [0] NCCL INFO Connected all trees 2024-12-05 12:33:34.042 n124-105-156:494060:500859 [0] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:33:34.042 n124-105-156:494060:500859 [0] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:33:34.042 n124-105-156:494060:500859 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:33:34.042 n124-105-156:494060:500859 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:33:34.228 n124-105-156:494061:500864 [1] NCCL INFO Connected all trees 2024-12-05 12:33:34.228 n124-105-156:494061:500864 [1] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:33:34.228 n124-105-156:494061:500864 [1] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:33:34.228 n124-105-156:494061:500864 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:33:34.228 n124-105-156:494061:500864 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:33:34.260 n124-105-156:494062:500862 [2] NCCL INFO Connected all trees 2024-12-05 12:33:34.260 n124-105-156:494062:500862 [2] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:33:34.260 n124-105-156:494062:500862 [2] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:33:34.260 n124-105-156:494062:500862 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:33:34.260 n124-105-156:494062:500862 [2] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:33:34.358 n124-105-156:494063:500861 [3] NCCL INFO Connected all trees 2024-12-05 12:33:34.358 n124-105-156:494063:500861 [3] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:33:34.358 n124-105-156:494063:500861 [3] 
NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:33:34.358 n124-105-156:494063:500861 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:33:34.358 n124-105-156:494063:500861 [3] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:33:34.364 n124-105-156:494067:500865 [7] NCCL INFO Connected all trees 2024-12-05 12:33:34.364 n124-105-156:494067:500865 [7] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:33:34.364 n124-105-156:494067:500865 [7] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:33:34.365 n124-105-156:494067:500865 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:33:34.365 n124-105-156:494067:500865 [7] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:33:34.368 n124-105-156:494064:500866 [4] NCCL INFO Connected all trees 2024-12-05 12:33:34.368 n124-105-156:494064:500866 [4] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:33:34.368 n124-105-156:494064:500866 [4] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:33:34.368 n124-105-156:494064:500866 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:33:34.368 n124-105-156:494064:500866 [4] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:33:34.368 n124-105-156:494066:500860 [6] NCCL INFO Connected all trees 2024-12-05 12:33:34.368 n124-105-156:494065:500863 [5] NCCL INFO Connected all trees 2024-12-05 12:33:34.368 n124-105-156:494065:500863 [5] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:33:34.368 n124-105-156:494066:500860 [6] NCCL INFO NCCL_PROTO set by environment to Simple 2024-12-05 12:33:34.368 n124-105-156:494066:500860 [6] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:33:34.368 
n124-105-156:494065:500863 [5] NCCL INFO NCCL_ALGO set by environment to Ring,Tree 2024-12-05 12:33:34.368 n124-105-156:494066:500860 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:33:34.368 n124-105-156:494065:500863 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-12-05 12:33:34.368 n124-105-156:494066:500860 [6] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:33:34.368 n124-105-156:494065:500863 [5] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-12-05 12:33:34.397 n124-105-156:494065:500863 [5] NCCL INFO ncclCommInitRank comm 0x7f70d80807b0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 85000 commId 0xc74f8d475a139f5f - Init COMPLETE 2024-12-05 12:33:34.397 n124-105-156:494067:500865 [7] NCCL INFO ncclCommInitRank comm 0x7f36700807b0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 8c000 commId 0xc74f8d475a139f5f - Init COMPLETE 2024-12-05 12:33:34.397 n124-105-156:494063:500861 [3] NCCL INFO ncclCommInitRank comm 0x7fd2f00807b0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId c000 commId 0xc74f8d475a139f5f - Init COMPLETE 2024-12-05 12:33:34.397 n124-105-156:494060:500859 [0] NCCL INFO ncclCommInitRank comm 0x7f4c58080ff0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 4000 commId 0xc74f8d475a139f5f - Init COMPLETE 2024-12-05 12:33:34.397 n124-105-156:494066:500860 [6] NCCL INFO ncclCommInitRank comm 0x7f32d00807f0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 8b000 commId 0xc74f8d475a139f5f - Init COMPLETE 2024-12-05 12:33:34.397 n124-105-156:494064:500866 [4] NCCL INFO ncclCommInitRank comm 0x7fac083441b0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 84000 commId 0xc74f8d475a139f5f - Init COMPLETE 2024-12-05 12:33:34.397 n124-105-156:494062:500862 [2] NCCL INFO ncclCommInitRank comm 0x7fca94080730 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId b000 commId 0xc74f8d475a139f5f - Init COMPLETE 2024-12-05 12:33:34.397 
n124-105-156:494061:500864 [1] NCCL INFO ncclCommInitRank comm 0x7fae440807b0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 5000 commId 0xc74f8d475a139f5f - Init COMPLETE [I ProcessGroupNCCL.cpp:1805] NCCL_DEBUG: INFO 0%| | 1/12313 [00:12<43:30:24, 12.72s/it] {'loss': 1.0675, 'grad_norm': 6.562383818863084, 'learning_rate': 1.3513513513513516e-08, 'epoch': 0.0} 0%| | 1/12313 [00:12<43:30:24, 12.72s/it] 0%| | 2/12313 [00:16<25:00:18, 7.31s/it] {'loss': 1.3477, 'grad_norm': 8.277066733441483, 'learning_rate': 2.702702702702703e-08, 'epoch': 0.0} 0%| | 2/12313 [00:16<25:00:18, 7.31s/it] 0%| | 3/12313 [00:19<18:00:49, 5.27s/it] {'loss': 0.982, 'grad_norm': 5.566945299122935, 'learning_rate': 4.0540540540540545e-08, 'epoch': 0.0} 0%| | 3/12313 [00:19<18:00:49, 5.27s/it] 0%| | 4/12313 [00:21<14:28:01, 4.23s/it] {'loss': 1.1009, 'grad_norm': 20.123631677375503, 'learning_rate': 5.405405405405406e-08, 'epoch': 0.0} 0%| | 4/12313 [00:21<14:28:01, 4.23s/it] 0%| | 5/12313 [00:24<13:08:59, 3.85s/it] {'loss': 1.1767, 'grad_norm': 7.516929368064332, 'learning_rate': 6.756756756756757e-08, 'epoch': 0.0} 0%| | 5/12313 [00:24<13:08:59, 3.85s/it] 0%| | 6/12313 [00:27<11:33:55, 3.38s/it] {'loss': 0.9279, 'grad_norm': 6.534930634637095, 'learning_rate': 8.108108108108109e-08, 'epoch': 0.0} 0%| | 6/12313 [00:27<11:33:55, 3.38s/it] 0%| | 7/12313 [00:30<11:12:47, 3.28s/it] {'loss': 1.1229, 'grad_norm': 8.547137659048731, 'learning_rate': 9.459459459459461e-08, 'epoch': 0.0} 0%| | 7/12313 [00:30<11:12:47, 3.28s/it] 0%| | 8/12313 [00:33<10:31:17, 3.08s/it] {'loss': 0.9784, 'grad_norm': 8.40063081083683, 'learning_rate': 1.0810810810810812e-07, 'epoch': 0.0} 0%| | 8/12313 [00:33<10:31:17, 3.08s/it] 0%| | 9/12313 [00:35<10:12:19, 2.99s/it] {'loss': 1.007, 'grad_norm': 7.113188011014463, 'learning_rate': 1.2162162162162163e-07, 'epoch': 0.0} 0%| | 9/12313 [00:35<10:12:19, 2.99s/it] 0%| | 10/12313 [00:38<9:39:49, 2.83s/it] {'loss': 1.0264, 'grad_norm': 9.511299992179831, 'learning_rate': 
1.3513513513513515e-07, 'epoch': 0.0} 0%| | 10/12313 [00:38<9:39:49, 2.83s/it] 0%| | 11/12313 [00:40<9:20:17, 2.73s/it] {'loss': 1.1152, 'grad_norm': 6.234366333859408, 'learning_rate': 1.4864864864864866e-07, 'epoch': 0.0} 0%| | 11/12313 [00:40<9:20:17, 2.73s/it][2024-12-05 12:34:08,806] [WARNING] [stage3.py:1949:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time 0%| | 12/12313 [00:44<10:42:24, 3.13s/it] {'loss': 0.9787, 'grad_norm': 8.300258843057527, 'learning_rate': 1.6216216216216218e-07, 'epoch': 0.0} 0%| | 12/12313 [00:44<10:42:24, 3.13s/it] 0%| | 13/12313 [00:47<10:00:31, 2.93s/it] {'loss': 1.0764, 'grad_norm': 6.394938331200344, 'learning_rate': 1.756756756756757e-07, 'epoch': 0.0} 0%| | 13/12313 [00:47<10:00:31, 2.93s/it] 0%| | 14/12313 [00:50<9:46:20, 2.86s/it] {'loss': 1.0187, 'grad_norm': 5.924100005331387, 'learning_rate': 1.8918918918918921e-07, 'epoch': 0.0} 0%| | 14/12313 [00:50<9:46:20, 2.86s/it] 0%| | 15/12313 [00:52<9:30:05, 2.78s/it] {'loss': 1.3016, 'grad_norm': 7.076697509559219, 'learning_rate': 2.0270270270270273e-07, 'epoch': 0.0} 0%| | 15/12313 [00:52<9:30:05, 2.78s/it] 0%| | 16/12313 [00:55<9:17:42, 2.72s/it] {'loss': 1.1151, 'grad_norm': 7.949795337583316, 'learning_rate': 2.1621621621621625e-07, 'epoch': 0.0} 0%| | 16/12313 [00:55<9:17:42, 2.72s/it] 0%| | 17/12313 [00:57<9:13:19, 2.70s/it] {'loss': 1.0339, 'grad_norm': 7.324802587536972, 'learning_rate': 2.2972972972972977e-07, 'epoch': 0.0} 0%| | 17/12313 [00:57<9:13:19, 2.70s/it] 0%| | 18/12313 [01:00<9:19:52, 2.73s/it] {'loss': 1.0653, 'grad_norm': 7.644757636442122, 'learning_rate': 2.4324324324324326e-07, 'epoch': 0.0} 
0%| | 18/12313 [01:00<9:19:52, 2.73s/it] 0%| | 19/12313 [01:03<9:02:46, 2.65s/it] {'loss': 1.2616, 'grad_norm': 19.41567123436543, 'learning_rate': 2.567567567567568e-07, 'epoch': 0.0} 0%| | 19/12313 [01:03<9:02:46, 2.65s/it] 0%| | 20/12313 [01:05<8:57:14, 2.62s/it] {'loss': 1.0637, 'grad_norm': 7.096406348798535, 'learning_rate': 2.702702702702703e-07, 'epoch': 0.0} 0%| | 20/12313 [01:05<8:57:14, 2.62s/it] 0%| | 21/12313 [01:08<8:49:17, 2.58s/it] {'loss': 1.185, 'grad_norm': 9.633919383798274, 'learning_rate': 2.837837837837838e-07, 'epoch': 0.0} 0%| | 21/12313 [01:08<8:49:17, 2.58s/it] 0%| | 22/12313 [01:10<9:00:06, 2.64s/it] {'loss': 1.0474, 'grad_norm': 10.180555284457757, 'learning_rate': 2.972972972972973e-07, 'epoch': 0.0} 0%| | 22/12313 [01:10<9:00:06, 2.64s/it] 0%| | 23/12313 [01:13<9:10:35, 2.69s/it] {'loss': 0.9087, 'grad_norm': 8.634935943189125, 'learning_rate': 3.1081081081081084e-07, 'epoch': 0.0} 0%| | 23/12313 [01:13<9:10:35, 2.69s/it] 0%| | 24/12313 [01:16<9:38:11, 2.82s/it] {'loss': 1.2098, 'grad_norm': 7.712256121372059, 'learning_rate': 3.2432432432432436e-07, 'epoch': 0.0} 0%| | 24/12313 [01:16<9:38:11, 2.82s/it] 0%| | 25/12313 [01:19<9:22:33, 2.75s/it] {'loss': 0.9682, 'grad_norm': 7.212382477186713, 'learning_rate': 3.378378378378379e-07, 'epoch': 0.0} 0%| | 25/12313 [01:19<9:22:33, 2.75s/it] 0%| | 26/12313 [01:22<9:32:48, 2.80s/it] {'loss': 0.9742, 'grad_norm': 8.965382654928913, 'learning_rate': 3.513513513513514e-07, 'epoch': 0.0} 0%| | 26/12313 [01:22<9:32:48, 2.80s/it] 0%| | 27/12313 [01:25<9:22:51, 2.75s/it] {'loss': 0.9463, 'grad_norm': 9.333155078594814, 'learning_rate': 3.648648648648649e-07, 'epoch': 0.0} 0%| | 27/12313 [01:25<9:22:51, 2.75s/it] 0%| | 28/12313 [01:27<9:31:42, 2.79s/it] {'loss': 0.9216, 'grad_norm': 5.770329415577018, 'learning_rate': 3.7837837837837843e-07, 'epoch': 0.0} 0%| | 28/12313 [01:27<9:31:42, 2.79s/it] 0%| | 29/12313 [01:30<9:32:24, 2.80s/it] {'loss': 0.9299, 'grad_norm': 7.19909286290371, 'learning_rate': 
3.9189189189189195e-07, 'epoch': 0.0} 0%| | 29/12313 [01:30<9:32:24, 2.80s/it] 0%| | 30/12313 [01:33<9:10:56, 2.69s/it] {'loss': 1.2135, 'grad_norm': 8.198959079831749, 'learning_rate': 4.0540540540540546e-07, 'epoch': 0.0} 0%| | 30/12313 [01:33<9:10:56, 2.69s/it] 0%| | 31/12313 [01:36<9:38:05, 2.82s/it] {'loss': 1.162, 'grad_norm': 7.469329596056761, 'learning_rate': 4.18918918918919e-07, 'epoch': 0.0} 0%| | 31/12313 [01:36<9:38:05, 2.82s/it] 0%| | 32/12313 [01:39<9:32:41, 2.80s/it] {'loss': 1.0726, 'grad_norm': 6.064802318250161, 'learning_rate': 4.324324324324325e-07, 'epoch': 0.0} 0%| | 32/12313 [01:39<9:32:41, 2.80s/it] 0%| | 33/12313 [01:41<9:30:28, 2.79s/it] {'loss': 0.9132, 'grad_norm': 7.561733885808792, 'learning_rate': 4.45945945945946e-07, 'epoch': 0.0} 0%| | 33/12313 [01:41<9:30:28, 2.79s/it] 0%| | 34/12313 [01:44<9:37:44, 2.82s/it] {'loss': 0.8835, 'grad_norm': 5.510848606396675, 'learning_rate': 4.5945945945945953e-07, 'epoch': 0.0} 0%| | 34/12313 [01:44<9:37:44, 2.82s/it] 0%| | 35/12313 [01:47<9:32:58, 2.80s/it] {'loss': 1.025, 'grad_norm': 5.05147854422008, 'learning_rate': 4.7297297297297305e-07, 'epoch': 0.0} 0%| | 35/12313 [01:47<9:32:58, 2.80s/it] 0%| | 36/12313 [01:50<9:47:59, 2.87s/it] {'loss': 1.0634, 'grad_norm': 7.449384043723888, 'learning_rate': 4.864864864864865e-07, 'epoch': 0.0} 0%| | 36/12313 [01:50<9:47:59, 2.87s/it] 0%| | 37/12313 [01:53<9:36:15, 2.82s/it] {'loss': 0.9037, 'grad_norm': 5.554970539302449, 'learning_rate': 5.000000000000001e-07, 'epoch': 0.0} 0%| | 37/12313 [01:53<9:36:15, 2.82s/it] 0%| | 38/12313 [01:55<9:13:08, 2.70s/it] {'loss': 1.0197, 'grad_norm': 5.648676923146146, 'learning_rate': 5.135135135135135e-07, 'epoch': 0.0} 0%| | 38/12313 [01:55<9:13:08, 2.70s/it] 0%| | 39/12313 [01:58<8:59:16, 2.64s/it] {'loss': 0.9723, 'grad_norm': 5.581517917957071, 'learning_rate': 5.270270270270271e-07, 'epoch': 0.0} 0%| | 39/12313 [01:58<8:59:16, 2.64s/it] 0%| | 40/12313 [02:00<9:08:48, 2.68s/it] {'loss': 1.0647, 'grad_norm': 
6.811513573858421, 'learning_rate': 5.405405405405406e-07, 'epoch': 0.0} 0%| | 40/12313 [02:00<9:08:48, 2.68s/it] 0%| | 41/12313 [02:03<9:14:36, 2.71s/it] {'loss': 0.9535, 'grad_norm': 13.02023304402827, 'learning_rate': 5.540540540540542e-07, 'epoch': 0.0} 0%| | 41/12313 [02:03<9:14:36, 2.71s/it] 0%| | 42/12313 [02:06<9:11:18, 2.70s/it] {'loss': 0.9207, 'grad_norm': 5.903154525353516, 'learning_rate': 5.675675675675676e-07, 'epoch': 0.0} 0%| | 42/12313 [02:06<9:11:18, 2.70s/it] 0%| | 43/12313 [02:09<9:43:18, 2.85s/it] {'loss': 0.901, 'grad_norm': 5.713506629308762, 'learning_rate': 5.810810810810812e-07, 'epoch': 0.0} 0%| | 43/12313 [02:09<9:43:18, 2.85s/it] 0%| | 44/12313 [02:12<9:24:36, 2.76s/it] {'loss': 0.8387, 'grad_norm': 7.568618399596037, 'learning_rate': 5.945945945945947e-07, 'epoch': 0.0} 0%| | 44/12313 [02:12<9:24:36, 2.76s/it] 0%| | 45/12313 [02:15<9:33:47, 2.81s/it] {'loss': 0.9369, 'grad_norm': 4.684278865041181, 'learning_rate': 6.081081081081082e-07, 'epoch': 0.0} 0%| | 45/12313 [02:15<9:33:47, 2.81s/it] 0%| | 46/12313 [02:17<9:19:37, 2.74s/it] {'loss': 0.9138, 'grad_norm': 4.719765340896398, 'learning_rate': 6.216216216216217e-07, 'epoch': 0.0} 0%| | 46/12313 [02:17<9:19:37, 2.74s/it] 0%| | 47/12313 [02:20<9:15:14, 2.72s/it] {'loss': 0.8665, 'grad_norm': 6.215441613276706, 'learning_rate': 6.351351351351353e-07, 'epoch': 0.0} 0%| | 47/12313 [02:20<9:15:14, 2.72s/it] 0%| | 48/12313 [02:22<8:57:10, 2.63s/it] {'loss': 0.8277, 'grad_norm': 5.80882937072084, 'learning_rate': 6.486486486486487e-07, 'epoch': 0.0} 0%| | 48/12313 [02:22<8:57:10, 2.63s/it] 0%| | 49/12313 [02:25<8:57:18, 2.63s/it] {'loss': 0.9194, 'grad_norm': 4.4336706590480786, 'learning_rate': 6.621621621621623e-07, 'epoch': 0.0} 0%| | 49/12313 [02:25<8:57:18, 2.63s/it] 0%| | 50/12313 [02:28<9:04:39, 2.66s/it] {'loss': 0.997, 'grad_norm': 6.287882016956921, 'learning_rate': 6.756756756756758e-07, 'epoch': 0.0} 0%| | 50/12313 [02:28<9:04:39, 2.66s/it] 0%| | 51/12313 [02:30<8:55:40, 
2.62s/it] {'loss': 0.9637, 'grad_norm': 5.639526953982635, 'learning_rate': 6.891891891891893e-07, 'epoch': 0.0} 0%| | 51/12313 [02:30<8:55:40, 2.62s/it] 0%| | 52/12313 [02:33<8:50:28, 2.60s/it] {'loss': 0.8232, 'grad_norm': 5.51310604984265, 'learning_rate': 7.027027027027028e-07, 'epoch': 0.0} 0%| | 52/12313 [02:33<8:50:28, 2.60s/it] 0%| | 53/12313 [02:35<9:02:15, 2.65s/it] {'loss': 0.8983, 'grad_norm': 4.517639703226352, 'learning_rate': 7.162162162162164e-07, 'epoch': 0.0} 0%| | 53/12313 [02:35<9:02:15, 2.65s/it] 0%| | 54/12313 [02:38<9:18:41, 2.73s/it] {'loss': 0.7985, 'grad_norm': 6.666343453939387, 'learning_rate': 7.297297297297298e-07, 'epoch': 0.0} 0%| | 54/12313 [02:38<9:18:41, 2.73s/it] 0%| | 55/12313 [02:41<9:22:07, 2.75s/it] {'loss': 0.9058, 'grad_norm': 5.422191276786231, 'learning_rate': 7.432432432432434e-07, 'epoch': 0.0} 0%| | 55/12313 [02:41<9:22:07, 2.75s/it] 0%| | 56/12313 [02:44<9:19:45, 2.74s/it] {'loss': 0.7943, 'grad_norm': 4.624429643253304, 'learning_rate': 7.567567567567569e-07, 'epoch': 0.0} 0%| | 56/12313 [02:44<9:19:45, 2.74s/it] 0%| | 57/12313 [02:48<10:16:59, 3.02s/it] {'loss': 0.8046, 'grad_norm': 4.391972463636353, 'learning_rate': 7.702702702702704e-07, 'epoch': 0.0} 0%| | 57/12313 [02:48<10:16:59, 3.02s/it] 0%| | 58/12313 [02:50<10:08:33, 2.98s/it] {'loss': 0.96, 'grad_norm': 7.0317366536589745, 'learning_rate': 7.837837837837839e-07, 'epoch': 0.0} 0%| | 58/12313 [02:50<10:08:33, 2.98s/it] 0%| | 59/12313 [02:53<9:56:58, 2.92s/it] {'loss': 0.8267, 'grad_norm': 6.253750275868285, 'learning_rate': 7.972972972972974e-07, 'epoch': 0.0} 0%| | 59/12313 [02:53<9:56:58, 2.92s/it] 0%| | 60/12313 [02:56<9:44:49, 2.86s/it] {'loss': 0.7541, 'grad_norm': 4.2004573670066545, 'learning_rate': 8.108108108108109e-07, 'epoch': 0.0} 0%| | 60/12313 [02:56<9:44:49, 2.86s/it] 0%| | 61/12313 [02:58<9:14:40, 2.72s/it] {'loss': 0.8652, 'grad_norm': 4.843922828461304, 'learning_rate': 8.243243243243244e-07, 'epoch': 0.0} 0%| | 61/12313 [02:58<9:14:40, 
2.72s/it] 1%| | 62/12313 [03:01<9:15:39, 2.72s/it] {'loss': 0.9765, 'grad_norm': 6.663532165513732, 'learning_rate': 8.37837837837838e-07, 'epoch': 0.01} 1%| | 62/12313 [03:01<9:15:39, 2.72s/it] 1%| | 63/12313 [03:04<9:08:28, 2.69s/it] {'loss': 0.8343, 'grad_norm': 5.518475179846077, 'learning_rate': 8.513513513513514e-07, 'epoch': 0.01} 1%| | 63/12313 [03:04<9:08:28, 2.69s/it] 1%| | 64/12313 [03:06<9:13:20, 2.71s/it] {'loss': 0.8835, 'grad_norm': 4.16743993524973, 'learning_rate': 8.64864864864865e-07, 'epoch': 0.01} 1%| | 64/12313 [03:06<9:13:20, 2.71s/it] 1%| | 65/12313 [03:09<9:13:54, 2.71s/it] {'loss': 0.8031, 'grad_norm': 4.724546088137992, 'learning_rate': 8.783783783783785e-07, 'epoch': 0.01} 1%| | 65/12313 [03:09<9:13:54, 2.71s/it] 1%| | 66/12313 [03:12<9:09:49, 2.69s/it] {'loss': 0.8464, 'grad_norm': 5.965079554752788, 'learning_rate': 8.91891891891892e-07, 'epoch': 0.01} 1%| | 66/12313 [03:12<9:09:49, 2.69s/it] 1%| | 67/12313 [03:14<9:09:22, 2.69s/it] {'loss': 0.9254, 'grad_norm': 7.388225521826432, 'learning_rate': 9.054054054054055e-07, 'epoch': 0.01} 1%| | 67/12313 [03:14<9:09:22, 2.69s/it] 1%| | 68/12313 [03:17<9:05:03, 2.67s/it] {'loss': 0.7852, 'grad_norm': 4.486467374218727, 'learning_rate': 9.189189189189191e-07, 'epoch': 0.01} 1%| | 68/12313 [03:17<9:05:03, 2.67s/it] 1%| | 69/12313 [03:20<9:05:17, 2.67s/it] {'loss': 0.6409, 'grad_norm': 4.818486349074325, 'learning_rate': 9.324324324324325e-07, 'epoch': 0.01} 1%| | 69/12313 [03:20<9:05:17, 2.67s/it] 1%| | 70/12313 [03:23<9:16:19, 2.73s/it] {'loss': 0.6555, 'grad_norm': 12.591926984529453, 'learning_rate': 9.459459459459461e-07, 'epoch': 0.01} 1%| | 70/12313 [03:23<9:16:19, 2.73s/it] 1%| | 71/12313 [03:25<9:21:24, 2.75s/it] {'loss': 0.7594, 'grad_norm': 3.582506346281597, 'learning_rate': 9.594594594594596e-07, 'epoch': 0.01} 1%| | 71/12313 [03:25<9:21:24, 2.75s/it] 1%| | 72/12313 [03:28<9:13:10, 2.71s/it] {'loss': 0.8258, 'grad_norm': 6.316272242206502, 'learning_rate': 9.72972972972973e-07, 
'epoch': 0.01} 1%| | 72/12313 [03:28<9:13:10, 2.71s/it] 1%| | 73/12313 [03:31<9:15:34, 2.72s/it] {'loss': 0.7343, 'grad_norm': 7.768203703731759, 'learning_rate': 9.864864864864867e-07, 'epoch': 0.01} 1%| | 73/12313 [03:31<9:15:34, 2.72s/it] 1%| | 74/12313 [03:34<9:23:19, 2.76s/it] {'loss': 0.8984, 'grad_norm': 4.411865830101082, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.01} 1%| | 74/12313 [03:34<9:23:19, 2.76s/it] 1%| | 75/12313 [03:36<9:20:44, 2.75s/it] {'loss': 0.8646, 'grad_norm': 6.221797107498061, 'learning_rate': 1.0135135135135136e-06, 'epoch': 0.01} 1%| | 75/12313 [03:36<9:20:44, 2.75s/it] 1%| | 76/12313 [03:39<9:15:10, 2.72s/it] {'loss': 0.7082, 'grad_norm': 4.668292899364404, 'learning_rate': 1.027027027027027e-06, 'epoch': 0.01} 1%| | 76/12313 [03:39<9:15:10, 2.72s/it] 1%| | 77/12313 [03:42<9:32:20, 2.81s/it] {'loss': 0.7584, 'grad_norm': 5.307505469950394, 'learning_rate': 1.0405405405405408e-06, 'epoch': 0.01} 1%| | 77/12313 [03:42<9:32:20, 2.81s/it] 1%| | 78/12313 [03:45<9:12:46, 2.71s/it] {'loss': 0.8715, 'grad_norm': 5.411978113961716, 'learning_rate': 1.0540540540540542e-06, 'epoch': 0.01} 1%| | 78/12313 [03:45<9:12:46, 2.71s/it] 1%| | 79/12313 [03:47<9:11:47, 2.71s/it] {'loss': 0.8187, 'grad_norm': 11.725200388824204, 'learning_rate': 1.0675675675675677e-06, 'epoch': 0.01} 1%| | 79/12313 [03:47<9:11:47, 2.71s/it] 1%| | 80/12313 [03:50<9:26:03, 2.78s/it] {'loss': 0.7657, 'grad_norm': 7.066619804350216, 'learning_rate': 1.0810810810810812e-06, 'epoch': 0.01} 1%| | 80/12313 [03:50<9:26:03, 2.78s/it] 1%| | 81/12313 [03:53<9:10:09, 2.70s/it] {'loss': 1.0916, 'grad_norm': 5.206447831213838, 'learning_rate': 1.0945945945945948e-06, 'epoch': 0.01} 1%| | 81/12313 [03:53<9:10:09, 2.70s/it] 1%| | 82/12313 [03:55<9:12:50, 2.71s/it] {'loss': 0.695, 'grad_norm': 6.270913752130497, 'learning_rate': 1.1081081081081083e-06, 'epoch': 0.01} 1%| | 82/12313 [03:55<9:12:50, 2.71s/it] 1%| | 83/12313 [03:58<8:59:27, 2.65s/it] {'loss': 0.7361, 'grad_norm': 
5.602218933207931, 'learning_rate': 1.1216216216216218e-06, 'epoch': 0.01} 1%| | 83/12313 [03:58<8:59:27, 2.65s/it] 1%| | 84/12313 [04:01<9:04:02, 2.67s/it] {'loss': 0.6823, 'grad_norm': 5.592916777746583, 'learning_rate': 1.1351351351351352e-06, 'epoch': 0.01} 1%| | 84/12313 [04:01<9:04:02, 2.67s/it] 1%| | 85/12313 [04:03<9:02:07, 2.66s/it] {'loss': 0.7927, 'grad_norm': 6.7335919368870565, 'learning_rate': 1.148648648648649e-06, 'epoch': 0.01} 1%| | 85/12313 [04:03<9:02:07, 2.66s/it] 1%| | 86/12313 [04:06<9:08:08, 2.69s/it] {'loss': 0.8417, 'grad_norm': 7.941076590684681, 'learning_rate': 1.1621621621621624e-06, 'epoch': 0.01} 1%| | 86/12313 [04:06<9:08:08, 2.69s/it] 1%| | 87/12313 [04:09<9:20:25, 2.75s/it] {'loss': 0.689, 'grad_norm': 5.969244160464252, 'learning_rate': 1.1756756756756758e-06, 'epoch': 0.01} 1%| | 87/12313 [04:09<9:20:25, 2.75s/it] 1%| | 88/12313 [04:12<9:13:01, 2.71s/it] {'loss': 0.9608, 'grad_norm': 5.118618058340234, 'learning_rate': 1.1891891891891893e-06, 'epoch': 0.01} 1%| | 88/12313 [04:12<9:13:01, 2.71s/it] 1%| | 89/12313 [04:14<9:15:45, 2.73s/it] {'loss': 0.6376, 'grad_norm': 8.55651778885645, 'learning_rate': 1.2027027027027028e-06, 'epoch': 0.01} 1%| | 89/12313 [04:14<9:15:45, 2.73s/it] 1%| | 90/12313 [04:17<9:19:45, 2.75s/it] {'loss': 0.8781, 'grad_norm': 8.9312254966113, 'learning_rate': 1.2162162162162164e-06, 'epoch': 0.01} 1%| | 90/12313 [04:17<9:19:45, 2.75s/it] 1%| | 91/12313 [04:20<9:15:53, 2.73s/it] {'loss': 0.7296, 'grad_norm': 5.435421842215974, 'learning_rate': 1.22972972972973e-06, 'epoch': 0.01} 1%| | 91/12313 [04:20<9:15:53, 2.73s/it] 1%| | 92/12313 [04:22<9:14:01, 2.72s/it] {'loss': 0.7004, 'grad_norm': 7.00575424705863, 'learning_rate': 1.2432432432432434e-06, 'epoch': 0.01} 1%| | 92/12313 [04:22<9:14:01, 2.72s/it] 1%| | 93/12313 [04:25<9:07:16, 2.69s/it] {'loss': 0.6933, 'grad_norm': 5.126051571596671, 'learning_rate': 1.2567567567567568e-06, 'epoch': 0.01} 1%| | 93/12313 [04:25<9:07:16, 2.69s/it] 1%| | 94/12313 
[04:28<9:09:57, 2.70s/it] {'loss': 0.7443, 'grad_norm': 7.546722570854549, 'learning_rate': 1.2702702702702705e-06, 'epoch': 0.01} 1%| | 94/12313 [04:28<9:09:57, 2.70s/it] 1%| | 95/12313 [04:30<8:57:49, 2.64s/it] {'loss': 0.6609, 'grad_norm': 5.82298529351105, 'learning_rate': 1.2837837837837838e-06, 'epoch': 0.01} 1%| | 95/12313 [04:30<8:57:49, 2.64s/it] 1%| | 96/12313 [04:33<9:20:41, 2.75s/it] {'loss': 0.6966, 'grad_norm': 5.395134658996866, 'learning_rate': 1.2972972972972974e-06, 'epoch': 0.01} 1%| | 96/12313 [04:33<9:20:41, 2.75s/it] 1%| | 97/12313 [04:36<9:43:36, 2.87s/it] {'loss': 0.8075, 'grad_norm': 5.715845533979447, 'learning_rate': 1.310810810810811e-06, 'epoch': 0.01} 1%| | 97/12313 [04:36<9:43:36, 2.87s/it] 1%| | 98/12313 [04:39<9:47:46, 2.89s/it] {'loss': 0.8382, 'grad_norm': 4.346663727077859, 'learning_rate': 1.3243243243243246e-06, 'epoch': 0.01} 1%| | 98/12313 [04:39<9:47:46, 2.89s/it] 1%| | 99/12313 [04:42<9:31:13, 2.81s/it] {'loss': 0.8409, 'grad_norm': 4.510360891483355, 'learning_rate': 1.3378378378378378e-06, 'epoch': 0.01} 1%| | 99/12313 [04:42<9:31:13, 2.81s/it] 1%| | 100/12313 [04:45<9:23:01, 2.77s/it] {'loss': 0.8968, 'grad_norm': 6.252214797052545, 'learning_rate': 1.3513513513513515e-06, 'epoch': 0.01} 1%| | 100/12313 [04:45<9:23:01, 2.77s/it] 1%| | 101/12313 [04:48<9:30:03, 2.80s/it] {'loss': 0.7697, 'grad_norm': 4.133668140760274, 'learning_rate': 1.364864864864865e-06, 'epoch': 0.01} 1%| | 101/12313 [04:48<9:30:03, 2.80s/it] 1%| | 102/12313 [04:50<9:32:45, 2.81s/it] {'loss': 0.8302, 'grad_norm': 5.13995005645122, 'learning_rate': 1.3783783783783786e-06, 'epoch': 0.01} 1%| | 102/12313 [04:50<9:32:45, 2.81s/it] 1%| | 103/12313 [04:54<10:18:38, 3.04s/it] {'loss': 0.8553, 'grad_norm': 5.645133972443234, 'learning_rate': 1.391891891891892e-06, 'epoch': 0.01} 1%| | 103/12313 [04:54<10:18:38, 3.04s/it] 1%| | 104/12313 [04:57<9:59:14, 2.94s/it] {'loss': 0.7269, 'grad_norm': 5.768137707230809, 'learning_rate': 1.4054054054054056e-06, 
'epoch': 0.01} 1%| | 104/12313 [04:57<9:59:14, 2.94s/it] 1%| | 105/12313 [04:59<9:34:08, 2.82s/it] {'loss': 0.8465, 'grad_norm': 3.9384737579894358, 'learning_rate': 1.418918918918919e-06, 'epoch': 0.01} 1%| | 105/12313 [04:59<9:34:08, 2.82s/it][2024-12-05 12:38:27,644] [WARNING] [stage3.py:1949:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time 1%| | 106/12313 [05:03<10:46:27, 3.18s/it] {'loss': 0.7358, 'grad_norm': 6.608951862752225, 'learning_rate': 1.4324324324324327e-06, 'epoch': 0.01} 1%| | 106/12313 [05:03<10:46:27, 3.18s/it] 1%| | 107/12313 [05:06<10:12:56, 3.01s/it] {'loss': 0.9122, 'grad_norm': 4.8856541626722425, 'learning_rate': 1.445945945945946e-06, 'epoch': 0.01} 1%| | 107/12313 [05:06<10:12:56, 3.01s/it] 1%| | 108/12313 [05:09<10:22:06, 3.06s/it] {'loss': 0.7847, 'grad_norm': 3.744812822198274, 'learning_rate': 1.4594594594594596e-06, 'epoch': 0.01} 1%| | 108/12313 [05:09<10:22:06, 3.06s/it] 1%| | 109/12313 [05:12<10:03:35, 2.97s/it] {'loss': 0.6125, 'grad_norm': 4.555012637892476, 'learning_rate': 1.4729729729729731e-06, 'epoch': 0.01} 1%| | 109/12313 [05:12<10:03:35, 2.97s/it] 1%| | 110/12313 [05:15<9:49:43, 2.90s/it] {'loss': 0.6426, 'grad_norm': 5.417265029398178, 'learning_rate': 1.4864864864864868e-06, 'epoch': 0.01} 1%| | 110/12313 [05:15<9:49:43, 2.90s/it] 1%| | 111/12313 [05:17<9:20:44, 2.76s/it] {'loss': 0.8351, 'grad_norm': 6.978404404546592, 'learning_rate': 1.5e-06, 'epoch': 0.01} 1%| | 111/12313 [05:17<9:20:44, 2.76s/it] 1%| | 112/12313 [05:20<9:15:32, 2.73s/it] {'loss': 0.9104, 'grad_norm': 7.486793674946231, 'learning_rate': 1.5135135135135137e-06, 'epoch': 0.01} 1%| | 
112/12313 [05:20<9:15:32, 2.73s/it] 1%| | 113/12313 [05:22<9:13:58, 2.72s/it] {'loss': 0.7182, 'grad_norm': 6.427379250554024, 'learning_rate': 1.5270270270270272e-06, 'epoch': 0.01} 1%| | 113/12313 [05:22<9:13:58, 2.72s/it] 1%| | 114/12313 [05:25<8:57:43, 2.64s/it] {'loss': 0.7966, 'grad_norm': 7.581127803220287, 'learning_rate': 1.5405405405405409e-06, 'epoch': 0.01} 1%| | 114/12313 [05:25<8:57:43, 2.64s/it] 1%| | 115/12313 [05:28<9:03:21, 2.67s/it] {'loss': 0.7349, 'grad_norm': 4.51084883429586, 'learning_rate': 1.5540540540540541e-06, 'epoch': 0.01} 1%| | 115/12313 [05:28<9:03:21, 2.67s/it] 1%| | 116/12313 [05:30<8:57:46, 2.65s/it] {'loss': 0.6476, 'grad_norm': 5.521276273858657, 'learning_rate': 1.5675675675675678e-06, 'epoch': 0.01} 1%| | 116/12313 [05:30<8:57:46, 2.65s/it] 1%| | 117/12313 [05:33<8:59:39, 2.65s/it] {'loss': 0.7003, 'grad_norm': 10.232542435537884, 'learning_rate': 1.5810810810810812e-06, 'epoch': 0.01} 1%| | 117/12313 [05:33<8:59:39, 2.65s/it] 1%| | 118/12313 [05:35<8:46:50, 2.59s/it] {'loss': 0.921, 'grad_norm': 5.670487468601197, 'learning_rate': 1.5945945945945947e-06, 'epoch': 0.01} 1%| | 118/12313 [05:35<8:46:50, 2.59s/it] 1%| | 119/12313 [05:38<8:57:26, 2.64s/it] {'loss': 0.6932, 'grad_norm': 5.136554782606984, 'learning_rate': 1.6081081081081082e-06, 'epoch': 0.01} 1%| | 119/12313 [05:38<8:57:26, 2.64s/it] 1%| | 120/12313 [05:41<8:54:36, 2.63s/it] {'loss': 0.7676, 'grad_norm': 3.81643952542769, 'learning_rate': 1.6216216216216219e-06, 'epoch': 0.01} 1%| | 120/12313 [05:41<8:54:36, 2.63s/it] 1%| | 121/12313 [05:43<8:46:01, 2.59s/it] {'loss': 0.7903, 'grad_norm': 7.129104413729071, 'learning_rate': 1.6351351351351353e-06, 'epoch': 0.01} 1%| | 121/12313 [05:43<8:46:01, 2.59s/it] 1%| | 122/12313 [05:46<8:51:40, 2.62s/it] {'loss': 0.6741, 'grad_norm': 5.89283670831236, 'learning_rate': 1.6486486486486488e-06, 'epoch': 0.01} 1%| | 122/12313 [05:46<8:51:40, 2.62s/it] 1%| | 123/12313 [05:49<9:14:09, 2.73s/it] {'loss': 0.7493, 'grad_norm': 
4.728889968775353, 'learning_rate': 1.6621621621621622e-06, 'epoch': 0.01} 1%| | 123/12313 [05:49<9:14:09, 2.73s/it] 1%| | 124/12313 [05:52<9:20:53, 2.76s/it] {'loss': 0.6254, 'grad_norm': 5.400043416105325, 'learning_rate': 1.675675675675676e-06, 'epoch': 0.01} 1%| | 124/12313 [05:52<9:20:53, 2.76s/it] 1%| | 125/12313 [05:54<9:10:58, 2.71s/it] {'loss': 0.6588, 'grad_norm': 5.7442992971209605, 'learning_rate': 1.6891891891891894e-06, 'epoch': 0.01} 1%| | 125/12313 [05:54<9:10:58, 2.71s/it] 1%| | 126/12313 [05:57<9:14:58, 2.73s/it] {'loss': 0.721, 'grad_norm': 5.225132595645112, 'learning_rate': 1.7027027027027028e-06, 'epoch': 0.01} 1%| | 126/12313 [05:57<9:14:58, 2.73s/it] 1%| | 127/12313 [06:00<9:19:49, 2.76s/it] {'loss': 0.6719, 'grad_norm': 4.917827752874381, 'learning_rate': 1.7162162162162163e-06, 'epoch': 0.01} 1%| | 127/12313 [06:00<9:19:49, 2.76s/it] 1%| | 128/12313 [06:03<9:16:59, 2.74s/it] {'loss': 0.8192, 'grad_norm': 5.96105392795946, 'learning_rate': 1.72972972972973e-06, 'epoch': 0.01} 1%| | 128/12313 [06:03<9:16:59, 2.74s/it] 1%| | 129/12313 [06:06<9:42:44, 2.87s/it] {'loss': 0.741, 'grad_norm': 5.348491477184876, 'learning_rate': 1.7432432432432432e-06, 'epoch': 0.01} 1%| | 129/12313 [06:06<9:42:44, 2.87s/it] 1%| | 130/12313 [06:08<9:19:36, 2.76s/it] {'loss': 0.6943, 'grad_norm': 4.046650669596968, 'learning_rate': 1.756756756756757e-06, 'epoch': 0.01} 1%| | 130/12313 [06:08<9:19:36, 2.76s/it] 1%| | 131/12313 [06:11<9:08:32, 2.70s/it] {'loss': 0.5737, 'grad_norm': 5.045700573097575, 'learning_rate': 1.7702702702702704e-06, 'epoch': 0.01} 1%| | 131/12313 [06:11<9:08:32, 2.70s/it] 1%| | 132/12313 [06:14<9:12:08, 2.72s/it] {'loss': 0.7721, 'grad_norm': 4.332973323075942, 'learning_rate': 1.783783783783784e-06, 'epoch': 0.01} 1%| | 132/12313 [06:14<9:12:08, 2.72s/it] 1%| | 133/12313 [06:16<9:09:08, 2.71s/it] {'loss': 0.7081, 'grad_norm': 12.053968494279287, 'learning_rate': 1.7972972972972973e-06, 'epoch': 0.01} 1%| | 133/12313 [06:16<9:09:08, 
2.71s/it] 1%| | 134/12313 [06:19<9:04:54, 2.68s/it] {'loss': 0.7476, 'grad_norm': 4.314977931658641, 'learning_rate': 1.810810810810811e-06, 'epoch': 0.01} 1%| | 134/12313 [06:19<9:04:54, 2.68s/it] 1%| | 135/12313 [06:22<9:09:47, 2.71s/it] {'loss': 0.5623, 'grad_norm': 4.372992825514325, 'learning_rate': 1.8243243243243245e-06, 'epoch': 0.01} 1%| | 135/12313 [06:22<9:09:47, 2.71s/it] 1%| | 136/12313 [06:24<8:52:13, 2.62s/it] {'loss': 0.8409, 'grad_norm': 5.06501074840964, 'learning_rate': 1.8378378378378381e-06, 'epoch': 0.01} 1%| | 136/12313 [06:24<8:52:13, 2.62s/it] 1%| | 137/12313 [06:27<8:54:53, 2.64s/it] {'loss': 0.7097, 'grad_norm': 5.486341378807148, 'learning_rate': 1.8513513513513514e-06, 'epoch': 0.01} 1%| | 137/12313 [06:27<8:54:53, 2.64s/it] 1%| | 138/12313 [06:29<8:56:19, 2.64s/it] {'loss': 0.8605, 'grad_norm': 5.885624763272731, 'learning_rate': 1.864864864864865e-06, 'epoch': 0.01} 1%| | 138/12313 [06:29<8:56:19, 2.64s/it] 1%| | 139/12313 [06:32<8:52:16, 2.62s/it] {'loss': 0.7589, 'grad_norm': 5.810017768183461, 'learning_rate': 1.8783783783783785e-06, 'epoch': 0.01} 1%| | 139/12313 [06:32<8:52:16, 2.62s/it] 1%| | 140/12313 [06:35<9:13:43, 2.73s/it] {'loss': 0.6821, 'grad_norm': 5.241258116276019, 'learning_rate': 1.8918918918918922e-06, 'epoch': 0.01} 1%| | 140/12313 [06:35<9:13:43, 2.73s/it] 1%| | 141/12313 [06:37<8:57:43, 2.65s/it] {'loss': 0.6702, 'grad_norm': 5.141543904486299, 'learning_rate': 1.9054054054054054e-06, 'epoch': 0.01} 1%| | 141/12313 [06:37<8:57:43, 2.65s/it] 1%| | 142/12313 [06:40<9:02:18, 2.67s/it] {'loss': 0.7541, 'grad_norm': 6.614627634564522, 'learning_rate': 1.918918918918919e-06, 'epoch': 0.01} 1%| | 142/12313 [06:40<9:02:18, 2.67s/it] 1%| | 143/12313 [06:43<8:58:55, 2.66s/it] {'loss': 0.6118, 'grad_norm': 4.270716146835968, 'learning_rate': 1.9324324324324326e-06, 'epoch': 0.01} 1%| | 143/12313 [06:43<8:58:55, 2.66s/it] 1%| | 144/12313 [06:46<9:08:57, 2.71s/it] {'loss': 0.6631, 'grad_norm': 4.833439173611212, 
'learning_rate': 1.945945945945946e-06, 'epoch': 0.01} 1%| | 144/12313 [06:46<9:08:57, 2.71s/it] 1%| | 145/12313 [06:48<9:00:28, 2.67s/it] {'loss': 0.5792, 'grad_norm': 6.670666111898159, 'learning_rate': 1.9594594594594595e-06, 'epoch': 0.01} 1%| | 145/12313 [06:48<9:00:28, 2.67s/it] 1%| | 146/12313 [06:51<8:52:33, 2.63s/it] {'loss': 0.6649, 'grad_norm': 5.320699935552583, 'learning_rate': 1.9729729729729734e-06, 'epoch': 0.01} 1%| | 146/12313 [06:51<8:52:33, 2.63s/it] 1%| | 147/12313 [06:53<8:46:58, 2.60s/it] {'loss': 0.6055, 'grad_norm': 10.048210540269071, 'learning_rate': 1.9864864864864864e-06, 'epoch': 0.01} 1%| | 147/12313 [06:53<8:46:58, 2.60s/it] 1%| | 148/12313 [06:56<8:44:54, 2.59s/it] {'loss': 0.7224, 'grad_norm': 4.8358064531705525, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.01} 1%| | 148/12313 [06:56<8:44:54, 2.59s/it] 1%| | 149/12313 [06:58<8:46:27, 2.60s/it] {'loss': 0.7546, 'grad_norm': 8.585169822222198, 'learning_rate': 2.013513513513514e-06, 'epoch': 0.01} 1%| | 149/12313 [06:58<8:46:27, 2.60s/it] 1%| | 150/12313 [07:01<8:40:14, 2.57s/it] {'loss': 0.6867, 'grad_norm': 4.847625342598578, 'learning_rate': 2.0270270270270273e-06, 'epoch': 0.01} 1%| | 150/12313 [07:01<8:40:14, 2.57s/it] 1%| | 151/12313 [07:03<8:40:58, 2.57s/it] {'loss': 0.7262, 'grad_norm': 3.7722782135223594, 'learning_rate': 2.0405405405405407e-06, 'epoch': 0.01} 1%| | 151/12313 [07:03<8:40:58, 2.57s/it] 1%| | 152/12313 [07:06<8:39:18, 2.56s/it] {'loss': 0.8792, 'grad_norm': 5.149292525910975, 'learning_rate': 2.054054054054054e-06, 'epoch': 0.01} 1%| | 152/12313 [07:06<8:39:18, 2.56s/it] 1%| | 153/12313 [07:09<8:39:42, 2.56s/it] {'loss': 0.7837, 'grad_norm': 6.26503383092344, 'learning_rate': 2.0675675675675677e-06, 'epoch': 0.01} 1%| | 153/12313 [07:09<8:39:42, 2.56s/it] 1%|▏ | 154/12313 [07:11<9:01:43, 2.67s/it] {'loss': 0.5692, 'grad_norm': 4.188912763837703, 'learning_rate': 2.0810810810810815e-06, 'epoch': 0.01} 1%|▏ | 154/12313 [07:11<9:01:43, 2.67s/it] 1%|▏ | 
155/12313 [07:14<9:14:28, 2.74s/it] {'loss': 0.652, 'grad_norm': 5.04456436284514, 'learning_rate': 2.0945945945945946e-06, 'epoch': 0.01} 1%|▏ | 155/12313 [07:14<9:14:28, 2.74s/it] 1%|▏ | 156/12313 [07:17<8:58:32, 2.66s/it] {'loss': 0.6266, 'grad_norm': 7.405968257695848, 'learning_rate': 2.1081081081081085e-06, 'epoch': 0.01} 1%|▏ | 156/12313 [07:17<8:58:32, 2.66s/it] 1%|▏ | 157/12313 [07:19<8:51:42, 2.62s/it] {'loss': 0.6542, 'grad_norm': 5.310169429627866, 'learning_rate': 2.121621621621622e-06, 'epoch': 0.01} 1%|▏ | 157/12313 [07:19<8:51:42, 2.62s/it] 1%|▏ | 158/12313 [07:22<9:00:19, 2.67s/it] {'loss': 0.8108, 'grad_norm': 5.046936455927635, 'learning_rate': 2.1351351351351354e-06, 'epoch': 0.01} 1%|▏ | 158/12313 [07:22<9:00:19, 2.67s/it] 1%|▏ | 159/12313 [07:25<9:03:18, 2.68s/it] {'loss': 0.6339, 'grad_norm': 4.036458244440137, 'learning_rate': 2.148648648648649e-06, 'epoch': 0.01} 1%|▏ | 159/12313 [07:25<9:03:18, 2.68s/it] 1%|▏ | 160/12313 [07:27<8:54:03, 2.64s/it] {'loss': 0.7129, 'grad_norm': 5.254253720134002, 'learning_rate': 2.1621621621621623e-06, 'epoch': 0.01} 1%|▏ | 160/12313 [07:27<8:54:03, 2.64s/it] 1%|▏ | 161/12313 [07:30<9:04:50, 2.69s/it] {'loss': 0.7669, 'grad_norm': 4.842001153737952, 'learning_rate': 2.175675675675676e-06, 'epoch': 0.01} 1%|▏ | 161/12313 [07:30<9:04:50, 2.69s/it] 1%|▏ | 162/12313 [07:33<9:25:09, 2.79s/it] {'loss': 0.7403, 'grad_norm': 3.45883013736874, 'learning_rate': 2.1891891891891897e-06, 'epoch': 0.01} 1%|▏ | 162/12313 [07:33<9:25:09, 2.79s/it] 1%|▏ | 163/12313 [07:36<9:27:03, 2.80s/it] {'loss': 0.6072, 'grad_norm': 6.615557242600742, 'learning_rate': 2.2027027027027027e-06, 'epoch': 0.01} 1%|▏ | 163/12313 [07:36<9:27:03, 2.80s/it] 1%|▏ | 164/12313 [07:39<9:23:56, 2.79s/it] {'loss': 0.4518, 'grad_norm': 5.295390317262873, 'learning_rate': 2.2162162162162166e-06, 'epoch': 0.01} 1%|▏ | 164/12313 [07:39<9:23:56, 2.79s/it] 1%|▏ | 165/12313 [07:42<9:24:45, 2.79s/it] {'loss': 0.7405, 'grad_norm': 4.553708607425031, 
'learning_rate': 2.22972972972973e-06, 'epoch': 0.01} 1%|▏ | 165/12313 [07:42<9:24:45, 2.79s/it] 1%|▏ | 166/12313 [07:44<9:29:18, 2.81s/it] {'loss': 0.7216, 'grad_norm': 5.730983978090373, 'learning_rate': 2.2432432432432435e-06, 'epoch': 0.01} 1%|▏ | 166/12313 [07:44<9:29:18, 2.81s/it] 1%|▏ | 167/12313 [07:47<9:20:07, 2.77s/it] {'loss': 0.543, 'grad_norm': 6.28068035640207, 'learning_rate': 2.256756756756757e-06, 'epoch': 0.01} 1%|▏ | 167/12313 [07:47<9:20:07, 2.77s/it] 1%|▏ | 168/12313 [07:50<9:00:30, 2.67s/it] {'loss': 0.7273, 'grad_norm': 5.534240505988875, 'learning_rate': 2.2702702702702705e-06, 'epoch': 0.01} 1%|▏ | 168/12313 [07:50<9:00:30, 2.67s/it] 1%|▏ | 169/12313 [07:52<9:11:38, 2.73s/it] {'loss': 0.6259, 'grad_norm': 10.022692075346896, 'learning_rate': 2.283783783783784e-06, 'epoch': 0.01} 1%|▏ | 169/12313 [07:52<9:11:38, 2.73s/it] 1%|▏ | 170/12313 [07:55<9:08:01, 2.71s/it] {'loss': 0.5538, 'grad_norm': 4.133335838910277, 'learning_rate': 2.297297297297298e-06, 'epoch': 0.01} 1%|▏ | 170/12313 [07:55<9:08:01, 2.71s/it] 1%|▏ | 171/12313 [07:57<8:49:36, 2.62s/it] {'loss': 0.5594, 'grad_norm': 6.849412426150848, 'learning_rate': 2.310810810810811e-06, 'epoch': 0.01} 1%|▏ | 171/12313 [07:57<8:49:36, 2.62s/it] 1%|▏ | 172/12313 [08:00<8:57:54, 2.66s/it] {'loss': 0.6406, 'grad_norm': 7.2240081386637955, 'learning_rate': 2.3243243243243247e-06, 'epoch': 0.01} 1%|▏ | 172/12313 [08:00<8:57:54, 2.66s/it] 1%|▏ | 173/12313 [08:03<9:15:40, 2.75s/it] {'loss': 0.697, 'grad_norm': 4.948498140450114, 'learning_rate': 2.337837837837838e-06, 'epoch': 0.01} 1%|▏ | 173/12313 [08:03<9:15:40, 2.75s/it] 1%|▏ | 174/12313 [08:06<9:07:59, 2.71s/it] {'loss': 0.709, 'grad_norm': 3.8421671175611465, 'learning_rate': 2.3513513513513517e-06, 'epoch': 0.01} 1%|▏ | 174/12313 [08:06<9:07:59, 2.71s/it] 1%|▏ | 175/12313 [08:09<9:43:02, 2.88s/it] {'loss': 0.5944, 'grad_norm': 4.9263286743150925, 'learning_rate': 2.364864864864865e-06, 'epoch': 0.01} 1%|▏ | 175/12313 [08:09<9:43:02, 
2.88s/it] 1%|▏ | 176/12313 [08:12<9:28:49, 2.81s/it] {'loss': 0.7951, 'grad_norm': 4.181352300851268, 'learning_rate': 2.3783783783783786e-06, 'epoch': 0.01} 1%|▏ | 176/12313 [08:12<9:28:49, 2.81s/it] 1%|▏ | 177/12313 [08:14<9:08:09, 2.71s/it] {'loss': 0.8847, 'grad_norm': 5.217958869841234, 'learning_rate': 2.391891891891892e-06, 'epoch': 0.01} 1%|▏ | 177/12313 [08:14<9:08:09, 2.71s/it] 1%|▏ | 178/12313 [08:17<9:06:44, 2.70s/it] {'loss': 0.6506, 'grad_norm': 5.53287305794427, 'learning_rate': 2.4054054054054055e-06, 'epoch': 0.01} 1%|▏ | 178/12313 [08:17<9:06:44, 2.70s/it] 1%|▏ | 179/12313 [08:19<8:58:58, 2.67s/it] {'loss': 0.649, 'grad_norm': 4.977958862414195, 'learning_rate': 2.418918918918919e-06, 'epoch': 0.01} 1%|▏ | 179/12313 [08:19<8:58:58, 2.67s/it] 1%|▏ | 180/12313 [08:22<8:53:57, 2.64s/it] {'loss': 0.5173, 'grad_norm': 7.079003805106819, 'learning_rate': 2.432432432432433e-06, 'epoch': 0.01} 1%|▏ | 180/12313 [08:22<8:53:57, 2.64s/it] 1%|▏ | 181/12313 [08:25<8:57:27, 2.66s/it] {'loss': 0.7425, 'grad_norm': 6.4006738662761675, 'learning_rate': 2.4459459459459463e-06, 'epoch': 0.01} 1%|▏ | 181/12313 [08:25<8:57:27, 2.66s/it] 1%|▏ | 182/12313 [08:28<9:07:13, 2.71s/it] {'loss': 0.7116, 'grad_norm': 7.223867008336038, 'learning_rate': 2.45945945945946e-06, 'epoch': 0.01} 1%|▏ | 182/12313 [08:28<9:07:13, 2.71s/it] 1%|▏ | 183/12313 [08:30<9:04:05, 2.69s/it] {'loss': 0.8202, 'grad_norm': 4.74150217223079, 'learning_rate': 2.4729729729729733e-06, 'epoch': 0.01} 1%|▏ | 183/12313 [08:30<9:04:05, 2.69s/it] 1%|▏ | 184/12313 [08:33<8:56:16, 2.65s/it] {'loss': 0.7202, 'grad_norm': 5.838162332180424, 'learning_rate': 2.4864864864864867e-06, 'epoch': 0.01} 1%|▏ | 184/12313 [08:33<8:56:16, 2.65s/it] 2%|▏ | 185/12313 [08:35<8:58:15, 2.66s/it] {'loss': 0.6318, 'grad_norm': 7.3164412014837845, 'learning_rate': 2.5e-06, 'epoch': 0.02} 2%|▏ | 185/12313 [08:35<8:58:15, 2.66s/it] 2%|▏ | 186/12313 [08:38<9:05:38, 2.70s/it] {'loss': 0.6161, 'grad_norm': 4.0466470539819674, 
'learning_rate': 2.5135135135135137e-06, 'epoch': 0.02} 2%|▏ | 186/12313 [08:38<9:05:38, 2.70s/it] 2%|▏ | 187/12313 [08:41<8:53:58, 2.64s/it] {'loss': 0.8844, 'grad_norm': 4.20696930651959, 'learning_rate': 2.527027027027027e-06, 'epoch': 0.02} 2%|▏ | 187/12313 [08:41<8:53:58, 2.64s/it] 2%|▏ | 188/12313 [08:43<8:44:13, 2.59s/it] {'loss': 0.7893, 'grad_norm': 4.619062654793844, 'learning_rate': 2.540540540540541e-06, 'epoch': 0.02} 2%|▏ | 188/12313 [08:43<8:44:13, 2.59s/it] 2%|▏ | 189/12313 [08:46<8:49:49, 2.62s/it] {'loss': 0.9053, 'grad_norm': 4.028095158003242, 'learning_rate': 2.554054054054054e-06, 'epoch': 0.02} 2%|▏ | 189/12313 [08:46<8:49:49, 2.62s/it] 2%|▏ | 190/12313 [08:49<8:58:49, 2.67s/it] {'loss': 0.6785, 'grad_norm': 5.474147278368468, 'learning_rate': 2.5675675675675675e-06, 'epoch': 0.02} 2%|▏ | 190/12313 [08:49<8:58:49, 2.67s/it] 2%|▏ | 191/12313 [08:51<8:45:55, 2.60s/it] {'loss': 0.6353, 'grad_norm': 7.068401125839219, 'learning_rate': 2.581081081081081e-06, 'epoch': 0.02} 2%|▏ | 191/12313 [08:51<8:45:55, 2.60s/it] 2%|▏ | 192/12313 [08:54<8:49:21, 2.62s/it] {'loss': 0.6171, 'grad_norm': 6.211801926350057, 'learning_rate': 2.594594594594595e-06, 'epoch': 0.02} 2%|▏ | 192/12313 [08:54<8:49:21, 2.62s/it] 2%|▏ | 193/12313 [08:56<8:47:12, 2.61s/it] {'loss': 0.6291, 'grad_norm': 6.8766856866833805, 'learning_rate': 2.6081081081081083e-06, 'epoch': 0.02} 2%|▏ | 193/12313 [08:56<8:47:12, 2.61s/it] 2%|▏ | 194/12313 [08:59<8:55:47, 2.65s/it] {'loss': 0.7322, 'grad_norm': 4.450746282218491, 'learning_rate': 2.621621621621622e-06, 'epoch': 0.02} 2%|▏ | 194/12313 [08:59<8:55:47, 2.65s/it] 2%|▏ | 195/12313 [09:02<8:56:13, 2.65s/it] {'loss': 0.8431, 'grad_norm': 6.737626992175194, 'learning_rate': 2.6351351351351353e-06, 'epoch': 0.02} 2%|▏ | 195/12313 [09:02<8:56:13, 2.65s/it] 2%|▏ | 196/12313 [09:04<8:46:49, 2.61s/it] {'loss': 0.6027, 'grad_norm': 5.202217528227061, 'learning_rate': 2.648648648648649e-06, 'epoch': 0.02} 2%|▏ | 196/12313 [09:04<8:46:49, 
2.61s/it] 2%|▏ | 197/12313 [09:07<9:06:04, 2.70s/it] {'loss': 0.7227, 'grad_norm': 6.136140183171829, 'learning_rate': 2.662162162162162e-06, 'epoch': 0.02} 2%|▏ | 197/12313 [09:07<9:06:04, 2.70s/it] 2%|▏ | 198/12313 [09:10<9:07:49, 2.71s/it] {'loss': 0.8206, 'grad_norm': 6.20592135137412, 'learning_rate': 2.6756756756756757e-06, 'epoch': 0.02} 2%|▏ | 198/12313 [09:10<9:07:49, 2.71s/it] 2%|▏ | 199/12313 [09:13<9:09:17, 2.72s/it] {'loss': 0.5755, 'grad_norm': 5.542234344965935, 'learning_rate': 2.689189189189189e-06, 'epoch': 0.02} 2%|▏ | 199/12313 [09:13<9:09:17, 2.72s/it] 2%|▏ | 200/12313 [09:15<9:03:06, 2.69s/it] {'loss': 0.9124, 'grad_norm': 6.481129594142135, 'learning_rate': 2.702702702702703e-06, 'epoch': 0.02} 2%|▏ | 200/12313 [09:15<9:03:06, 2.69s/it] 2%|▏ | 201/12313 [09:18<9:00:43, 2.68s/it] {'loss': 0.7077, 'grad_norm': 8.819932985194717, 'learning_rate': 2.7162162162162165e-06, 'epoch': 0.02} 2%|▏ | 201/12313 [09:18<9:00:43, 2.68s/it] 2%|▏ | 202/12313 [09:20<8:46:53, 2.61s/it] {'loss': 0.7187, 'grad_norm': 3.7133046294352496, 'learning_rate': 2.72972972972973e-06, 'epoch': 0.02} 2%|▏ | 202/12313 [09:20<8:46:53, 2.61s/it] 2%|▏ | 203/12313 [09:23<8:53:54, 2.65s/it] {'loss': 0.7079, 'grad_norm': 4.768904441511917, 'learning_rate': 2.7432432432432434e-06, 'epoch': 0.02} 2%|▏ | 203/12313 [09:23<8:53:54, 2.65s/it] 2%|▏ | 204/12313 [09:26<8:49:43, 2.62s/it] {'loss': 0.8741, 'grad_norm': 5.4109232849671445, 'learning_rate': 2.7567567567567573e-06, 'epoch': 0.02} 2%|▏ | 204/12313 [09:26<8:49:43, 2.62s/it] 2%|▏ | 205/12313 [09:28<8:49:42, 2.62s/it] {'loss': 0.6088, 'grad_norm': 11.784275280928684, 'learning_rate': 2.7702702702702703e-06, 'epoch': 0.02} 2%|▏ | 205/12313 [09:28<8:49:42, 2.62s/it] 2%|▏ | 206/12313 [09:31<8:51:12, 2.63s/it] {'loss': 0.7085, 'grad_norm': 6.596444076197084, 'learning_rate': 2.783783783783784e-06, 'epoch': 0.02} 2%|▏ | 206/12313 [09:31<8:51:12, 2.63s/it] 2%|▏ | 207/12313 [09:34<8:58:02, 2.67s/it] {'loss': 0.7641, 'grad_norm': 
4.7347497389094055, 'learning_rate': 2.7972972972972973e-06, 'epoch': 0.02} 2%|▏ | 207/12313 [09:34<8:58:02, 2.67s/it] 2%|▏ | 208/12313 [09:36<8:42:36, 2.59s/it] {'loss': 0.613, 'grad_norm': 4.895873979702855, 'learning_rate': 2.810810810810811e-06, 'epoch': 0.02} 2%|▏ | 208/12313 [09:36<8:42:36, 2.59s/it] 2%|▏ | 209/12313 [09:39<8:58:49, 2.67s/it] {'loss': 0.6551, 'grad_norm': 4.112688584540271, 'learning_rate': 2.8243243243243246e-06, 'epoch': 0.02} 2%|▏ | 209/12313 [09:39<8:58:49, 2.67s/it] 2%|▏ | 210/12313 [09:42<8:57:38, 2.67s/it] {'loss': 0.8091, 'grad_norm': 4.280798819098727, 'learning_rate': 2.837837837837838e-06, 'epoch': 0.02} 2%|▏ | 210/12313 [09:42<8:57:38, 2.67s/it] 2%|▏ | 211/12313 [09:45<9:15:37, 2.75s/it] {'loss': 0.6225, 'grad_norm': 4.953151146240816, 'learning_rate': 2.851351351351351e-06, 'epoch': 0.02} 2%|▏ | 211/12313 [09:45<9:15:37, 2.75s/it] 2%|▏ | 212/12313 [09:47<9:03:24, 2.69s/it] {'loss': 0.7894, 'grad_norm': 4.858934778537565, 'learning_rate': 2.8648648648648654e-06, 'epoch': 0.02} 2%|▏ | 212/12313 [09:47<9:03:24, 2.69s/it] 2%|▏ | 213/12313 [09:50<9:16:35, 2.76s/it] {'loss': 0.6714, 'grad_norm': 3.776464089377743, 'learning_rate': 2.8783783783783785e-06, 'epoch': 0.02} 2%|▏ | 213/12313 [09:50<9:16:35, 2.76s/it] 2%|▏ | 214/12313 [09:53<9:06:43, 2.71s/it] {'loss': 0.6115, 'grad_norm': 5.927211431887128, 'learning_rate': 2.891891891891892e-06, 'epoch': 0.02} 2%|▏ | 214/12313 [09:53<9:06:43, 2.71s/it] 2%|▏ | 215/12313 [09:56<9:11:54, 2.74s/it] {'loss': 0.8019, 'grad_norm': 4.290882863865101, 'learning_rate': 2.9054054054054054e-06, 'epoch': 0.02} 2%|▏ | 215/12313 [09:56<9:11:54, 2.74s/it] 2%|▏ | 216/12313 [09:58<9:09:21, 2.72s/it] {'loss': 0.9165, 'grad_norm': 5.014996919723652, 'learning_rate': 2.9189189189189193e-06, 'epoch': 0.02} 2%|▏ | 216/12313 [09:58<9:09:21, 2.72s/it] 2%|▏ | 217/12313 [10:01<8:58:55, 2.67s/it] {'loss': 0.7034, 'grad_norm': 4.678677528211619, 'learning_rate': 2.9324324324324328e-06, 'epoch': 0.02} 2%|▏ | 217/12313 
[10:01<8:58:55, 2.67s/it] 2%|▏ | 218/12313 [10:03<8:56:11, 2.66s/it] {'loss': 0.6187, 'grad_norm': 5.856721537107752, 'learning_rate': 2.9459459459459462e-06, 'epoch': 0.02} 2%|▏ | 218/12313 [10:03<8:56:11, 2.66s/it] 2%|▏ | 219/12313 [10:06<8:53:25, 2.65s/it] {'loss': 0.6282, 'grad_norm': 5.675292275975791, 'learning_rate': 2.9594594594594593e-06, 'epoch': 0.02} 2%|▏ | 219/12313 [10:06<8:53:25, 2.65s/it] 2%|▏ | 220/12313 [10:09<9:26:55, 2.81s/it] {'loss': 0.6984, 'grad_norm': 4.026148472844569, 'learning_rate': 2.9729729729729736e-06, 'epoch': 0.02} 2%|▏ | 220/12313 [10:09<9:26:55, 2.81s/it] 2%|▏ | 221/12313 [10:12<9:19:43, 2.78s/it] {'loss': 0.7059, 'grad_norm': 3.7553668002568115, 'learning_rate': 2.9864864864864866e-06, 'epoch': 0.02} 2%|▏ | 221/12313 [10:12<9:19:43, 2.78s/it] 2%|▏ | 222/12313 [10:15<9:14:27, 2.75s/it] {'loss': 0.6676, 'grad_norm': 5.010656407185249, 'learning_rate': 3e-06, 'epoch': 0.02} 2%|▏ | 222/12313 [10:15<9:14:27, 2.75s/it] 2%|▏ | 223/12313 [10:17<9:06:58, 2.71s/it] {'loss': 0.6528, 'grad_norm': 5.174403860939526, 'learning_rate': 3.0135135135135135e-06, 'epoch': 0.02} 2%|▏ | 223/12313 [10:17<9:06:58, 2.71s/it] 2%|▏ | 224/12313 [10:20<9:01:18, 2.69s/it] {'loss': 0.6981, 'grad_norm': 9.300176785765524, 'learning_rate': 3.0270270270270274e-06, 'epoch': 0.02} 2%|▏ | 224/12313 [10:20<9:01:18, 2.69s/it] 2%|▏ | 225/12313 [10:22<8:51:28, 2.64s/it] {'loss': 0.704, 'grad_norm': 6.1866583084404185, 'learning_rate': 3.040540540540541e-06, 'epoch': 0.02} 2%|▏ | 225/12313 [10:22<8:51:28, 2.64s/it] 2%|▏ | 226/12313 [10:25<9:03:13, 2.70s/it] {'loss': 0.593, 'grad_norm': 7.4629748217872605, 'learning_rate': 3.0540540540540544e-06, 'epoch': 0.02} 2%|▏ | 226/12313 [10:25<9:03:13, 2.70s/it] 2%|▏ | 227/12313 [10:28<9:03:01, 2.70s/it] {'loss': 0.5932, 'grad_norm': 4.530659236075315, 'learning_rate': 3.0675675675675674e-06, 'epoch': 0.02} 2%|▏ | 227/12313 [10:28<9:03:01, 2.70s/it] 2%|▏ | 228/12313 [10:31<9:01:43, 2.69s/it] {'loss': 0.7238, 'grad_norm': 
5.748440319813538, 'learning_rate': 3.0810810810810817e-06, 'epoch': 0.02} 2%|▏ | 228/12313 [10:31<9:01:43, 2.69s/it] 2%|▏ | 229/12313 [10:33<9:00:42, 2.68s/it] {'loss': 0.7017, 'grad_norm': 40.72028295804037, 'learning_rate': 3.0945945945945947e-06, 'epoch': 0.02} 2%|▏ | 229/12313 [10:33<9:00:42, 2.68s/it] 2%|▏ | 230/12313 [10:36<9:00:55, 2.69s/it] {'loss': 0.5761, 'grad_norm': 8.886281188290711, 'learning_rate': 3.1081081081081082e-06, 'epoch': 0.02} 2%|▏ | 230/12313 [10:36<9:00:55, 2.69s/it] 2%|▏ | 231/12313 [10:38<8:46:34, 2.62s/it] {'loss': 0.8567, 'grad_norm': 4.343869317099014, 'learning_rate': 3.1216216216216217e-06, 'epoch': 0.02} 2%|▏ | 231/12313 [10:38<8:46:34, 2.62s/it] 2%|▏ | 232/12313 [10:41<8:47:55, 2.62s/it] {'loss': 0.7763, 'grad_norm': 4.744970167283222, 'learning_rate': 3.1351351351351356e-06, 'epoch': 0.02} 2%|▏ | 232/12313 [10:41<8:47:55, 2.62s/it] 2%|▏ | 233/12313 [10:44<8:58:06, 2.67s/it] {'loss': 0.8186, 'grad_norm': 7.495714776851687, 'learning_rate': 3.148648648648649e-06, 'epoch': 0.02} 2%|▏ | 233/12313 [10:44<8:58:06, 2.67s/it] 2%|▏ | 234/12313 [10:47<9:05:24, 2.71s/it] {'loss': 0.7489, 'grad_norm': 4.102840820567404, 'learning_rate': 3.1621621621621625e-06, 'epoch': 0.02} 2%|▏ | 234/12313 [10:47<9:05:24, 2.71s/it] 2%|▏ | 235/12313 [10:50<9:19:21, 2.78s/it] {'loss': 0.6933, 'grad_norm': 3.8421345026888027, 'learning_rate': 3.1756756756756755e-06, 'epoch': 0.02} 2%|▏ | 235/12313 [10:50<9:19:21, 2.78s/it] 2%|▏ | 236/12313 [10:52<9:22:24, 2.79s/it] {'loss': 0.6052, 'grad_norm': 7.387916364375026, 'learning_rate': 3.1891891891891894e-06, 'epoch': 0.02} 2%|▏ | 236/12313 [10:52<9:22:24, 2.79s/it] 2%|▏ | 237/12313 [10:55<9:01:48, 2.69s/it] {'loss': 0.7804, 'grad_norm': 3.849533522847183, 'learning_rate': 3.202702702702703e-06, 'epoch': 0.02} 2%|▏ | 237/12313 [10:55<9:01:48, 2.69s/it] 2%|▏ | 238/12313 [10:57<8:39:45, 2.58s/it] {'loss': 0.6641, 'grad_norm': 5.396298934498324, 'learning_rate': 3.2162162162162164e-06, 'epoch': 0.02} 2%|▏ | 
238/12313 [10:57<8:39:45, 2.58s/it] 2%|▏ | 239/12313 [11:00<8:57:53, 2.67s/it] {'loss': 0.5532, 'grad_norm': 4.940398668509269, 'learning_rate': 3.22972972972973e-06, 'epoch': 0.02} 2%|▏ | 239/12313 [11:00<8:57:53, 2.67s/it] 2%|▏ | 240/12313 [11:03<8:58:54, 2.68s/it] {'loss': 0.7118, 'grad_norm': 6.577994445331441, 'learning_rate': 3.2432432432432437e-06, 'epoch': 0.02} 2%|▏ | 240/12313 [11:03<8:58:54, 2.68s/it] 2%|▏ | 241/12313 [11:05<8:43:54, 2.60s/it] {'loss': 0.7421, 'grad_norm': 5.519259337961804, 'learning_rate': 3.256756756756757e-06, 'epoch': 0.02} 2%|▏ | 241/12313 [11:05<8:43:54, 2.60s/it] 2%|▏ | 242/12313 [11:08<8:41:49, 2.59s/it] {'loss': 0.6083, 'grad_norm': 6.0576611783888215, 'learning_rate': 3.2702702702702706e-06, 'epoch': 0.02} 2%|▏ | 242/12313 [11:08<8:41:49, 2.59s/it] 2%|▏ | 243/12313 [11:10<8:31:17, 2.54s/it] {'loss': 0.678, 'grad_norm': 8.457628617936836, 'learning_rate': 3.2837837837837837e-06, 'epoch': 0.02} 2%|▏ | 243/12313 [11:10<8:31:17, 2.54s/it] 2%|▏ | 244/12313 [11:13<8:39:10, 2.58s/it] {'loss': 0.5896, 'grad_norm': 7.364746035461257, 'learning_rate': 3.2972972972972976e-06, 'epoch': 0.02} 2%|▏ | 244/12313 [11:13<8:39:10, 2.58s/it] 2%|▏ | 245/12313 [11:16<8:44:15, 2.61s/it] {'loss': 0.6219, 'grad_norm': 7.750984536568317, 'learning_rate': 3.310810810810811e-06, 'epoch': 0.02} 2%|▏ | 245/12313 [11:16<8:44:15, 2.61s/it] 2%|▏ | 246/12313 [11:19<9:22:40, 2.80s/it] {'loss': 0.6942, 'grad_norm': 4.981685980750705, 'learning_rate': 3.3243243243243245e-06, 'epoch': 0.02} 2%|▏ | 246/12313 [11:19<9:22:40, 2.80s/it] 2%|▏ | 247/12313 [11:21<9:09:40, 2.73s/it] {'loss': 0.7533, 'grad_norm': 6.278603739216485, 'learning_rate': 3.337837837837838e-06, 'epoch': 0.02} 2%|▏ | 247/12313 [11:21<9:09:40, 2.73s/it] 2%|▏ | 248/12313 [11:24<9:02:48, 2.70s/it] {'loss': 0.6608, 'grad_norm': 4.801507815370232, 'learning_rate': 3.351351351351352e-06, 'epoch': 0.02} 2%|▏ | 248/12313 [11:24<9:02:48, 2.70s/it] 2%|▏ | 249/12313 [11:27<9:10:47, 2.74s/it] {'loss': 0.8164, 
'grad_norm': 7.021222735394233, 'learning_rate': 3.3648648648648653e-06, 'epoch': 0.02} 2%|▏ | 249/12313 [11:27<9:10:47, 2.74s/it] 2%|▏ | 250/12313 [11:29<9:05:54, 2.72s/it] {'loss': 0.6886, 'grad_norm': 5.832060700035103, 'learning_rate': 3.3783783783783788e-06, 'epoch': 0.02} 2%|▏ | 250/12313 [11:29<9:05:54, 2.72s/it] 2%|▏ | 251/12313 [11:32<9:05:55, 2.72s/it] {'loss': 0.6018, 'grad_norm': 4.387381573209668, 'learning_rate': 3.391891891891892e-06, 'epoch': 0.02} 2%|▏ | 251/12313 [11:32<9:05:55, 2.72s/it] 2%|▏ | 252/12313 [11:35<9:03:47, 2.71s/it] {'loss': 0.8009, 'grad_norm': 4.588515154439844, 'learning_rate': 3.4054054054054057e-06, 'epoch': 0.02} 2%|▏ | 252/12313 [11:35<9:03:47, 2.71s/it] 2%|▏ | 253/12313 [11:37<8:56:47, 2.67s/it] {'loss': 0.6694, 'grad_norm': 5.9906722613866865, 'learning_rate': 3.418918918918919e-06, 'epoch': 0.02} 2%|▏ | 253/12313 [11:37<8:56:47, 2.67s/it] 2%|▏ | 254/12313 [11:40<8:52:40, 2.65s/it] {'loss': 0.8309, 'grad_norm': 4.955892303287632, 'learning_rate': 3.4324324324324326e-06, 'epoch': 0.02} 2%|▏ | 254/12313 [11:40<8:52:40, 2.65s/it] 2%|▏ | 255/12313 [11:43<8:51:49, 2.65s/it] {'loss': 0.8692, 'grad_norm': 4.824146659749785, 'learning_rate': 3.445945945945946e-06, 'epoch': 0.02} 2%|▏ | 255/12313 [11:43<8:51:49, 2.65s/it] 2%|▏ | 256/12313 [11:45<8:56:40, 2.67s/it] {'loss': 0.5992, 'grad_norm': 5.5270982189809015, 'learning_rate': 3.45945945945946e-06, 'epoch': 0.02} 2%|▏ | 256/12313 [11:45<8:56:40, 2.67s/it] 2%|▏ | 257/12313 [11:48<8:49:36, 2.64s/it] {'loss': 0.777, 'grad_norm': 6.204903985337891, 'learning_rate': 3.4729729729729734e-06, 'epoch': 0.02} 2%|▏ | 257/12313 [11:48<8:49:36, 2.64s/it] 2%|▏ | 258/12313 [11:51<8:55:14, 2.66s/it] {'loss': 0.6435, 'grad_norm': 3.695647833466083, 'learning_rate': 3.4864864864864865e-06, 'epoch': 0.02} 2%|▏ | 258/12313 [11:51<8:55:14, 2.66s/it] 2%|▏ | 259/12313 [11:53<8:58:37, 2.68s/it] {'loss': 0.6796, 'grad_norm': 5.291080353938334, 'learning_rate': 3.5e-06, 'epoch': 0.02} 2%|▏ | 259/12313 
[11:53<8:58:37, 2.68s/it] 2%|▏ | 260/12313 [11:56<8:53:14, 2.65s/it] {'loss': 0.6448, 'grad_norm': 4.5185266685147285, 'learning_rate': 3.513513513513514e-06, 'epoch': 0.02} 2%|▏ | 260/12313 [11:56<8:53:14, 2.65s/it] 2%|▏ | 261/12313 [11:58<8:40:06, 2.59s/it] {'loss': 0.682, 'grad_norm': 4.851027601896295, 'learning_rate': 3.5270270270270273e-06, 'epoch': 0.02} 2%|▏ | 261/12313 [11:58<8:40:06, 2.59s/it] 2%|▏ | 262/12313 [12:01<8:36:26, 2.57s/it] {'loss': 0.7084, 'grad_norm': 5.6100378122852925, 'learning_rate': 3.5405405405405408e-06, 'epoch': 0.02} 2%|▏ | 262/12313 [12:01<8:36:26, 2.57s/it] 2%|▏ | 263/12313 [12:04<8:58:16, 2.68s/it] {'loss': 0.8671, 'grad_norm': 5.135851719796815, 'learning_rate': 3.5540540540540542e-06, 'epoch': 0.02} 2%|▏ | 263/12313 [12:04<8:58:16, 2.68s/it] 2%|▏ | 264/12313 [12:06<8:49:08, 2.63s/it] {'loss': 0.5465, 'grad_norm': 5.735868784715722, 'learning_rate': 3.567567567567568e-06, 'epoch': 0.02} 2%|▏ | 264/12313 [12:06<8:49:08, 2.63s/it] 2%|▏ | 265/12313 [12:09<9:11:46, 2.75s/it] {'loss': 0.6117, 'grad_norm': 4.041664795202519, 'learning_rate': 3.5810810810810816e-06, 'epoch': 0.02} 2%|▏ | 265/12313 [12:09<9:11:46, 2.75s/it] 2%|▏ | 266/12313 [12:12<9:15:44, 2.77s/it] {'loss': 0.6543, 'grad_norm': 4.742239360128328, 'learning_rate': 3.5945945945945946e-06, 'epoch': 0.02} 2%|▏ | 266/12313 [12:12<9:15:44, 2.77s/it] 2%|▏ | 267/12313 [12:15<9:19:25, 2.79s/it] {'loss': 0.7132, 'grad_norm': 5.6376075863303114, 'learning_rate': 3.608108108108108e-06, 'epoch': 0.02} 2%|▏ | 267/12313 [12:15<9:19:25, 2.79s/it] 2%|▏ | 268/12313 [12:18<9:27:21, 2.83s/it] {'loss': 0.9163, 'grad_norm': 4.108413552937677, 'learning_rate': 3.621621621621622e-06, 'epoch': 0.02} 2%|▏ | 268/12313 [12:18<9:27:21, 2.83s/it] 2%|▏ | 269/12313 [12:21<9:10:07, 2.74s/it] {'loss': 0.6385, 'grad_norm': 7.767807764665652, 'learning_rate': 3.6351351351351354e-06, 'epoch': 0.02} 2%|▏ | 269/12313 [12:21<9:10:07, 2.74s/it] 2%|▏ | 270/12313 [12:23<9:10:19, 2.74s/it] {'loss': 0.6538, 
'grad_norm': 5.864609873584057, 'learning_rate': 3.648648648648649e-06, 'epoch': 0.02} 2%|▏ | 270/12313 [12:23<9:10:19, 2.74s/it] 2%|▏ | 271/12313 [12:26<9:06:05, 2.72s/it] {'loss': 0.6993, 'grad_norm': 4.820511968405664, 'learning_rate': 3.6621621621621624e-06, 'epoch': 0.02} 2%|▏ | 271/12313 [12:26<9:06:05, 2.72s/it] 2%|▏ | 272/12313 [12:29<9:08:47, 2.73s/it] {'loss': 0.8915, 'grad_norm': 4.967071259608292, 'learning_rate': 3.6756756756756763e-06, 'epoch': 0.02} 2%|▏ | 272/12313 [12:29<9:08:47, 2.73s/it] 2%|▏ | 273/12313 [12:31<9:05:36, 2.72s/it] {'loss': 0.906, 'grad_norm': 3.7176252188194723, 'learning_rate': 3.6891891891891897e-06, 'epoch': 0.02} 2%|▏ | 273/12313 [12:31<9:05:36, 2.72s/it] 2%|▏ | 274/12313 [12:34<8:58:33, 2.68s/it] {'loss': 0.5487, 'grad_norm': 7.432800095587209, 'learning_rate': 3.7027027027027028e-06, 'epoch': 0.02} 2%|▏ | 274/12313 [12:34<8:58:33, 2.68s/it] 2%|▏ | 275/12313 [12:37<8:54:03, 2.66s/it] {'loss': 0.6497, 'grad_norm': 4.725582603425111, 'learning_rate': 3.7162162162162162e-06, 'epoch': 0.02} 2%|▏ | 275/12313 [12:37<8:54:03, 2.66s/it] 2%|▏ | 276/12313 [12:39<9:01:55, 2.70s/it] {'loss': 0.6416, 'grad_norm': 7.338793955644148, 'learning_rate': 3.72972972972973e-06, 'epoch': 0.02} 2%|▏ | 276/12313 [12:39<9:01:55, 2.70s/it] 2%|▏ | 277/12313 [12:42<9:08:21, 2.73s/it] {'loss': 0.6278, 'grad_norm': 5.306407989783936, 'learning_rate': 3.7432432432432436e-06, 'epoch': 0.02} 2%|▏ | 277/12313 [12:42<9:08:21, 2.73s/it] 2%|▏ | 278/12313 [12:45<9:00:58, 2.70s/it] {'loss': 0.7679, 'grad_norm': 5.1495205257748164, 'learning_rate': 3.756756756756757e-06, 'epoch': 0.02} 2%|▏ | 278/12313 [12:45<9:00:58, 2.70s/it] 2%|▏ | 279/12313 [12:47<8:54:03, 2.66s/it] {'loss': 0.7532, 'grad_norm': 5.708997533226292, 'learning_rate': 3.7702702702702705e-06, 'epoch': 0.02} 2%|▏ | 279/12313 [12:47<8:54:03, 2.66s/it] 2%|▏ | 280/12313 [12:50<8:54:58, 2.67s/it] {'loss': 0.7133, 'grad_norm': 8.83003188884847, 'learning_rate': 3.7837837837837844e-06, 'epoch': 0.02} 2%|▏ 
| 280/12313 [12:50<8:54:58, 2.67s/it] 2%|▏ | 281/12313 [12:53<9:01:07, 2.70s/it] {'loss': 0.7097, 'grad_norm': 5.539341071050379, 'learning_rate': 3.797297297297298e-06, 'epoch': 0.02} 2%|▏ | 281/12313 [12:53<9:01:07, 2.70s/it] 2%|▏ | 282/12313 [12:55<8:53:13, 2.66s/it] {'loss': 0.6048, 'grad_norm': 8.240407300947401, 'learning_rate': 3.810810810810811e-06, 'epoch': 0.02} 2%|▏ | 282/12313 [12:55<8:53:13, 2.66s/it] 2%|▏ | 283/12313 [12:58<9:06:47, 2.73s/it] {'loss': 0.8175, 'grad_norm': 4.587467280833204, 'learning_rate': 3.824324324324324e-06, 'epoch': 0.02} 2%|▏ | 283/12313 [12:58<9:06:47, 2.73s/it] 2%|▏ | 284/12313 [13:01<9:13:22, 2.76s/it] {'loss': 0.8436, 'grad_norm': 7.179515697903426, 'learning_rate': 3.837837837837838e-06, 'epoch': 0.02} 2%|▏ | 284/12313 [13:01<9:13:22, 2.76s/it] 2%|▏ | 285/12313 [13:04<9:13:28, 2.76s/it] {'loss': 0.625, 'grad_norm': 5.309912678695326, 'learning_rate': 3.851351351351352e-06, 'epoch': 0.02} 2%|▏ | 285/12313 [13:04<9:13:28, 2.76s/it] 2%|▏ | 286/12313 [13:07<9:08:35, 2.74s/it] {'loss': 0.6989, 'grad_norm': 4.941339777332484, 'learning_rate': 3.864864864864865e-06, 'epoch': 0.02} 2%|▏ | 286/12313 [13:07<9:08:35, 2.74s/it] 2%|▏ | 287/12313 [13:09<9:06:59, 2.73s/it] {'loss': 0.5503, 'grad_norm': 5.2052407828367215, 'learning_rate': 3.878378378378378e-06, 'epoch': 0.02} 2%|▏ | 287/12313 [13:09<9:06:59, 2.73s/it] 2%|▏ | 288/12313 [13:12<9:25:21, 2.82s/it] {'loss': 0.8404, 'grad_norm': 4.238862573234204, 'learning_rate': 3.891891891891892e-06, 'epoch': 0.02} 2%|▏ | 288/12313 [13:12<9:25:21, 2.82s/it] 2%|▏ | 289/12313 [13:15<9:14:52, 2.77s/it] {'loss': 0.6679, 'grad_norm': 9.415691207617115, 'learning_rate': 3.905405405405406e-06, 'epoch': 0.02} 2%|▏ | 289/12313 [13:15<9:14:52, 2.77s/it] 2%|▏ | 290/12313 [13:18<9:14:18, 2.77s/it] {'loss': 0.5115, 'grad_norm': 4.698876605885647, 'learning_rate': 3.918918918918919e-06, 'epoch': 0.02} 2%|▏ | 290/12313 [13:18<9:14:18, 2.77s/it] 2%|▏ | 291/12313 [13:21<9:19:08, 2.79s/it] {'loss': 0.8333, 
'grad_norm': 8.590289244453471, 'learning_rate': 3.932432432432433e-06, 'epoch': 0.02} 2%|▏ | 291/12313 [13:21<9:19:08, 2.79s/it] 2%|▏ | 292/12313 [13:24<9:30:50, 2.85s/it] {'loss': 0.5686, 'grad_norm': 3.9911010631765764, 'learning_rate': 3.945945945945947e-06, 'epoch': 0.02} 2%|▏ | 292/12313 [13:24<9:30:50, 2.85s/it] 2%|▏ | 293/12313 [13:26<9:13:05, 2.76s/it] {'loss': 0.6576, 'grad_norm': 4.926542248398489, 'learning_rate': 3.95945945945946e-06, 'epoch': 0.02} 2%|▏ | 293/12313 [13:26<9:13:05, 2.76s/it] 2%|▏ | 294/12313 [13:29<9:12:24, 2.76s/it] {'loss': 0.7971, 'grad_norm': 7.808087195520982, 'learning_rate': 3.972972972972973e-06, 'epoch': 0.02} 2%|▏ | 294/12313 [13:29<9:12:24, 2.76s/it] 2%|▏ | 295/12313 [13:33<10:14:38, 3.07s/it] {'loss': 0.649, 'grad_norm': 4.140299491048756, 'learning_rate': 3.986486486486487e-06, 'epoch': 0.02} 2%|▏ | 295/12313 [13:33<10:14:38, 3.07s/it] 2%|▏ | 296/12313 [13:35<9:47:17, 2.93s/it] {'loss': 0.6101, 'grad_norm': 6.598136262319362, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.02} 2%|▏ | 296/12313 [13:35<9:47:17, 2.93s/it] 2%|▏ | 297/12313 [13:38<9:39:33, 2.89s/it] {'loss': 0.5807, 'grad_norm': 6.657448338847534, 'learning_rate': 4.013513513513514e-06, 'epoch': 0.02} 2%|▏ | 297/12313 [13:38<9:39:33, 2.89s/it] 2%|▏ | 298/12313 [13:41<9:21:58, 2.81s/it] {'loss': 0.614, 'grad_norm': 6.049362925798347, 'learning_rate': 4.027027027027028e-06, 'epoch': 0.02} 2%|▏ | 298/12313 [13:41<9:21:58, 2.81s/it] 2%|▏ | 299/12313 [13:43<9:10:36, 2.75s/it] {'loss': 0.6665, 'grad_norm': 7.1798282609319415, 'learning_rate': 4.040540540540541e-06, 'epoch': 0.02} 2%|▏ | 299/12313 [13:43<9:10:36, 2.75s/it] 2%|▏ | 300/12313 [13:46<9:15:31, 2.77s/it] {'loss': 0.6214, 'grad_norm': 15.049209513289172, 'learning_rate': 4.0540540540540545e-06, 'epoch': 0.02} 2%|▏ | 300/12313 [13:46<9:15:31, 2.77s/it] 2%|▏ | 301/12313 [13:49<8:59:28, 2.69s/it] {'loss': 0.9301, 'grad_norm': 3.8053242555293796, 'learning_rate': 4.067567567567568e-06, 'epoch': 0.02} 2%|▏ | 
301/12313 [13:49<8:59:28, 2.69s/it] 2%|▏ | 302/12313 [13:51<8:55:06, 2.67s/it] {'loss': 0.7352, 'grad_norm': 5.676808515237133, 'learning_rate': 4.0810810810810815e-06, 'epoch': 0.02} 2%|▏ | 302/12313 [13:51<8:55:06, 2.67s/it] 2%|▏ | 303/12313 [13:54<9:18:46, 2.79s/it] {'loss': 0.6721, 'grad_norm': 3.930706133352666, 'learning_rate': 4.0945945945945945e-06, 'epoch': 0.02} 2%|▏ | 303/12313 [13:54<9:18:46, 2.79s/it] 2%|▏ | 304/12313 [13:57<9:30:16, 2.85s/it] {'loss': 0.6061, 'grad_norm': 4.260405802198655, 'learning_rate': 4.108108108108108e-06, 'epoch': 0.02} 2%|▏ | 304/12313 [13:57<9:30:16, 2.85s/it] 2%|▏ | 305/12313 [14:00<9:23:21, 2.81s/it] {'loss': 0.7034, 'grad_norm': 3.8039307407950544, 'learning_rate': 4.121621621621622e-06, 'epoch': 0.02} 2%|▏ | 305/12313 [14:00<9:23:21, 2.81s/it] 2%|▏ | 306/12313 [14:03<9:01:31, 2.71s/it] {'loss': 0.703, 'grad_norm': 4.487160345392378, 'learning_rate': 4.135135135135135e-06, 'epoch': 0.02} 2%|▏ | 306/12313 [14:03<9:01:31, 2.71s/it] 2%|▏ | 307/12313 [14:05<9:05:06, 2.72s/it] {'loss': 0.6759, 'grad_norm': 7.679594903469531, 'learning_rate': 4.148648648648649e-06, 'epoch': 0.02} 2%|▏ | 307/12313 [14:05<9:05:06, 2.72s/it] 3%|▎ | 308/12313 [14:08<9:05:05, 2.72s/it] {'loss': 0.6642, 'grad_norm': 3.8651431834889203, 'learning_rate': 4.162162162162163e-06, 'epoch': 0.03} 3%|▎ | 308/12313 [14:08<9:05:05, 2.72s/it] 3%|▎ | 309/12313 [14:10<8:49:56, 2.65s/it] {'loss': 0.6803, 'grad_norm': 7.531093080992881, 'learning_rate': 4.175675675675676e-06, 'epoch': 0.03} 3%|▎ | 309/12313 [14:10<8:49:56, 2.65s/it] 3%|▎ | 310/12313 [14:13<8:53:24, 2.67s/it] {'loss': 0.9062, 'grad_norm': 5.135386924538244, 'learning_rate': 4.189189189189189e-06, 'epoch': 0.03} 3%|▎ | 310/12313 [14:13<8:53:24, 2.67s/it] 3%|▎ | 311/12313 [14:16<8:55:14, 2.68s/it] {'loss': 0.5363, 'grad_norm': 6.817605774173451, 'learning_rate': 4.202702702702703e-06, 'epoch': 0.03} 3%|▎ | 311/12313 [14:16<8:55:14, 2.68s/it] 3%|▎ | 312/12313 [14:19<9:02:24, 2.71s/it] {'loss': 0.7114, 
'grad_norm': 4.999220532436134, 'learning_rate': 4.216216216216217e-06, 'epoch': 0.03} 3%|▎ | 312/12313 [14:19<9:02:24, 2.71s/it] 3%|▎ | 313/12313 [14:21<8:45:30, 2.63s/it] {'loss': 0.6691, 'grad_norm': 5.1328146811650495, 'learning_rate': 4.22972972972973e-06, 'epoch': 0.03} 3%|▎ | 313/12313 [14:21<8:45:30, 2.63s/it] 3%|▎ | 314/12313 [14:24<8:55:18, 2.68s/it] {'loss': 0.5742, 'grad_norm': 5.421243989150761, 'learning_rate': 4.243243243243244e-06, 'epoch': 0.03} 3%|▎ | 314/12313 [14:24<8:55:18, 2.68s/it] 3%|▎ | 315/12313 [14:27<8:55:29, 2.68s/it] {'loss': 0.6078, 'grad_norm': 5.473638240229686, 'learning_rate': 4.256756756756757e-06, 'epoch': 0.03} 3%|▎ | 315/12313 [14:27<8:55:29, 2.68s/it] 3%|▎ | 316/12313 [14:30<9:10:23, 2.75s/it] {'loss': 0.6261, 'grad_norm': 4.005836520564742, 'learning_rate': 4.270270270270271e-06, 'epoch': 0.03} 3%|▎ | 316/12313 [14:30<9:10:23, 2.75s/it] 3%|▎ | 317/12313 [14:32<9:08:03, 2.74s/it] {'loss': 0.7862, 'grad_norm': 4.474936034323712, 'learning_rate': 4.283783783783784e-06, 'epoch': 0.03} 3%|▎ | 317/12313 [14:32<9:08:03, 2.74s/it] 3%|▎ | 318/12313 [14:35<9:05:50, 2.73s/it] {'loss': 0.7268, 'grad_norm': 7.022757009649874, 'learning_rate': 4.297297297297298e-06, 'epoch': 0.03} 3%|▎ | 318/12313 [14:35<9:05:50, 2.73s/it] 3%|▎ | 319/12313 [14:38<9:03:45, 2.72s/it] {'loss': 0.9672, 'grad_norm': 4.756298282781381, 'learning_rate': 4.310810810810811e-06, 'epoch': 0.03} 3%|▎ | 319/12313 [14:38<9:03:45, 2.72s/it] 3%|▎ | 320/12313 [14:40<9:01:19, 2.71s/it] {'loss': 0.7338, 'grad_norm': 3.7972155322987797, 'learning_rate': 4.324324324324325e-06, 'epoch': 0.03} 3%|▎ | 320/12313 [14:40<9:01:19, 2.71s/it] 3%|▎ | 321/12313 [14:43<9:07:21, 2.74s/it] {'loss': 0.618, 'grad_norm': 6.4304814258504, 'learning_rate': 4.3378378378378385e-06, 'epoch': 0.03} 3%|▎ | 321/12313 [14:43<9:07:21, 2.74s/it] 3%|▎ | 322/12313 [14:46<8:48:50, 2.65s/it] {'loss': 0.752, 'grad_norm': 4.985880984448781, 'learning_rate': 4.351351351351352e-06, 'epoch': 0.03} 3%|▎ | 
322/12313 [14:46<8:48:50, 2.65s/it] 3%|▎ | 323/12313 [14:48<8:49:49, 2.65s/it] {'loss': 0.6961, 'grad_norm': 6.965800077960176, 'learning_rate': 4.364864864864865e-06, 'epoch': 0.03} 3%|▎ | 323/12313 [14:48<8:49:49, 2.65s/it] 3%|▎ | 324/12313 [14:51<8:39:12, 2.60s/it] {'loss': 0.68, 'grad_norm': 5.990584472824659, 'learning_rate': 4.378378378378379e-06, 'epoch': 0.03} 3%|▎ | 324/12313 [14:51<8:39:12, 2.60s/it] 3%|▎ | 325/12313 [14:54<9:04:31, 2.73s/it] {'loss': 0.6824, 'grad_norm': 3.9564290883110647, 'learning_rate': 4.391891891891892e-06, 'epoch': 0.03} 3%|▎ | 325/12313 [14:54<9:04:31, 2.73s/it] 3%|▎ | 326/12313 [14:57<9:11:08, 2.76s/it] {'loss': 0.7004, 'grad_norm': 4.712991398333253, 'learning_rate': 4.4054054054054054e-06, 'epoch': 0.03} 3%|▎ | 326/12313 [14:57<9:11:08, 2.76s/it] 3%|▎ | 327/12313 [14:59<9:02:21, 2.71s/it] {'loss': 0.5713, 'grad_norm': 4.011893314732997, 'learning_rate': 4.418918918918919e-06, 'epoch': 0.03} 3%|▎ | 327/12313 [14:59<9:02:21, 2.71s/it] 3%|▎ | 328/12313 [15:02<9:02:34, 2.72s/it] {'loss': 0.5751, 'grad_norm': 10.789632918330955, 'learning_rate': 4.432432432432433e-06, 'epoch': 0.03} 3%|▎ | 328/12313 [15:02<9:02:34, 2.72s/it] 3%|▎ | 329/12313 [15:04<8:56:02, 2.68s/it] {'loss': 0.621, 'grad_norm': 5.480368608439147, 'learning_rate': 4.445945945945946e-06, 'epoch': 0.03} 3%|▎ | 329/12313 [15:04<8:56:02, 2.68s/it] 3%|▎ | 330/12313 [15:07<8:58:10, 2.69s/it] {'loss': 0.6287, 'grad_norm': 5.7028472331846, 'learning_rate': 4.45945945945946e-06, 'epoch': 0.03} 3%|▎ | 330/12313 [15:07<8:58:10, 2.69s/it] 3%|▎ | 331/12313 [15:10<8:56:57, 2.69s/it] {'loss': 0.7512, 'grad_norm': 4.278443647971252, 'learning_rate': 4.472972972972973e-06, 'epoch': 0.03} 3%|▎ | 331/12313 [15:10<8:56:57, 2.69s/it] 3%|▎ | 332/12313 [15:13<8:57:58, 2.69s/it] {'loss': 0.6958, 'grad_norm': 5.540973909463085, 'learning_rate': 4.486486486486487e-06, 'epoch': 0.03} 3%|▎ | 332/12313 [15:13<8:57:58, 2.69s/it] 3%|▎ | 333/12313 [15:15<9:02:38, 2.72s/it] {'loss': 0.6622, 
'grad_norm': 8.48363442658351, 'learning_rate': 4.5e-06, 'epoch': 0.03} 3%|▎ | 333/12313 [15:15<9:02:38, 2.72s/it] 3%|▎ | 334/12313 [15:18<9:14:14, 2.78s/it] {'loss': 0.7521, 'grad_norm': 5.265277541309016, 'learning_rate': 4.513513513513514e-06, 'epoch': 0.03} 3%|▎ | 334/12313 [15:18<9:14:14, 2.78s/it] 3%|▎ | 335/12313 [15:21<9:11:04, 2.76s/it] {'loss': 0.5695, 'grad_norm': 6.782051797558551, 'learning_rate': 4.527027027027027e-06, 'epoch': 0.03} 3%|▎ | 335/12313 [15:21<9:11:04, 2.76s/it] 3%|▎ | 336/12313 [15:24<9:15:14, 2.78s/it] {'loss': 0.7251, 'grad_norm': 7.707736747999393, 'learning_rate': 4.540540540540541e-06, 'epoch': 0.03} 3%|▎ | 336/12313 [15:24<9:15:14, 2.78s/it] 3%|▎ | 337/12313 [15:26<9:05:31, 2.73s/it] {'loss': 0.7112, 'grad_norm': 9.996464962694214, 'learning_rate': 4.554054054054055e-06, 'epoch': 0.03} 3%|▎ | 337/12313 [15:26<9:05:31, 2.73s/it] 3%|▎ | 338/12313 [15:29<8:45:01, 2.63s/it] {'loss': 0.7041, 'grad_norm': 5.660779752011064, 'learning_rate': 4.567567567567568e-06, 'epoch': 0.03} 3%|▎ | 338/12313 [15:29<8:45:01, 2.63s/it] 3%|▎ | 339/12313 [15:31<8:41:56, 2.62s/it] {'loss': 0.7357, 'grad_norm': 6.58420296764706, 'learning_rate': 4.581081081081081e-06, 'epoch': 0.03} 3%|▎ | 339/12313 [15:31<8:41:56, 2.62s/it] 3%|▎ | 340/12313 [15:34<8:57:03, 2.69s/it] {'loss': 0.8672, 'grad_norm': 6.154533828404867, 'learning_rate': 4.594594594594596e-06, 'epoch': 0.03} 3%|▎ | 340/12313 [15:34<8:57:03, 2.69s/it] 3%|▎ | 341/12313 [15:37<9:05:59, 2.74s/it] {'loss': 0.7116, 'grad_norm': 3.6916825043347874, 'learning_rate': 4.608108108108109e-06, 'epoch': 0.03} 3%|▎ | 341/12313 [15:37<9:05:59, 2.74s/it] 3%|▎ | 342/12313 [15:40<9:03:26, 2.72s/it] {'loss': 0.5388, 'grad_norm': 5.536606059040381, 'learning_rate': 4.621621621621622e-06, 'epoch': 0.03} 3%|▎ | 342/12313 [15:40<9:03:26, 2.72s/it] 3%|▎ | 343/12313 [15:42<8:57:02, 2.69s/it] {'loss': 0.63, 'grad_norm': 5.0131032158949305, 'learning_rate': 4.635135135135136e-06, 'epoch': 0.03} 3%|▎ | 343/12313 
[15:42<8:57:02, 2.69s/it] 3%|▎ | 344/12313 [15:45<8:43:25, 2.62s/it] {'loss': 0.6898, 'grad_norm': 8.645264812499297, 'learning_rate': 4.6486486486486495e-06, 'epoch': 0.03} 3%|▎ | 344/12313 [15:45<8:43:25, 2.62s/it] 3%|▎ | 345/12313 [15:48<8:52:44, 2.67s/it] {'loss': 0.7366, 'grad_norm': 8.845007826985789, 'learning_rate': 4.6621621621621625e-06, 'epoch': 0.03} 3%|▎ | 345/12313 [15:48<8:52:44, 2.67s/it] 3%|▎ | 346/12313 [15:50<8:56:27, 2.69s/it] {'loss': 0.6626, 'grad_norm': 6.800329318246896, 'learning_rate': 4.675675675675676e-06, 'epoch': 0.03} 3%|▎ | 346/12313 [15:50<8:56:27, 2.69s/it] 3%|▎ | 347/12313 [15:53<9:17:41, 2.80s/it] {'loss': 0.9352, 'grad_norm': 3.843752947734245, 'learning_rate': 4.6891891891891895e-06, 'epoch': 0.03} 3%|▎ | 347/12313 [15:53<9:17:41, 2.80s/it] 3%|▎ | 348/12313 [15:56<9:07:17, 2.74s/it] {'loss': 0.558, 'grad_norm': 4.717520289005672, 'learning_rate': 4.702702702702703e-06, 'epoch': 0.03} 3%|▎ | 348/12313 [15:56<9:07:17, 2.74s/it] 3%|▎ | 349/12313 [15:59<8:54:38, 2.68s/it] {'loss': 0.6733, 'grad_norm': 5.18843657142752, 'learning_rate': 4.716216216216216e-06, 'epoch': 0.03} 3%|▎ | 349/12313 [15:59<8:54:38, 2.68s/it] 3%|▎ | 350/12313 [16:02<9:11:34, 2.77s/it] {'loss': 0.7618, 'grad_norm': 6.203969455338302, 'learning_rate': 4.72972972972973e-06, 'epoch': 0.03} 3%|▎ | 350/12313 [16:02<9:11:34, 2.77s/it] 3%|▎ | 351/12313 [16:04<9:08:26, 2.75s/it] {'loss': 0.6203, 'grad_norm': 5.787025528310779, 'learning_rate': 4.743243243243243e-06, 'epoch': 0.03} 3%|▎ | 351/12313 [16:04<9:08:26, 2.75s/it] 3%|▎ | 352/12313 [16:07<9:14:55, 2.78s/it] {'loss': 0.7899, 'grad_norm': 5.74015922758883, 'learning_rate': 4.756756756756757e-06, 'epoch': 0.03} 3%|▎ | 352/12313 [16:07<9:14:55, 2.78s/it] 3%|▎ | 353/12313 [16:10<9:17:04, 2.79s/it] {'loss': 0.5911, 'grad_norm': 3.468789863957611, 'learning_rate': 4.770270270270271e-06, 'epoch': 0.03} 3%|▎ | 353/12313 [16:10<9:17:04, 2.79s/it] 3%|▎ | 354/12313 [16:13<9:03:29, 2.73s/it] {'loss': 0.6077, 'grad_norm': 
4.222539599204078, 'learning_rate': 4.783783783783784e-06, 'epoch': 0.03} 3%|▎ | 354/12313 [16:13<9:03:29, 2.73s/it] 3%|▎ | 355/12313 [16:15<8:47:43, 2.65s/it] {'loss': 0.631, 'grad_norm': 5.987825387101823, 'learning_rate': 4.797297297297297e-06, 'epoch': 0.03} 3%|▎ | 355/12313 [16:15<8:47:43, 2.65s/it] 3%|▎ | 356/12313 [16:18<9:05:54, 2.74s/it] {'loss': 0.6175, 'grad_norm': 3.494510978483797, 'learning_rate': 4.810810810810811e-06, 'epoch': 0.03} 3%|▎ | 356/12313 [16:18<9:05:54, 2.74s/it] 3%|▎ | 357/12313 [16:20<8:48:21, 2.65s/it] {'loss': 0.8241, 'grad_norm': 5.341804933211992, 'learning_rate': 4.824324324324325e-06, 'epoch': 0.03} 3%|▎ | 357/12313 [16:20<8:48:21, 2.65s/it] 3%|▎ | 358/12313 [16:23<8:35:05, 2.59s/it] {'loss': 0.6647, 'grad_norm': 5.332752685486192, 'learning_rate': 4.837837837837838e-06, 'epoch': 0.03} 3%|▎ | 358/12313 [16:23<8:35:05, 2.59s/it] 3%|▎ | 359/12313 [16:25<8:32:18, 2.57s/it] {'loss': 0.681, 'grad_norm': 4.174093461478239, 'learning_rate': 4.851351351351352e-06, 'epoch': 0.03} 3%|▎ | 359/12313 [16:25<8:32:18, 2.57s/it] 3%|▎ | 360/12313 [16:28<8:37:56, 2.60s/it] {'loss': 0.7803, 'grad_norm': 4.796111353710757, 'learning_rate': 4.864864864864866e-06, 'epoch': 0.03} 3%|▎ | 360/12313 [16:28<8:37:56, 2.60s/it] 3%|▎ | 361/12313 [16:31<8:49:32, 2.66s/it] {'loss': 0.5628, 'grad_norm': 4.430094054311749, 'learning_rate': 4.878378378378379e-06, 'epoch': 0.03} 3%|▎ | 361/12313 [16:31<8:49:32, 2.66s/it] 3%|▎ | 362/12313 [16:34<8:53:32, 2.68s/it] {'loss': 0.6637, 'grad_norm': 4.736436195711088, 'learning_rate': 4.891891891891893e-06, 'epoch': 0.03} 3%|▎ | 362/12313 [16:34<8:53:32, 2.68s/it] 3%|▎ | 363/12313 [16:36<8:45:00, 2.64s/it] {'loss': 0.6167, 'grad_norm': 5.52908335708342, 'learning_rate': 4.905405405405406e-06, 'epoch': 0.03} 3%|▎ | 363/12313 [16:36<8:45:00, 2.64s/it] 3%|▎ | 364/12313 [16:39<8:51:49, 2.67s/it] {'loss': 0.7164, 'grad_norm': 5.203669203589008, 'learning_rate': 4.91891891891892e-06, 'epoch': 0.03} 3%|▎ | 364/12313 
[16:39<8:51:49, 2.67s/it] 3%|▎ | 365/12313 [16:41<8:45:01, 2.64s/it] {'loss': 0.6265, 'grad_norm': 5.632857406640382, 'learning_rate': 4.932432432432433e-06, 'epoch': 0.03} 3%|▎ | 365/12313 [16:41<8:45:01, 2.64s/it] 3%|▎ | 366/12313 [16:44<8:38:28, 2.60s/it] {'loss': 0.6532, 'grad_norm': 4.525898082221645, 'learning_rate': 4.9459459459459466e-06, 'epoch': 0.03} 3%|▎ | 366/12313 [16:44<8:38:28, 2.60s/it] 3%|▎ | 367/12313 [16:46<8:27:41, 2.55s/it] {'loss': 0.6465, 'grad_norm': 3.993279600481055, 'learning_rate': 4.95945945945946e-06, 'epoch': 0.03} 3%|▎ | 367/12313 [16:46<8:27:41, 2.55s/it] 3%|▎ | 368/12313 [16:49<8:26:18, 2.54s/it] {'loss': 0.6768, 'grad_norm': 4.976392222325483, 'learning_rate': 4.9729729729729735e-06, 'epoch': 0.03} 3%|▎ | 368/12313 [16:49<8:26:18, 2.54s/it] 3%|▎ | 369/12313 [16:52<8:41:15, 2.62s/it] {'loss': 0.7201, 'grad_norm': 7.027194066182947, 'learning_rate': 4.986486486486487e-06, 'epoch': 0.03} 3%|▎ | 369/12313 [16:52<8:41:15, 2.62s/it] 3%|▎ | 370/12313 [16:54<8:38:23, 2.60s/it] {'loss': 0.6772, 'grad_norm': 5.6159327874829605, 'learning_rate': 5e-06, 'epoch': 0.03} 3%|▎ | 370/12313 [16:54<8:38:23, 2.60s/it] 3%|▎ | 371/12313 [16:57<8:49:42, 2.66s/it] {'loss': 0.6551, 'grad_norm': 4.876745048221957, 'learning_rate': 4.999999913506616e-06, 'epoch': 0.03} 3%|▎ | 371/12313 [16:57<8:49:42, 2.66s/it] 3%|▎ | 372/12313 [16:59<8:35:10, 2.59s/it] {'loss': 0.5551, 'grad_norm': 11.157591685761751, 'learning_rate': 4.999999654026468e-06, 'epoch': 0.03} 3%|▎ | 372/12313 [16:59<8:35:10, 2.59s/it] 3%|▎ | 373/12313 [17:02<8:41:18, 2.62s/it] {'loss': 0.7666, 'grad_norm': 4.729581532775085, 'learning_rate': 4.999999221559576e-06, 'epoch': 0.03} 3%|▎ | 373/12313 [17:02<8:41:18, 2.62s/it] 3%|▎ | 374/12313 [17:05<9:02:00, 2.72s/it] {'loss': 0.7107, 'grad_norm': 5.683222450900519, 'learning_rate': 4.9999986161059685e-06, 'epoch': 0.03} 3%|▎ | 374/12313 [17:05<9:02:00, 2.72s/it] 3%|▎ | 375/12313 [17:08<9:00:22, 2.72s/it] {'loss': 0.5651, 'grad_norm': 
20.13916455542101, 'learning_rate': 4.9999978376656875e-06, 'epoch': 0.03} 3%|▎ | 375/12313 [17:08<9:00:22, 2.72s/it] 3%|▎ | 376/12313 [17:10<8:53:39, 2.68s/it] {'loss': 0.6166, 'grad_norm': 4.663636959572572, 'learning_rate': 4.999996886238788e-06, 'epoch': 0.03} 3%|▎ | 376/12313 [17:10<8:53:39, 2.68s/it] 3%|▎ | 377/12313 [17:14<9:20:35, 2.82s/it] {'loss': 0.6379, 'grad_norm': 3.547301470431729, 'learning_rate': 4.999995761825335e-06, 'epoch': 0.03} 3%|▎ | 377/12313 [17:14<9:20:35, 2.82s/it] 3%|▎ | 378/12313 [17:17<9:28:24, 2.86s/it] {'loss': 0.6795, 'grad_norm': 4.026761047432411, 'learning_rate': 4.999994464425406e-06, 'epoch': 0.03} 3%|▎ | 378/12313 [17:17<9:28:24, 2.86s/it] 3%|▎ | 379/12313 [17:19<9:20:15, 2.82s/it] {'loss': 0.6356, 'grad_norm': 4.177399049954423, 'learning_rate': 4.99999299403909e-06, 'epoch': 0.03} 3%|▎ | 379/12313 [17:19<9:20:15, 2.82s/it] 3%|▎ | 380/12313 [17:22<9:09:14, 2.76s/it] {'loss': 0.5655, 'grad_norm': 8.689971440287565, 'learning_rate': 4.999991350666491e-06, 'epoch': 0.03} 3%|▎ | 380/12313 [17:22<9:09:14, 2.76s/it] 3%|▎ | 381/12313 [17:25<9:05:47, 2.74s/it] {'loss': 0.7988, 'grad_norm': 4.286085314862286, 'learning_rate': 4.999989534307722e-06, 'epoch': 0.03} 3%|▎ | 381/12313 [17:25<9:05:47, 2.74s/it] 3%|▎ | 382/12313 [17:27<9:14:53, 2.79s/it] {'loss': 0.6375, 'grad_norm': 4.468190941115001, 'learning_rate': 4.999987544962908e-06, 'epoch': 0.03} 3%|▎ | 382/12313 [17:27<9:14:53, 2.79s/it] 3%|▎ | 383/12313 [17:30<9:10:38, 2.77s/it] {'loss': 0.6834, 'grad_norm': 11.088261247896137, 'learning_rate': 4.999985382632186e-06, 'epoch': 0.03} 3%|▎ | 383/12313 [17:30<9:10:38, 2.77s/it] 3%|▎ | 384/12313 [17:33<9:20:23, 2.82s/it] {'loss': 0.6825, 'grad_norm': 4.746884755745409, 'learning_rate': 4.9999830473157065e-06, 'epoch': 0.03} 3%|▎ | 384/12313 [17:33<9:20:23, 2.82s/it] 3%|▎ | 385/12313 [17:36<9:14:07, 2.79s/it] {'loss': 0.6341, 'grad_norm': 5.042936323367531, 'learning_rate': 4.9999805390136315e-06, 'epoch': 0.03} 3%|▎ | 385/12313 
[17:36<9:14:07, 2.79s/it] 3%|▎ | 386/12313 [17:39<9:14:15, 2.79s/it] {'loss': 0.5923, 'grad_norm': 4.6209559647373375, 'learning_rate': 4.999977857726135e-06, 'epoch': 0.03} 3%|▎ | 386/12313 [17:39<9:14:15, 2.79s/it] 3%|▎ | 387/12313 [17:42<9:22:37, 2.83s/it] {'loss': 0.6442, 'grad_norm': 5.563156193085517, 'learning_rate': 4.999975003453401e-06, 'epoch': 0.03} 3%|▎ | 387/12313 [17:42<9:22:37, 2.83s/it] 3%|▎ | 388/12313 [17:44<9:11:10, 2.77s/it] {'loss': 0.6856, 'grad_norm': 5.076759188588105, 'learning_rate': 4.999971976195628e-06, 'epoch': 0.03} 3%|▎ | 388/12313 [17:44<9:11:10, 2.77s/it] 3%|▎ | 389/12313 [17:47<8:52:04, 2.68s/it] {'loss': 0.7078, 'grad_norm': 3.752599670476179, 'learning_rate': 4.999968775953025e-06, 'epoch': 0.03} 3%|▎ | 389/12313 [17:47<8:52:04, 2.68s/it] 3%|▎ | 390/12313 [17:49<8:49:52, 2.67s/it] {'loss': 0.6747, 'grad_norm': 4.540276130175966, 'learning_rate': 4.999965402725812e-06, 'epoch': 0.03} 3%|▎ | 390/12313 [17:49<8:49:52, 2.67s/it] 3%|▎ | 391/12313 [17:52<8:52:20, 2.68s/it] {'loss': 0.58, 'grad_norm': 6.85554781706532, 'learning_rate': 4.999961856514226e-06, 'epoch': 0.03} 3%|▎ | 391/12313 [17:52<8:52:20, 2.68s/it] 3%|▎ | 392/12313 [17:55<9:03:11, 2.73s/it] {'loss': 0.544, 'grad_norm': 4.470561799528824, 'learning_rate': 4.99995813731851e-06, 'epoch': 0.03} 3%|▎ | 392/12313 [17:55<9:03:11, 2.73s/it] 3%|▎ | 393/12313 [17:57<8:51:07, 2.67s/it] {'loss': 0.6551, 'grad_norm': 3.381515929810499, 'learning_rate': 4.999954245138921e-06, 'epoch': 0.03} 3%|▎ | 393/12313 [17:57<8:51:07, 2.67s/it] 3%|▎ | 394/12313 [18:00<8:45:17, 2.64s/it] {'loss': 0.7476, 'grad_norm': 5.642691704207667, 'learning_rate': 4.99995017997573e-06, 'epoch': 0.03} 3%|▎ | 394/12313 [18:00<8:45:17, 2.64s/it] 3%|▎ | 395/12313 [18:03<8:56:23, 2.70s/it] {'loss': 0.6501, 'grad_norm': 4.423437336138073, 'learning_rate': 4.999945941829217e-06, 'epoch': 0.03} 3%|▎ | 395/12313 [18:03<8:56:23, 2.70s/it] 3%|▎ | 396/12313 [18:05<8:46:23, 2.65s/it] {'loss': 0.5093, 'grad_norm': 
5.470944485898885, 'learning_rate': 4.999941530699675e-06, 'epoch': 0.03} 3%|▎ | 396/12313 [18:05<8:46:23, 2.65s/it] 3%|▎ | 397/12313 [18:08<8:40:11, 2.62s/it] {'loss': 0.5785, 'grad_norm': 6.234514653013104, 'learning_rate': 4.999936946587412e-06, 'epoch': 0.03} 3%|▎ | 397/12313 [18:08<8:40:11, 2.62s/it] 3%|▎ | 398/12313 [18:10<8:39:04, 2.61s/it] {'loss': 0.6715, 'grad_norm': 4.217022001267068, 'learning_rate': 4.999932189492741e-06, 'epoch': 0.03} 3%|▎ | 398/12313 [18:10<8:39:04, 2.61s/it] 3%|▎ | 399/12313 [18:13<8:27:56, 2.56s/it] {'loss': 0.6798, 'grad_norm': 3.997512319922242, 'learning_rate': 4.999927259415994e-06, 'epoch': 0.03} 3%|▎ | 399/12313 [18:13<8:27:56, 2.56s/it] 3%|▎ | 400/12313 [18:16<8:32:13, 2.58s/it] {'loss': 0.581, 'grad_norm': 5.231675871788692, 'learning_rate': 4.99992215635751e-06, 'epoch': 0.03} 3%|▎ | 400/12313 [18:16<8:32:13, 2.58s/it] 3%|▎ | 401/12313 [18:19<9:17:47, 2.81s/it] {'loss': 0.608, 'grad_norm': 6.544773650290789, 'learning_rate': 4.999916880317645e-06, 'epoch': 0.03} 3%|▎ | 401/12313 [18:19<9:17:47, 2.81s/it] 3%|▎ | 402/12313 [18:22<9:07:43, 2.76s/it] {'loss': 0.6306, 'grad_norm': 5.966992739092824, 'learning_rate': 4.999911431296762e-06, 'epoch': 0.03} 3%|▎ | 402/12313 [18:22<9:07:43, 2.76s/it] 3%|▎ | 403/12313 [18:24<9:02:31, 2.73s/it] {'loss': 0.7834, 'grad_norm': 9.143165980782102, 'learning_rate': 4.999905809295239e-06, 'epoch': 0.03} 3%|▎ | 403/12313 [18:24<9:02:31, 2.73s/it] 3%|▎ | 404/12313 [18:27<8:55:50, 2.70s/it] {'loss': 0.5444, 'grad_norm': 5.104500215594029, 'learning_rate': 4.999900014313464e-06, 'epoch': 0.03} 3%|▎ | 404/12313 [18:27<8:55:50, 2.70s/it] 3%|▎ | 405/12313 [18:29<8:51:52, 2.68s/it] {'loss': 0.6663, 'grad_norm': 3.982825484617581, 'learning_rate': 4.999894046351839e-06, 'epoch': 0.03} 3%|▎ | 405/12313 [18:29<8:51:52, 2.68s/it] 3%|▎ | 406/12313 [18:32<8:50:21, 2.67s/it] {'loss': 0.6111, 'grad_norm': 9.429533561374097, 'learning_rate': 4.999887905410775e-06, 'epoch': 0.03} 3%|▎ | 406/12313 
[18:32<8:50:21, 2.67s/it] 3%|▎ | 407/12313 [18:35<9:05:51, 2.75s/it] {'loss': 0.6354, 'grad_norm': 6.206743996051552, 'learning_rate': 4.9998815914907e-06, 'epoch': 0.03} 3%|▎ | 407/12313 [18:35<9:05:51, 2.75s/it] 3%|▎ | 408/12313 [18:38<9:02:12, 2.73s/it] {'loss': 0.5287, 'grad_norm': 8.266987795691529, 'learning_rate': 4.9998751045920494e-06, 'epoch': 0.03} 3%|▎ | 408/12313 [18:38<9:02:12, 2.73s/it] 3%|▎ | 409/12313 [18:40<9:00:40, 2.73s/it] {'loss': 0.7346, 'grad_norm': 5.87817379773943, 'learning_rate': 4.999868444715271e-06, 'epoch': 0.03} 3%|▎ | 409/12313 [18:40<9:00:40, 2.73s/it] 3%|▎ | 410/12313 [18:43<8:48:07, 2.66s/it] {'loss': 0.702, 'grad_norm': 5.524749698493624, 'learning_rate': 4.999861611860827e-06, 'epoch': 0.03} 3%|▎ | 410/12313 [18:43<8:48:07, 2.66s/it] 3%|▎ | 411/12313 [18:46<8:43:28, 2.64s/it] {'loss': 0.6599, 'grad_norm': 4.172513273585029, 'learning_rate': 4.99985460602919e-06, 'epoch': 0.03} 3%|▎ | 411/12313 [18:46<8:43:28, 2.64s/it] 3%|▎ | 412/12313 [18:48<8:45:22, 2.65s/it] {'loss': 0.5904, 'grad_norm': 4.750265866846706, 'learning_rate': 4.9998474272208445e-06, 'epoch': 0.03} 3%|▎ | 412/12313 [18:48<8:45:22, 2.65s/it] 3%|▎ | 413/12313 [18:51<8:28:24, 2.56s/it] {'loss': 0.5972, 'grad_norm': 5.899694142476412, 'learning_rate': 4.999840075436286e-06, 'epoch': 0.03} 3%|▎ | 413/12313 [18:51<8:28:24, 2.56s/it] 3%|▎ | 414/12313 [18:53<8:41:55, 2.63s/it] {'loss': 0.5492, 'grad_norm': 8.320548493491238, 'learning_rate': 4.999832550676026e-06, 'epoch': 0.03} 3%|▎ | 414/12313 [18:53<8:41:55, 2.63s/it] 3%|▎ | 415/12313 [18:56<8:35:38, 2.60s/it] {'loss': 0.7786, 'grad_norm': 4.509397796756542, 'learning_rate': 4.999824852940583e-06, 'epoch': 0.03} 3%|▎ | 415/12313 [18:56<8:35:38, 2.60s/it] 3%|▎ | 416/12313 [18:58<8:33:31, 2.59s/it] {'loss': 0.7635, 'grad_norm': 5.616284004220855, 'learning_rate': 4.999816982230491e-06, 'epoch': 0.03} 3%|▎ | 416/12313 [18:58<8:33:31, 2.59s/it] 3%|▎ | 417/12313 [19:01<8:34:02, 2.59s/it] {'loss': 0.759, 'grad_norm': 
5.599669292826977, 'learning_rate': 4.999808938546294e-06, 'epoch': 0.03} 3%|▎ | 417/12313 [19:01<8:34:02, 2.59s/it] 3%|▎ | 418/12313 [19:04<8:32:38, 2.59s/it] {'loss': 0.6626, 'grad_norm': 6.6642180967818945, 'learning_rate': 4.999800721888548e-06, 'epoch': 0.03} 3%|▎ | 418/12313 [19:04<8:32:38, 2.59s/it] 3%|▎ | 419/12313 [19:06<8:34:13, 2.59s/it] {'loss': 0.8418, 'grad_norm': 5.882885604144428, 'learning_rate': 4.999792332257822e-06, 'epoch': 0.03} 3%|▎ | 419/12313 [19:06<8:34:13, 2.59s/it] 3%|▎ | 420/12313 [19:09<8:47:59, 2.66s/it] {'loss': 0.6134, 'grad_norm': 4.363979980545484, 'learning_rate': 4.999783769654697e-06, 'epoch': 0.03} 3%|▎ | 420/12313 [19:09<8:47:59, 2.66s/it] 3%|▎ | 421/12313 [19:12<8:33:30, 2.59s/it] {'loss': 0.7331, 'grad_norm': 4.776238235905251, 'learning_rate': 4.999775034079765e-06, 'epoch': 0.03} 3%|▎ | 421/12313 [19:12<8:33:30, 2.59s/it] 3%|▎ | 422/12313 [19:14<8:35:24, 2.60s/it] {'loss': 0.692, 'grad_norm': 4.945197802004096, 'learning_rate': 4.99976612553363e-06, 'epoch': 0.03} 3%|▎ | 422/12313 [19:14<8:35:24, 2.60s/it] 3%|▎ | 423/12313 [19:17<8:38:41, 2.62s/it] {'loss': 0.7029, 'grad_norm': 3.1538233404162628, 'learning_rate': 4.999757044016909e-06, 'epoch': 0.03} 3%|▎ | 423/12313 [19:17<8:38:41, 2.62s/it] 3%|▎ | 424/12313 [19:19<8:39:50, 2.62s/it] {'loss': 0.6652, 'grad_norm': 5.660874509416241, 'learning_rate': 4.99974778953023e-06, 'epoch': 0.03} 3%|▎ | 424/12313 [19:19<8:39:50, 2.62s/it] 3%|▎ | 425/12313 [19:22<8:44:15, 2.65s/it] {'loss': 0.7189, 'grad_norm': 3.5726277574942857, 'learning_rate': 4.9997383620742354e-06, 'epoch': 0.03} 3%|▎ | 425/12313 [19:22<8:44:15, 2.65s/it] 3%|▎ | 426/12313 [19:25<8:33:52, 2.59s/it] {'loss': 0.6376, 'grad_norm': 5.076928873914476, 'learning_rate': 4.9997287616495745e-06, 'epoch': 0.03} 3%|▎ | 426/12313 [19:25<8:33:52, 2.59s/it] 3%|▎ | 427/12313 [19:27<8:31:06, 2.58s/it] {'loss': 0.6691, 'grad_norm': 7.805331425608438, 'learning_rate': 4.999718988256913e-06, 'epoch': 0.03} 3%|▎ | 427/12313 
[19:27<8:31:06, 2.58s/it] 3%|▎ | 428/12313 [19:30<8:38:54, 2.62s/it] {'loss': 0.6528, 'grad_norm': 6.422161317682913, 'learning_rate': 4.999709041896927e-06, 'epoch': 0.03} 3%|▎ | 428/12313 [19:30<8:38:54, 2.62s/it] 3%|▎ | 429/12313 [19:33<8:48:39, 2.67s/it] {'loss': 0.5714, 'grad_norm': 4.561480725928831, 'learning_rate': 4.9996989225703055e-06, 'epoch': 0.03} 3%|▎ | 429/12313 [19:33<8:48:39, 2.67s/it] 3%|▎ | 430/12313 [19:35<8:56:19, 2.71s/it] {'loss': 0.6321, 'grad_norm': 5.799407616311238, 'learning_rate': 4.9996886302777466e-06, 'epoch': 0.03} 3%|▎ | 430/12313 [19:35<8:56:19, 2.71s/it] 4%|▎ | 431/12313 [19:38<8:53:20, 2.69s/it] {'loss': 0.6045, 'grad_norm': 6.86134449223638, 'learning_rate': 4.9996781650199655e-06, 'epoch': 0.04} 4%|▎ | 431/12313 [19:38<8:53:20, 2.69s/it] 4%|▎ | 432/12313 [19:41<8:57:24, 2.71s/it] {'loss': 0.8719, 'grad_norm': 4.7834450971379034, 'learning_rate': 4.999667526797685e-06, 'epoch': 0.04} 4%|▎ | 432/12313 [19:41<8:57:24, 2.71s/it] 4%|▎ | 433/12313 [19:43<8:39:45, 2.63s/it] {'loss': 0.7095, 'grad_norm': 8.624024638101803, 'learning_rate': 4.9996567156116395e-06, 'epoch': 0.04} 4%|▎ | 433/12313 [19:43<8:39:45, 2.63s/it] 4%|▎ | 434/12313 [19:46<8:27:05, 2.56s/it] {'loss': 0.5537, 'grad_norm': 4.366180557531383, 'learning_rate': 4.9996457314625794e-06, 'epoch': 0.04} 4%|▎ | 434/12313 [19:46<8:27:05, 2.56s/it] 4%|▎ | 435/12313 [19:48<8:28:41, 2.57s/it] {'loss': 0.8241, 'grad_norm': 5.128282050398408, 'learning_rate': 4.9996345743512635e-06, 'epoch': 0.04} 4%|▎ | 435/12313 [19:48<8:28:41, 2.57s/it] 4%|▎ | 436/12313 [19:51<8:38:19, 2.62s/it] {'loss': 0.6241, 'grad_norm': 5.611359253780588, 'learning_rate': 4.999623244278464e-06, 'epoch': 0.04} 4%|▎ | 436/12313 [19:51<8:38:19, 2.62s/it] 4%|▎ | 437/12313 [19:54<8:39:16, 2.62s/it] {'loss': 0.5835, 'grad_norm': 14.430464729028463, 'learning_rate': 4.999611741244965e-06, 'epoch': 0.04} 4%|▎ | 437/12313 [19:54<8:39:16, 2.62s/it] 4%|▎ | 438/12313 [19:56<8:29:41, 2.58s/it] {'loss': 0.6353, 
'grad_norm': 5.4664011456664285, 'learning_rate': 4.999600065251563e-06, 'epoch': 0.04} 4%|▎ | 438/12313 [19:56<8:29:41, 2.58s/it] 4%|▎ | 439/12313 [19:59<8:37:22, 2.61s/it] {'loss': 0.5976, 'grad_norm': 3.9622611695813172, 'learning_rate': 4.999588216299065e-06, 'epoch': 0.04} 4%|▎ | 439/12313 [19:59<8:37:22, 2.61s/it] 4%|▎ | 440/12313 [20:02<8:42:18, 2.64s/it] {'loss': 0.7192, 'grad_norm': 5.613516770561363, 'learning_rate': 4.999576194388292e-06, 'epoch': 0.04} 4%|▎ | 440/12313 [20:02<8:42:18, 2.64s/it] 4%|▎ | 441/12313 [20:04<8:40:49, 2.63s/it] {'loss': 0.587, 'grad_norm': 4.503094326677596, 'learning_rate': 4.999563999520075e-06, 'epoch': 0.04} 4%|▎ | 441/12313 [20:04<8:40:49, 2.63s/it] 4%|▎ | 442/12313 [20:07<9:19:52, 2.83s/it] {'loss': 0.4959, 'grad_norm': 4.636439079968228, 'learning_rate': 4.999551631695257e-06, 'epoch': 0.04} 4%|▎ | 442/12313 [20:07<9:19:52, 2.83s/it] 4%|▎ | 443/12313 [20:10<9:21:04, 2.84s/it] {'loss': 0.8282, 'grad_norm': 4.842745847770766, 'learning_rate': 4.999539090914696e-06, 'epoch': 0.04} 4%|▎ | 443/12313 [20:10<9:21:04, 2.84s/it] 4%|▎ | 444/12313 [20:13<9:13:14, 2.80s/it] {'loss': 0.6707, 'grad_norm': 19.890875208483, 'learning_rate': 4.999526377179259e-06, 'epoch': 0.04} 4%|▎ | 444/12313 [20:13<9:13:14, 2.80s/it] 4%|▎ | 445/12313 [20:16<9:01:32, 2.74s/it] {'loss': 0.5994, 'grad_norm': 4.903212810732332, 'learning_rate': 4.999513490489824e-06, 'epoch': 0.04} 4%|▎ | 445/12313 [20:16<9:01:32, 2.74s/it] 4%|▎ | 446/12313 [20:18<8:47:59, 2.67s/it] {'loss': 0.667, 'grad_norm': 4.394920629425536, 'learning_rate': 4.999500430847284e-06, 'epoch': 0.04} 4%|▎ | 446/12313 [20:18<8:47:59, 2.67s/it] 4%|▎ | 447/12313 [20:21<9:06:00, 2.76s/it] {'loss': 0.703, 'grad_norm': 3.62470443817531, 'learning_rate': 4.9994871982525425e-06, 'epoch': 0.04} 4%|▎ | 447/12313 [20:21<9:06:00, 2.76s/it] 4%|▎ | 448/12313 [20:24<9:13:14, 2.80s/it] {'loss': 0.6643, 'grad_norm': 6.417116392361057, 'learning_rate': 4.999473792706516e-06, 'epoch': 0.04} 4%|▎ | 
448/12313 [20:24<9:13:14, 2.80s/it] 4%|▎ | 449/12313 [20:27<9:09:27, 2.78s/it] {'loss': 0.7914, 'grad_norm': 4.999543742426304, 'learning_rate': 4.999460214210131e-06, 'epoch': 0.04} 4%|▎ | 449/12313 [20:27<9:09:27, 2.78s/it] 4%|▎ | 450/12313 [20:29<9:08:42, 2.78s/it] {'loss': 0.6302, 'grad_norm': 5.788280407646206, 'learning_rate': 4.999446462764327e-06, 'epoch': 0.04} 4%|▎ | 450/12313 [20:29<9:08:42, 2.78s/it] 4%|▎ | 451/12313 [20:32<8:54:31, 2.70s/it] {'loss': 0.6356, 'grad_norm': 4.2978695329143495, 'learning_rate': 4.999432538370057e-06, 'epoch': 0.04} 4%|▎ | 451/12313 [20:32<8:54:31, 2.70s/it] 4%|▎ | 452/12313 [20:35<8:45:55, 2.66s/it] {'loss': 0.7923, 'grad_norm': 4.224970563305071, 'learning_rate': 4.999418441028283e-06, 'epoch': 0.04} 4%|▎ | 452/12313 [20:35<8:45:55, 2.66s/it] 4%|▎ | 453/12313 [20:37<8:42:06, 2.64s/it] {'loss': 0.4936, 'grad_norm': 6.438067501026122, 'learning_rate': 4.9994041707399794e-06, 'epoch': 0.04} 4%|▎ | 453/12313 [20:37<8:42:06, 2.64s/it] 4%|▎ | 454/12313 [20:40<8:44:04, 2.65s/it] {'loss': 0.58, 'grad_norm': 4.579554653548933, 'learning_rate': 4.999389727506137e-06, 'epoch': 0.04} 4%|▎ | 454/12313 [20:40<8:44:04, 2.65s/it] 4%|▎ | 455/12313 [20:42<8:44:02, 2.65s/it] {'loss': 0.5475, 'grad_norm': 4.4684771869350035, 'learning_rate': 4.999375111327753e-06, 'epoch': 0.04} 4%|▎ | 455/12313 [20:42<8:44:02, 2.65s/it] 4%|▎ | 456/12313 [20:45<8:49:26, 2.68s/it] {'loss': 0.6459, 'grad_norm': 7.907710639614374, 'learning_rate': 4.999360322205838e-06, 'epoch': 0.04} 4%|▎ | 456/12313 [20:45<8:49:26, 2.68s/it] 4%|▎ | 457/12313 [20:48<9:00:25, 2.73s/it] {'loss': 0.6723, 'grad_norm': 5.848886945428208, 'learning_rate': 4.999345360141417e-06, 'epoch': 0.04} 4%|▎ | 457/12313 [20:48<9:00:25, 2.73s/it] 4%|▎ | 458/12313 [20:51<8:52:46, 2.70s/it] {'loss': 0.7079, 'grad_norm': 3.055736509730621, 'learning_rate': 4.999330225135525e-06, 'epoch': 0.04} 4%|▎ | 458/12313 [20:51<8:52:46, 2.70s/it] 4%|▎ | 459/12313 [20:53<8:37:52, 2.62s/it] {'loss': 0.8212, 
'grad_norm': 4.84395863310794, 'learning_rate': 4.999314917189209e-06, 'epoch': 0.04} 4%|▎ | 459/12313 [20:53<8:37:52, 2.62s/it] 4%|▎ | 460/12313 [20:56<8:51:01, 2.69s/it] {'loss': 0.7551, 'grad_norm': 4.4914093742426, 'learning_rate': 4.999299436303527e-06, 'epoch': 0.04} 4%|▎ | 460/12313 [20:56<8:51:01, 2.69s/it] 4%|▎ | 461/12313 [20:58<8:41:29, 2.64s/it] {'loss': 0.755, 'grad_norm': 5.747535926175868, 'learning_rate': 4.999283782479552e-06, 'epoch': 0.04} 4%|▎ | 461/12313 [20:58<8:41:29, 2.64s/it] 4%|▍ | 462/12313 [21:02<9:03:26, 2.75s/it] {'loss': 0.6151, 'grad_norm': 4.124807094235786, 'learning_rate': 4.999267955718367e-06, 'epoch': 0.04} 4%|▍ | 462/12313 [21:02<9:03:26, 2.75s/it] 4%|▍ | 463/12313 [21:04<8:48:39, 2.68s/it] {'loss': 0.6636, 'grad_norm': 4.951699574302496, 'learning_rate': 4.999251956021066e-06, 'epoch': 0.04} 4%|▍ | 463/12313 [21:04<8:48:39, 2.68s/it] 4%|▍ | 464/12313 [21:07<8:56:16, 2.72s/it] {'loss': 0.7115, 'grad_norm': 3.634905291149469, 'learning_rate': 4.999235783388757e-06, 'epoch': 0.04} 4%|▍ | 464/12313 [21:07<8:56:16, 2.72s/it] 4%|▍ | 465/12313 [21:10<8:58:31, 2.73s/it] {'loss': 0.689, 'grad_norm': 7.068387813096013, 'learning_rate': 4.999219437822559e-06, 'epoch': 0.04} 4%|▍ | 465/12313 [21:10<8:58:31, 2.73s/it] 4%|▍ | 466/12313 [21:12<9:01:00, 2.74s/it] {'loss': 0.6214, 'grad_norm': 4.404856833972122, 'learning_rate': 4.999202919323603e-06, 'epoch': 0.04} 4%|▍ | 466/12313 [21:12<9:01:00, 2.74s/it] 4%|▍ | 467/12313 [21:15<8:56:55, 2.72s/it] {'loss': 0.7647, 'grad_norm': 4.578684856908467, 'learning_rate': 4.9991862278930315e-06, 'epoch': 0.04} 4%|▍ | 467/12313 [21:15<8:56:55, 2.72s/it] 4%|▍ | 468/12313 [21:18<8:58:09, 2.73s/it] {'loss': 0.6884, 'grad_norm': 4.898513292964633, 'learning_rate': 4.9991693635320005e-06, 'epoch': 0.04} 4%|▍ | 468/12313 [21:18<8:58:09, 2.73s/it] 4%|▍ | 469/12313 [21:21<9:07:31, 2.77s/it] {'loss': 0.6647, 'grad_norm': 4.575368944744301, 'learning_rate': 4.999152326241675e-06, 'epoch': 0.04} 4%|▍ | 
469/12313 [21:21<9:07:31, 2.77s/it] 4%|▍ | 470/12313 [21:23<8:49:26, 2.68s/it] {'loss': 0.6667, 'grad_norm': 4.238070688134001, 'learning_rate': 4.999135116023236e-06, 'epoch': 0.04} 4%|▍ | 470/12313 [21:23<8:49:26, 2.68s/it] 4%|▍ | 471/12313 [21:26<8:41:05, 2.64s/it] {'loss': 0.6754, 'grad_norm': 6.035566032840847, 'learning_rate': 4.999117732877873e-06, 'epoch': 0.04} 4%|▍ | 471/12313 [21:26<8:41:05, 2.64s/it] 4%|▍ | 472/12313 [21:28<8:26:12, 2.56s/it] {'loss': 0.8115, 'grad_norm': 3.607010706319493, 'learning_rate': 4.9991001768067895e-06, 'epoch': 0.04} 4%|▍ | 472/12313 [21:28<8:26:12, 2.56s/it] 4%|▍ | 473/12313 [21:31<8:20:10, 2.53s/it] {'loss': 0.6837, 'grad_norm': 6.191780527301594, 'learning_rate': 4.9990824478112e-06, 'epoch': 0.04} 4%|▍ | 473/12313 [21:31<8:20:10, 2.53s/it] 4%|▍ | 474/12313 [21:33<8:28:30, 2.58s/it] {'loss': 0.7097, 'grad_norm': 5.355617747355118, 'learning_rate': 4.999064545892331e-06, 'epoch': 0.04} 4%|▍ | 474/12313 [21:33<8:28:30, 2.58s/it] 4%|▍ | 475/12313 [21:36<8:20:20, 2.54s/it] {'loss': 0.5915, 'grad_norm': 4.233673909331852, 'learning_rate': 4.999046471051422e-06, 'epoch': 0.04} 4%|▍ | 475/12313 [21:36<8:20:20, 2.54s/it] 4%|▍ | 476/12313 [21:38<8:33:33, 2.60s/it] {'loss': 0.9316, 'grad_norm': 4.780173065160517, 'learning_rate': 4.999028223289724e-06, 'epoch': 0.04} 4%|▍ | 476/12313 [21:38<8:33:33, 2.60s/it] 4%|▍ | 477/12313 [21:41<8:52:01, 2.70s/it] {'loss': 0.6671, 'grad_norm': 4.063084409048435, 'learning_rate': 4.999009802608497e-06, 'epoch': 0.04} 4%|▍ | 477/12313 [21:41<8:52:01, 2.70s/it] 4%|▍ | 478/12313 [21:44<8:46:06, 2.67s/it] {'loss': 0.8034, 'grad_norm': 4.851439856688829, 'learning_rate': 4.998991209009019e-06, 'epoch': 0.04} 4%|▍ | 478/12313 [21:44<8:46:06, 2.67s/it] 4%|▍ | 479/12313 [21:47<8:45:16, 2.66s/it] {'loss': 0.532, 'grad_norm': 7.105889513775342, 'learning_rate': 4.998972442492575e-06, 'epoch': 0.04} 4%|▍ | 479/12313 [21:47<8:45:16, 2.66s/it] 4%|▍ | 480/12313 [21:49<8:33:39, 2.60s/it] {'loss': 0.5993, 
'grad_norm': 3.8451222439770683, 'learning_rate': 4.9989535030604615e-06, 'epoch': 0.04} 4%|▍ | 480/12313 [21:49<8:33:39, 2.60s/it] 4%|▍ | 481/12313 [21:52<8:53:58, 2.71s/it] {'loss': 0.5273, 'grad_norm': 5.621997119878575, 'learning_rate': 4.998934390713994e-06, 'epoch': 0.04} 4%|▍ | 481/12313 [21:52<8:53:58, 2.71s/it] 4%|▍ | 482/12313 [21:55<8:58:37, 2.73s/it] {'loss': 0.6318, 'grad_norm': 4.81307738692358, 'learning_rate': 4.9989151054544905e-06, 'epoch': 0.04} 4%|▍ | 482/12313 [21:55<8:58:37, 2.73s/it] 4%|▍ | 483/12313 [21:57<8:56:13, 2.72s/it] {'loss': 0.6886, 'grad_norm': 3.849217538759433, 'learning_rate': 4.998895647283287e-06, 'epoch': 0.04} 4%|▍ | 483/12313 [21:57<8:56:13, 2.72s/it] 4%|▍ | 484/12313 [22:00<8:51:27, 2.70s/it] {'loss': 0.7443, 'grad_norm': 3.909888024088152, 'learning_rate': 4.99887601620173e-06, 'epoch': 0.04} 4%|▍ | 484/12313 [22:00<8:51:27, 2.70s/it] 4%|▍ | 485/12313 [22:03<8:51:04, 2.69s/it] {'loss': 0.6783, 'grad_norm': 5.4327964670836835, 'learning_rate': 4.9988562122111785e-06, 'epoch': 0.04} 4%|▍ | 485/12313 [22:03<8:51:04, 2.69s/it] 4%|▍ | 486/12313 [22:05<8:42:30, 2.65s/it] {'loss': 0.6589, 'grad_norm': 4.60325431470408, 'learning_rate': 4.998836235313001e-06, 'epoch': 0.04} 4%|▍ | 486/12313 [22:05<8:42:30, 2.65s/it] 4%|▍ | 487/12313 [22:08<8:57:09, 2.73s/it] {'loss': 0.5747, 'grad_norm': 4.5476339338388945, 'learning_rate': 4.998816085508582e-06, 'epoch': 0.04} 4%|▍ | 487/12313 [22:08<8:57:09, 2.73s/it] 4%|▍ | 488/12313 [22:11<8:42:31, 2.65s/it] {'loss': 0.6026, 'grad_norm': 4.970415173334207, 'learning_rate': 4.9987957627993145e-06, 'epoch': 0.04} 4%|▍ | 488/12313 [22:11<8:42:31, 2.65s/it] 4%|▍ | 489/12313 [22:13<8:50:05, 2.69s/it] {'loss': 0.7375, 'grad_norm': 4.315065332489597, 'learning_rate': 4.998775267186605e-06, 'epoch': 0.04} 4%|▍ | 489/12313 [22:13<8:50:05, 2.69s/it] 4%|▍ | 490/12313 [22:16<8:33:27, 2.61s/it] {'loss': 0.6427, 'grad_norm': 5.5992123809226, 'learning_rate': 4.998754598671871e-06, 'epoch': 0.04} 4%|▍ | 
490/12313 [22:16<8:33:27, 2.61s/it] 4%|▍ | 491/12313 [22:19<8:33:58, 2.61s/it] {'loss': 0.7897, 'grad_norm': 3.7014622620168405, 'learning_rate': 4.998733757256544e-06, 'epoch': 0.04} 4%|▍ | 491/12313 [22:19<8:33:58, 2.61s/it] 4%|▍ | 492/12313 [22:21<8:39:50, 2.64s/it] {'loss': 0.6393, 'grad_norm': 6.7379942751786235, 'learning_rate': 4.998712742942065e-06, 'epoch': 0.04} 4%|▍ | 492/12313 [22:21<8:39:50, 2.64s/it] 4%|▍ | 493/12313 [22:24<8:33:20, 2.61s/it] {'loss': 0.6606, 'grad_norm': 4.857719068955753, 'learning_rate': 4.998691555729888e-06, 'epoch': 0.04} 4%|▍ | 493/12313 [22:24<8:33:20, 2.61s/it] 4%|▍ | 494/12313 [22:27<8:56:45, 2.72s/it] {'loss': 0.7552, 'grad_norm': 4.0539838554657885, 'learning_rate': 4.9986701956214804e-06, 'epoch': 0.04} 4%|▍ | 494/12313 [22:27<8:56:45, 2.72s/it] 4%|▍ | 495/12313 [22:29<8:51:43, 2.70s/it] {'loss': 0.666, 'grad_norm': 4.957439145405436, 'learning_rate': 4.998648662618318e-06, 'epoch': 0.04} 4%|▍ | 495/12313 [22:29<8:51:43, 2.70s/it] 4%|▍ | 496/12313 [22:32<8:51:44, 2.70s/it] {'loss': 0.6508, 'grad_norm': 4.953757290265213, 'learning_rate': 4.998626956721894e-06, 'epoch': 0.04} 4%|▍ | 496/12313 [22:32<8:51:44, 2.70s/it] 4%|▍ | 497/12313 [22:35<8:48:30, 2.68s/it] {'loss': 0.6307, 'grad_norm': 3.84200026766362, 'learning_rate': 4.998605077933706e-06, 'epoch': 0.04} 4%|▍ | 497/12313 [22:35<8:48:30, 2.68s/it] 4%|▍ | 498/12313 [22:37<8:43:19, 2.66s/it] {'loss': 0.794, 'grad_norm': 6.038380469894045, 'learning_rate': 4.998583026255272e-06, 'epoch': 0.04} 4%|▍ | 498/12313 [22:37<8:43:19, 2.66s/it] 4%|▍ | 499/12313 [22:40<8:50:59, 2.70s/it] {'loss': 0.5471, 'grad_norm': 5.862309248926957, 'learning_rate': 4.998560801688116e-06, 'epoch': 0.04} 4%|▍ | 499/12313 [22:40<8:50:59, 2.70s/it] 4%|▍ | 500/12313 [22:43<8:50:05, 2.69s/it] {'loss': 0.6685, 'grad_norm': 5.376571375980098, 'learning_rate': 4.998538404233776e-06, 'epoch': 0.04} 4%|▍ | 500/12313 [22:43<8:50:05, 2.69s/it] 4%|▍ | 501/12313 [22:45<8:47:42, 2.68s/it] {'loss': 0.5943, 
'grad_norm': 3.972813169207523, 'learning_rate': 4.998515833893801e-06, 'epoch': 0.04} 4%|▍ | 501/12313 [22:45<8:47:42, 2.68s/it] 4%|▍ | 502/12313 [22:48<8:49:05, 2.69s/it] {'loss': 0.6821, 'grad_norm': 5.457428585886831, 'learning_rate': 4.998493090669754e-06, 'epoch': 0.04} 4%|▍ | 502/12313 [22:48<8:49:05, 2.69s/it] 4%|▍ | 503/12313 [22:51<8:56:11, 2.72s/it] {'loss': 0.6268, 'grad_norm': 5.526128969258129, 'learning_rate': 4.998470174563208e-06, 'epoch': 0.04} 4%|▍ | 503/12313 [22:51<8:56:11, 2.72s/it] 4%|▍ | 504/12313 [22:54<8:48:24, 2.68s/it] {'loss': 0.5511, 'grad_norm': 5.00358557141072, 'learning_rate': 4.9984470855757485e-06, 'epoch': 0.04} 4%|▍ | 504/12313 [22:54<8:48:24, 2.68s/it] 4%|▍ | 505/12313 [22:56<8:42:24, 2.65s/it] {'loss': 0.7192, 'grad_norm': 9.228964746118006, 'learning_rate': 4.998423823708974e-06, 'epoch': 0.04} 4%|▍ | 505/12313 [22:56<8:42:24, 2.65s/it] 4%|▍ | 506/12313 [22:59<8:50:52, 2.70s/it] {'loss': 0.6898, 'grad_norm': 7.043937824113614, 'learning_rate': 4.998400388964494e-06, 'epoch': 0.04} 4%|▍ | 506/12313 [22:59<8:50:52, 2.70s/it] 4%|▍ | 507/12313 [23:01<8:40:11, 2.64s/it] {'loss': 0.6673, 'grad_norm': 7.469052434654404, 'learning_rate': 4.998376781343929e-06, 'epoch': 0.04} 4%|▍ | 507/12313 [23:01<8:40:11, 2.64s/it] 4%|▍ | 508/12313 [23:04<9:00:31, 2.75s/it] {'loss': 0.8352, 'grad_norm': 5.819947119265823, 'learning_rate': 4.998353000848913e-06, 'epoch': 0.04} 4%|▍ | 508/12313 [23:04<9:00:31, 2.75s/it] 4%|▍ | 509/12313 [23:07<8:50:13, 2.70s/it] {'loss': 0.6627, 'grad_norm': 4.632267764249478, 'learning_rate': 4.998329047481093e-06, 'epoch': 0.04} 4%|▍ | 509/12313 [23:07<8:50:13, 2.70s/it] 4%|▍ | 510/12313 [23:10<9:04:48, 2.77s/it] {'loss': 0.6928, 'grad_norm': 5.04622538690542, 'learning_rate': 4.998304921242124e-06, 'epoch': 0.04} 4%|▍ | 510/12313 [23:10<9:04:48, 2.77s/it] 4%|▍ | 511/12313 [23:12<8:43:25, 2.66s/it] {'loss': 0.6864, 'grad_norm': 4.929054550494181, 'learning_rate': 4.998280622133677e-06, 'epoch': 0.04} 4%|▍ | 
511/12313 [23:12<8:43:25, 2.66s/it] 4%|▍ | 512/12313 [23:15<8:39:46, 2.64s/it] {'loss': 0.5752, 'grad_norm': 5.630142199208348, 'learning_rate': 4.998256150157433e-06, 'epoch': 0.04} 4%|▍ | 512/12313 [23:15<8:39:46, 2.64s/it] 4%|▍ | 513/12313 [23:18<8:41:53, 2.65s/it] {'loss': 0.6679, 'grad_norm': 4.327018939734424, 'learning_rate': 4.998231505315085e-06, 'epoch': 0.04} 4%|▍ | 513/12313 [23:18<8:41:53, 2.65s/it] 4%|▍ | 514/12313 [23:21<8:59:48, 2.75s/it] {'loss': 0.5235, 'grad_norm': 5.207414191020043, 'learning_rate': 4.998206687608339e-06, 'epoch': 0.04} 4%|▍ | 514/12313 [23:21<8:59:48, 2.75s/it] 4%|▍ | 515/12313 [23:23<8:56:56, 2.73s/it] {'loss': 0.5066, 'grad_norm': 5.078181936565571, 'learning_rate': 4.998181697038912e-06, 'epoch': 0.04} 4%|▍ | 515/12313 [23:23<8:56:56, 2.73s/it] 4%|▍ | 516/12313 [23:26<8:59:55, 2.75s/it] {'loss': 0.5846, 'grad_norm': 4.596975319564036, 'learning_rate': 4.998156533608531e-06, 'epoch': 0.04} 4%|▍ | 516/12313 [23:26<8:59:55, 2.75s/it] 4%|▍ | 517/12313 [23:29<8:54:32, 2.72s/it] {'loss': 0.5737, 'grad_norm': 6.0532848525291545, 'learning_rate': 4.998131197318942e-06, 'epoch': 0.04} 4%|▍ | 517/12313 [23:29<8:54:32, 2.72s/it] 4%|▍ | 518/12313 [23:31<8:52:52, 2.71s/it] {'loss': 0.6762, 'grad_norm': 4.35820106724424, 'learning_rate': 4.998105688171893e-06, 'epoch': 0.04} 4%|▍ | 518/12313 [23:31<8:52:52, 2.71s/it] 4%|▍ | 519/12313 [23:34<8:47:52, 2.69s/it] {'loss': 0.5852, 'grad_norm': 6.664678597912458, 'learning_rate': 4.998080006169153e-06, 'epoch': 0.04} 4%|▍ | 519/12313 [23:34<8:47:52, 2.69s/it] 4%|▍ | 520/12313 [23:37<8:35:12, 2.62s/it] {'loss': 0.7535, 'grad_norm': 8.897090441557413, 'learning_rate': 4.9980541513124966e-06, 'epoch': 0.04} 4%|▍ | 520/12313 [23:37<8:35:12, 2.62s/it] 4%|▍ | 521/12313 [23:39<8:32:29, 2.61s/it] {'loss': 0.5836, 'grad_norm': 5.576078105545307, 'learning_rate': 4.998028123603714e-06, 'epoch': 0.04} 4%|▍ | 521/12313 [23:39<8:32:29, 2.61s/it] 4%|▍ | 522/12313 [23:42<8:34:36, 2.62s/it] {'loss': 0.638, 
'grad_norm': 5.498839425247008, 'learning_rate': 4.998001923044605e-06, 'epoch': 0.04} 4%|▍ | 522/12313 [23:42<8:34:36, 2.62s/it] 4%|▍ | 523/12313 [23:44<8:23:20, 2.56s/it] {'loss': 0.6051, 'grad_norm': 7.495880701431175, 'learning_rate': 4.997975549636985e-06, 'epoch': 0.04} 4%|▍ | 523/12313 [23:44<8:23:20, 2.56s/it] 4%|▍ | 524/12313 [23:47<8:28:59, 2.59s/it] {'loss': 0.5373, 'grad_norm': 8.634014617651163, 'learning_rate': 4.997949003382676e-06, 'epoch': 0.04} 4%|▍ | 524/12313 [23:47<8:28:59, 2.59s/it] 4%|▍ | 525/12313 [23:50<8:36:48, 2.63s/it] {'loss': 0.8439, 'grad_norm': 3.6876707339560153, 'learning_rate': 4.997922284283517e-06, 'epoch': 0.04} 4%|▍ | 525/12313 [23:50<8:36:48, 2.63s/it] 4%|▍ | 526/12313 [23:52<8:43:39, 2.67s/it] {'loss': 0.9273, 'grad_norm': 4.386768122021177, 'learning_rate': 4.997895392341356e-06, 'epoch': 0.04} 4%|▍ | 526/12313 [23:52<8:43:39, 2.67s/it] 4%|▍ | 527/12313 [23:55<8:38:33, 2.64s/it] {'loss': 1.0235, 'grad_norm': 4.887505724579102, 'learning_rate': 4.997868327558053e-06, 'epoch': 0.04} 4%|▍ | 527/12313 [23:55<8:38:33, 2.64s/it] 4%|▍ | 528/12313 [23:58<8:37:42, 2.64s/it] {'loss': 0.7851, 'grad_norm': 4.002537001522704, 'learning_rate': 4.997841089935482e-06, 'epoch': 0.04} 4%|▍ | 528/12313 [23:58<8:37:42, 2.64s/it] 4%|▍ | 529/12313 [24:00<8:28:58, 2.59s/it] {'loss': 0.7571, 'grad_norm': 7.16699779080668, 'learning_rate': 4.997813679475528e-06, 'epoch': 0.04} 4%|▍ | 529/12313 [24:00<8:28:58, 2.59s/it] 4%|▍ | 530/12313 [24:03<8:38:14, 2.64s/it] {'loss': 0.721, 'grad_norm': 4.775024972507428, 'learning_rate': 4.997786096180086e-06, 'epoch': 0.04} 4%|▍ | 530/12313 [24:03<8:38:14, 2.64s/it] 4%|▍ | 531/12313 [24:05<8:36:24, 2.63s/it] {'loss': 0.5692, 'grad_norm': 5.099681936395401, 'learning_rate': 4.997758340051066e-06, 'epoch': 0.04} 4%|▍ | 531/12313 [24:05<8:36:24, 2.63s/it] 4%|▍ | 532/12313 [24:08<8:25:33, 2.57s/it] {'loss': 0.643, 'grad_norm': 4.818659941047612, 'learning_rate': 4.997730411090387e-06, 'epoch': 0.04} 4%|▍ | 
532/12313 [24:08<8:25:33, 2.57s/it] 4%|▍ | 533/12313 [24:11<8:45:40, 2.68s/it] {'loss': 0.7149, 'grad_norm': 6.604728440380809, 'learning_rate': 4.997702309299983e-06, 'epoch': 0.04} 4%|▍ | 533/12313 [24:11<8:45:40, 2.68s/it] 4%|▍ | 534/12313 [24:13<8:48:24, 2.69s/it] {'loss': 0.6094, 'grad_norm': 5.6047753258513415, 'learning_rate': 4.997674034681799e-06, 'epoch': 0.04} 4%|▍ | 534/12313 [24:13<8:48:24, 2.69s/it] 4%|▍ | 535/12313 [24:16<8:35:25, 2.63s/it] {'loss': 0.5527, 'grad_norm': 5.772050070771113, 'learning_rate': 4.99764558723779e-06, 'epoch': 0.04} 4%|▍ | 535/12313 [24:16<8:35:25, 2.63s/it] 4%|▍ | 536/12313 [24:18<8:30:19, 2.60s/it] {'loss': 0.8073, 'grad_norm': 6.695215883564865, 'learning_rate': 4.997616966969925e-06, 'epoch': 0.04} 4%|▍ | 536/12313 [24:18<8:30:19, 2.60s/it] 4%|▍ | 537/12313 [24:21<8:35:33, 2.63s/it] {'loss': 0.7011, 'grad_norm': 5.0342584361934035, 'learning_rate': 4.997588173880184e-06, 'epoch': 0.04} 4%|▍ | 537/12313 [24:21<8:35:33, 2.63s/it] 4%|▍ | 538/12313 [24:24<8:25:29, 2.58s/it] {'loss': 0.6319, 'grad_norm': 4.746592585436567, 'learning_rate': 4.99755920797056e-06, 'epoch': 0.04} 4%|▍ | 538/12313 [24:24<8:25:29, 2.58s/it] 4%|▍ | 539/12313 [24:26<8:33:35, 2.62s/it] {'loss': 0.5789, 'grad_norm': 5.2897074099894095, 'learning_rate': 4.997530069243057e-06, 'epoch': 0.04} 4%|▍ | 539/12313 [24:26<8:33:35, 2.62s/it] 4%|▍ | 540/12313 [24:29<8:27:53, 2.59s/it] {'loss': 0.6953, 'grad_norm': 3.8007318490045416, 'learning_rate': 4.997500757699691e-06, 'epoch': 0.04} 4%|▍ | 540/12313 [24:29<8:27:53, 2.59s/it] 4%|▍ | 541/12313 [24:32<8:32:57, 2.61s/it] {'loss': 0.621, 'grad_norm': 5.144861861634209, 'learning_rate': 4.9974712733424905e-06, 'epoch': 0.04} 4%|▍ | 541/12313 [24:32<8:32:57, 2.61s/it] 4%|▍ | 542/12313 [24:34<8:21:23, 2.56s/it] {'loss': 0.6511, 'grad_norm': 5.905282410348868, 'learning_rate': 4.997441616173495e-06, 'epoch': 0.04} 4%|▍ | 542/12313 [24:34<8:21:23, 2.56s/it] 4%|▍ | 543/12313 [24:37<8:41:23, 2.66s/it] {'loss': 0.5526, 
'grad_norm': 5.309964350881816, 'learning_rate': 4.997411786194758e-06, 'epoch': 0.04} 4%|▍ | 543/12313 [24:37<8:41:23, 2.66s/it] 4%|▍ | 544/12313 [24:39<8:30:20, 2.60s/it] {'loss': 0.6067, 'grad_norm': 11.149541991220845, 'learning_rate': 4.997381783408343e-06, 'epoch': 0.04} 4%|▍ | 544/12313 [24:39<8:30:20, 2.60s/it] 4%|▍ | 545/12313 [24:42<8:29:22, 2.60s/it] {'loss': 0.6418, 'grad_norm': 5.410844505382304, 'learning_rate': 4.9973516078163256e-06, 'epoch': 0.04} 4%|▍ | 545/12313 [24:42<8:29:22, 2.60s/it] 4%|▍ | 546/12313 [24:44<8:24:10, 2.57s/it] {'loss': 0.6921, 'grad_norm': 5.430764606424544, 'learning_rate': 4.997321259420793e-06, 'epoch': 0.04} 4%|▍ | 546/12313 [24:44<8:24:10, 2.57s/it] 4%|▍ | 547/12313 [24:47<8:22:40, 2.56s/it] {'loss': 0.5558, 'grad_norm': 11.412874034754271, 'learning_rate': 4.997290738223847e-06, 'epoch': 0.04} 4%|▍ | 547/12313 [24:47<8:22:40, 2.56s/it] 4%|▍ | 548/12313 [24:50<8:27:55, 2.59s/it] {'loss': 0.6049, 'grad_norm': 4.133154872685264, 'learning_rate': 4.9972600442275985e-06, 'epoch': 0.04} 4%|▍ | 548/12313 [24:50<8:27:55, 2.59s/it] 4%|▍ | 549/12313 [24:52<8:31:49, 2.61s/it] {'loss': 0.5138, 'grad_norm': 5.592148797903198, 'learning_rate': 4.997229177434171e-06, 'epoch': 0.04} 4%|▍ | 549/12313 [24:52<8:31:49, 2.61s/it] 4%|▍ | 550/12313 [24:55<8:41:39, 2.66s/it] {'loss': 0.7193, 'grad_norm': 8.487263757967648, 'learning_rate': 4.997198137845702e-06, 'epoch': 0.04} 4%|▍ | 550/12313 [24:55<8:41:39, 2.66s/it] 4%|▍ | 551/12313 [24:58<8:55:48, 2.73s/it] {'loss': 0.7594, 'grad_norm': 7.830506522230207, 'learning_rate': 4.997166925464337e-06, 'epoch': 0.04} 4%|▍ | 551/12313 [24:58<8:55:48, 2.73s/it] 4%|▍ | 552/12313 [25:01<8:57:01, 2.74s/it] {'loss': 0.5762, 'grad_norm': 8.206186247100188, 'learning_rate': 4.997135540292237e-06, 'epoch': 0.04} 4%|▍ | 552/12313 [25:01<8:57:01, 2.74s/it] 4%|▍ | 553/12313 [25:03<8:45:08, 2.68s/it] {'loss': 0.5914, 'grad_norm': 4.437021611163288, 'learning_rate': 4.997103982331574e-06, 'epoch': 0.04} 4%|▍ | 
553/12313 [25:03<8:45:08, 2.68s/it] 4%|▍ | 554/12313 [25:06<8:43:34, 2.67s/it] {'loss': 0.5335, 'grad_norm': 5.38722516332683, 'learning_rate': 4.997072251584531e-06, 'epoch': 0.04} 4%|▍ | 554/12313 [25:06<8:43:34, 2.67s/it] 5%|▍ | 555/12313 [25:08<8:34:46, 2.63s/it] {'loss': 0.6441, 'grad_norm': 7.791956117401669, 'learning_rate': 4.997040348053304e-06, 'epoch': 0.05} 5%|▍ | 555/12313 [25:08<8:34:46, 2.63s/it] 5%|▍ | 556/12313 [25:11<8:37:46, 2.64s/it] {'loss': 0.8322, 'grad_norm': 4.66481520418914, 'learning_rate': 4.9970082717401e-06, 'epoch': 0.05} 5%|▍ | 556/12313 [25:11<8:37:46, 2.64s/it] 5%|▍ | 557/12313 [25:14<8:37:07, 2.64s/it] {'loss': 0.6918, 'grad_norm': 5.615716066994191, 'learning_rate': 4.9969760226471385e-06, 'epoch': 0.05} 5%|▍ | 557/12313 [25:14<8:37:07, 2.64s/it] 5%|▍ | 558/12313 [25:16<8:31:38, 2.61s/it] {'loss': 0.5777, 'grad_norm': 5.424151701577734, 'learning_rate': 4.9969436007766514e-06, 'epoch': 0.05} 5%|▍ | 558/12313 [25:16<8:31:38, 2.61s/it] 5%|▍ | 559/12313 [25:19<8:41:32, 2.66s/it] {'loss': 0.7466, 'grad_norm': 4.7203985695060195, 'learning_rate': 4.9969110061308826e-06, 'epoch': 0.05} 5%|▍ | 559/12313 [25:19<8:41:32, 2.66s/it] 5%|▍ | 560/12313 [25:22<8:53:56, 2.73s/it] {'loss': 0.679, 'grad_norm': 4.730334252790068, 'learning_rate': 4.996878238712087e-06, 'epoch': 0.05} 5%|▍ | 560/12313 [25:22<8:53:56, 2.73s/it] 5%|▍ | 561/12313 [25:25<9:14:12, 2.83s/it] {'loss': 0.5825, 'grad_norm': 3.979712937320014, 'learning_rate': 4.996845298522531e-06, 'epoch': 0.05} 5%|▍ | 561/12313 [25:25<9:14:12, 2.83s/it] 5%|▍ | 562/12313 [25:28<9:08:00, 2.80s/it] {'loss': 0.6467, 'grad_norm': 6.730522946218954, 'learning_rate': 4.996812185564496e-06, 'epoch': 0.05} 5%|▍ | 562/12313 [25:28<9:08:00, 2.80s/it] 5%|▍ | 563/12313 [25:30<8:59:55, 2.76s/it] {'loss': 0.7073, 'grad_norm': 3.33620867884503, 'learning_rate': 4.99677889984027e-06, 'epoch': 0.05} 5%|▍ | 563/12313 [25:30<8:59:55, 2.76s/it] 5%|▍ | 564/12313 [25:33<8:56:59, 2.74s/it] {'loss': 0.667, 
'grad_norm': 5.926130876855334, 'learning_rate': 4.996745441352159e-06, 'epoch': 0.05} 5%|▍ | 564/12313 [25:33<8:56:59, 2.74s/it] 5%|▍ | 565/12313 [25:36<9:27:45, 2.90s/it] {'loss': 0.8168, 'grad_norm': 5.572153026812714, 'learning_rate': 4.996711810102478e-06, 'epoch': 0.05} 5%|▍ | 565/12313 [25:36<9:27:45, 2.90s/it] 5%|▍ | 566/12313 [25:39<9:12:34, 2.82s/it] {'loss': 0.6412, 'grad_norm': 8.174580316906296, 'learning_rate': 4.996678006093553e-06, 'epoch': 0.05} 5%|▍ | 566/12313 [25:39<9:12:34, 2.82s/it] 5%|▍ | 567/12313 [25:42<9:11:42, 2.82s/it] {'loss': 0.6742, 'grad_norm': 4.09571754078007, 'learning_rate': 4.996644029327723e-06, 'epoch': 0.05} 5%|▍ | 567/12313 [25:42<9:11:42, 2.82s/it] 5%|▍ | 568/12313 [25:45<9:04:43, 2.78s/it] {'loss': 0.8325, 'grad_norm': 4.109112417478182, 'learning_rate': 4.996609879807341e-06, 'epoch': 0.05} 5%|▍ | 568/12313 [25:45<9:04:43, 2.78s/it] 5%|▍ | 569/12313 [25:47<8:57:17, 2.75s/it] {'loss': 0.6972, 'grad_norm': 4.752061462760559, 'learning_rate': 4.9965755575347665e-06, 'epoch': 0.05} 5%|▍ | 569/12313 [25:47<8:57:17, 2.75s/it] 5%|▍ | 570/12313 [25:50<9:20:22, 2.86s/it] {'loss': 0.5081, 'grad_norm': 3.9521387750727524, 'learning_rate': 4.996541062512377e-06, 'epoch': 0.05} 5%|▍ | 570/12313 [25:50<9:20:22, 2.86s/it] 5%|▍ | 571/12313 [25:53<9:07:13, 2.80s/it] {'loss': 0.5242, 'grad_norm': 7.487630136380704, 'learning_rate': 4.996506394742559e-06, 'epoch': 0.05} 5%|▍ | 571/12313 [25:53<9:07:13, 2.80s/it] 5%|▍ | 572/12313 [25:56<9:14:57, 2.84s/it] {'loss': 0.6066, 'grad_norm': 5.41182957872003, 'learning_rate': 4.996471554227711e-06, 'epoch': 0.05} 5%|▍ | 572/12313 [25:56<9:14:57, 2.84s/it] 5%|▍ | 573/12313 [25:59<9:09:59, 2.81s/it] {'loss': 0.647, 'grad_norm': 4.448050180267281, 'learning_rate': 4.996436540970243e-06, 'epoch': 0.05} 5%|▍ | 573/12313 [25:59<9:09:59, 2.81s/it] 5%|▍ | 574/12313 [26:01<9:04:32, 2.78s/it] {'loss': 0.5929, 'grad_norm': 7.285104243616152, 'learning_rate': 4.99640135497258e-06, 'epoch': 0.05} 5%|▍ | 
574/12313 [26:01<9:04:32, 2.78s/it] 5%|▍ | 575/12313 [26:04<9:01:27, 2.77s/it] {'loss': 0.616, 'grad_norm': 3.4657873419662413, 'learning_rate': 4.996365996237155e-06, 'epoch': 0.05} 5%|▍ | 575/12313 [26:04<9:01:27, 2.77s/it] 5%|▍ | 576/12313 [26:08<9:45:18, 2.99s/it] {'loss': 0.7037, 'grad_norm': 6.130204069422916, 'learning_rate': 4.996330464766414e-06, 'epoch': 0.05} 5%|▍ | 576/12313 [26:08<9:45:18, 2.99s/it] 5%|▍ | 577/12313 [26:11<9:59:11, 3.06s/it] {'loss': 0.735, 'grad_norm': 4.045238400737803, 'learning_rate': 4.996294760562817e-06, 'epoch': 0.05} 5%|▍ | 577/12313 [26:11<9:59:11, 3.06s/it] 5%|▍ | 578/12313 [26:14<9:43:17, 2.98s/it] {'loss': 0.5942, 'grad_norm': 4.541687240117253, 'learning_rate': 4.996258883628834e-06, 'epoch': 0.05} 5%|▍ | 578/12313 [26:14<9:43:17, 2.98s/it] 5%|▍ | 579/12313 [26:16<9:17:46, 2.85s/it] {'loss': 0.5935, 'grad_norm': 3.8724612963620326, 'learning_rate': 4.996222833966947e-06, 'epoch': 0.05} 5%|▍ | 579/12313 [26:16<9:17:46, 2.85s/it] 5%|▍ | 580/12313 [26:19<9:00:40, 2.76s/it] {'loss': 0.5016, 'grad_norm': 6.548773937549966, 'learning_rate': 4.996186611579652e-06, 'epoch': 0.05} 5%|▍ | 580/12313 [26:19<9:00:40, 2.76s/it] 5%|▍ | 581/12313 [26:22<9:07:52, 2.80s/it] {'loss': 0.7185, 'grad_norm': 4.80061928529426, 'learning_rate': 4.996150216469454e-06, 'epoch': 0.05} 5%|▍ | 581/12313 [26:22<9:07:52, 2.80s/it] 5%|▍ | 582/12313 [26:24<8:50:13, 2.71s/it] {'loss': 0.6544, 'grad_norm': 3.5697724511020974, 'learning_rate': 4.996113648638872e-06, 'epoch': 0.05} 5%|▍ | 582/12313 [26:24<8:50:13, 2.71s/it] 5%|▍ | 583/12313 [26:27<8:49:04, 2.71s/it] {'loss': 0.712, 'grad_norm': 8.657160084841168, 'learning_rate': 4.996076908090435e-06, 'epoch': 0.05} 5%|▍ | 583/12313 [26:27<8:49:04, 2.71s/it] 5%|▍ | 584/12313 [26:30<9:34:17, 2.94s/it] {'loss': 0.6283, 'grad_norm': 5.811863937951256, 'learning_rate': 4.9960399948266865e-06, 'epoch': 0.05} 5%|▍ | 584/12313 [26:30<9:34:17, 2.94s/it] 5%|▍ | 585/12313 [26:33<9:26:22, 2.90s/it] {'loss': 0.5942, 
'grad_norm': 4.144290801020178, 'learning_rate': 4.9960029088501814e-06, 'epoch': 0.05} 5%|▍ | 585/12313 [26:33<9:26:22, 2.90s/it] 5%|▍ | 586/12313 [26:36<9:07:01, 2.80s/it] {'loss': 0.7404, 'grad_norm': 11.532211557025384, 'learning_rate': 4.995965650163485e-06, 'epoch': 0.05} 5%|▍ | 586/12313 [26:36<9:07:01, 2.80s/it] 5%|▍ | 587/12313 [26:39<9:53:02, 3.03s/it] {'loss': 0.5951, 'grad_norm': 4.498137515882452, 'learning_rate': 4.995928218769174e-06, 'epoch': 0.05} 5%|▍ | 587/12313 [26:39<9:53:02, 3.03s/it] 5%|▍ | 588/12313 [26:42<9:32:17, 2.93s/it] {'loss': 0.6123, 'grad_norm': 5.1824043128703465, 'learning_rate': 4.99589061466984e-06, 'epoch': 0.05} 5%|▍ | 588/12313 [26:42<9:32:17, 2.93s/it] 5%|▍ | 589/12313 [26:44<9:06:57, 2.80s/it] {'loss': 0.5376, 'grad_norm': 5.787634937924542, 'learning_rate': 4.995852837868086e-06, 'epoch': 0.05} 5%|▍ | 589/12313 [26:44<9:06:57, 2.80s/it] 5%|▍ | 590/12313 [26:47<8:51:03, 2.72s/it] {'loss': 0.599, 'grad_norm': 7.146634422767184, 'learning_rate': 4.995814888366523e-06, 'epoch': 0.05} 5%|▍ | 590/12313 [26:47<8:51:03, 2.72s/it] 5%|▍ | 591/12313 [26:50<8:57:21, 2.75s/it] {'loss': 0.7119, 'grad_norm': 4.672683860091777, 'learning_rate': 4.995776766167781e-06, 'epoch': 0.05} 5%|▍ | 591/12313 [26:50<8:57:21, 2.75s/it] 5%|▍ | 592/12313 [26:52<8:41:12, 2.67s/it] {'loss': 0.5744, 'grad_norm': 4.546708179061631, 'learning_rate': 4.9957384712744935e-06, 'epoch': 0.05} 5%|▍ | 592/12313 [26:52<8:41:12, 2.67s/it] 5%|▍ | 593/12313 [26:55<8:42:12, 2.67s/it] {'loss': 0.6749, 'grad_norm': 5.406893771451819, 'learning_rate': 4.9957000036893124e-06, 'epoch': 0.05} 5%|▍ | 593/12313 [26:55<8:42:12, 2.67s/it] 5%|▍ | 594/12313 [26:58<8:38:39, 2.66s/it] {'loss': 0.9589, 'grad_norm': 4.6310196800989685, 'learning_rate': 4.9956613634149e-06, 'epoch': 0.05} 5%|▍ | 594/12313 [26:58<8:38:39, 2.66s/it] 5%|▍ | 595/12313 [27:00<8:37:20, 2.65s/it] {'loss': 0.7278, 'grad_norm': 5.64526302845874, 'learning_rate': 4.995622550453929e-06, 'epoch': 0.05} 5%|▍ | 
595/12313 [27:00<8:37:20, 2.65s/it] 5%|▍ | 596/12313 [27:03<8:37:31, 2.65s/it] {'loss': 0.56, 'grad_norm': 7.582504845192103, 'learning_rate': 4.995583564809086e-06, 'epoch': 0.05} 5%|▍ | 596/12313 [27:03<8:37:31, 2.65s/it] 5%|▍ | 597/12313 [27:05<8:19:08, 2.56s/it] {'loss': 0.7885, 'grad_norm': 4.631867662632177, 'learning_rate': 4.995544406483067e-06, 'epoch': 0.05} 5%|▍ | 597/12313 [27:05<8:19:08, 2.56s/it] 5%|▍ | 598/12313 [27:08<8:30:05, 2.61s/it] {'loss': 0.7185, 'grad_norm': 4.646870659330431, 'learning_rate': 4.9955050754785835e-06, 'epoch': 0.05} 5%|▍ | 598/12313 [27:08<8:30:05, 2.61s/it] 5%|▍ | 599/12313 [27:11<8:55:35, 2.74s/it] {'loss': 0.6945, 'grad_norm': 4.724376677010658, 'learning_rate': 4.995465571798356e-06, 'epoch': 0.05} 5%|▍ | 599/12313 [27:11<8:55:35, 2.74s/it] 5%|▍ | 600/12313 [27:14<8:52:32, 2.73s/it] {'loss': 0.7329, 'grad_norm': 3.7877986447817684, 'learning_rate': 4.995425895445118e-06, 'epoch': 0.05} 5%|▍ | 600/12313 [27:14<8:52:32, 2.73s/it] 5%|▍ | 601/12313 [27:16<8:48:58, 2.71s/it] {'loss': 0.6972, 'grad_norm': 4.6352382568877335, 'learning_rate': 4.995386046421614e-06, 'epoch': 0.05} 5%|▍ | 601/12313 [27:16<8:48:58, 2.71s/it] 5%|▍ | 602/12313 [27:19<8:51:19, 2.72s/it] {'loss': 0.502, 'grad_norm': 4.7603716353361305, 'learning_rate': 4.9953460247306035e-06, 'epoch': 0.05} 5%|▍ | 602/12313 [27:19<8:51:19, 2.72s/it] 5%|▍ | 603/12313 [27:22<9:03:04, 2.78s/it] {'loss': 0.9242, 'grad_norm': 4.752679614641838, 'learning_rate': 4.995305830374854e-06, 'epoch': 0.05} 5%|▍ | 603/12313 [27:22<9:03:04, 2.78s/it] 5%|▍ | 604/12313 [27:25<9:13:46, 2.84s/it] {'loss': 0.7566, 'grad_norm': 3.4417838568029855, 'learning_rate': 4.995265463357147e-06, 'epoch': 0.05} 5%|▍ | 604/12313 [27:25<9:13:46, 2.84s/it] 5%|▍ | 605/12313 [27:28<9:04:00, 2.79s/it] {'loss': 0.6201, 'grad_norm': 6.888094078901035, 'learning_rate': 4.995224923680277e-06, 'epoch': 0.05} 5%|▍ | 605/12313 [27:28<9:04:00, 2.79s/it] 5%|▍ | 606/12313 [27:30<8:49:47, 2.72s/it] {'loss': 0.7392, 
'grad_norm': 4.161212593195377, 'learning_rate': 4.995184211347046e-06, 'epoch': 0.05} 5%|▍ | 606/12313 [27:30<8:49:47, 2.72s/it] 5%|▍ | 607/12313 [27:33<8:48:44, 2.71s/it] {'loss': 0.6514, 'grad_norm': 7.5719084378043515, 'learning_rate': 4.995143326360274e-06, 'epoch': 0.05} 5%|▍ | 607/12313 [27:33<8:48:44, 2.71s/it] 5%|▍ | 608/12313 [27:36<8:44:19, 2.69s/it] {'loss': 0.5716, 'grad_norm': 10.224202349098311, 'learning_rate': 4.99510226872279e-06, 'epoch': 0.05} 5%|▍ | 608/12313 [27:36<8:44:19, 2.69s/it] 5%|▍ | 609/12313 [27:38<8:43:48, 2.69s/it] {'loss': 0.6206, 'grad_norm': 5.767252392060593, 'learning_rate': 4.995061038437434e-06, 'epoch': 0.05} 5%|▍ | 609/12313 [27:38<8:43:48, 2.69s/it] 5%|▍ | 610/12313 [27:41<8:51:16, 2.72s/it] {'loss': 0.5774, 'grad_norm': 5.065547686279448, 'learning_rate': 4.995019635507059e-06, 'epoch': 0.05} 5%|▍ | 610/12313 [27:41<8:51:16, 2.72s/it] 5%|▍ | 611/12313 [27:44<8:43:53, 2.69s/it] {'loss': 0.5498, 'grad_norm': 5.65097481771209, 'learning_rate': 4.9949780599345295e-06, 'epoch': 0.05} 5%|▍ | 611/12313 [27:44<8:43:53, 2.69s/it] 5%|▍ | 612/12313 [27:47<9:34:30, 2.95s/it] {'loss': 0.6711, 'grad_norm': 6.523300211900963, 'learning_rate': 4.994936311722723e-06, 'epoch': 0.05} 5%|▍ | 612/12313 [27:47<9:34:30, 2.95s/it] 5%|▍ | 613/12313 [27:50<9:25:57, 2.90s/it] {'loss': 0.7809, 'grad_norm': 6.2556246954187715, 'learning_rate': 4.994894390874527e-06, 'epoch': 0.05} 5%|▍ | 613/12313 [27:50<9:25:57, 2.90s/it] 5%|▍ | 614/12313 [27:53<9:12:28, 2.83s/it] {'loss': 0.5909, 'grad_norm': 5.630465680189662, 'learning_rate': 4.994852297392845e-06, 'epoch': 0.05} 5%|▍ | 614/12313 [27:53<9:12:28, 2.83s/it] 5%|▍ | 615/12313 [27:55<9:01:33, 2.78s/it] {'loss': 0.8296, 'grad_norm': 5.019843596472891, 'learning_rate': 4.994810031280587e-06, 'epoch': 0.05} 5%|▍ | 615/12313 [27:55<9:01:33, 2.78s/it] 5%|▌ | 616/12313 [27:58<8:51:43, 2.73s/it] {'loss': 0.5424, 'grad_norm': 6.614977325258197, 'learning_rate': 4.994767592540678e-06, 'epoch': 0.05} 5%|▌ | 
616/12313 [27:58<8:51:43, 2.73s/it] 5%|▌ | 617/12313 [28:01<9:24:18, 2.89s/it] {'loss': 0.6781, 'grad_norm': 6.173835775193037, 'learning_rate': 4.9947249811760555e-06, 'epoch': 0.05} 5%|▌ | 617/12313 [28:01<9:24:18, 2.89s/it] 5%|▌ | 618/12313 [28:04<9:27:53, 2.91s/it] {'loss': 0.6022, 'grad_norm': 7.520930623536426, 'learning_rate': 4.994682197189667e-06, 'epoch': 0.05} 5%|▌ | 618/12313 [28:04<9:27:53, 2.91s/it] 5%|▌ | 619/12313 [28:07<9:07:29, 2.81s/it] {'loss': 0.6612, 'grad_norm': 8.187916646181824, 'learning_rate': 4.994639240584474e-06, 'epoch': 0.05} 5%|▌ | 619/12313 [28:07<9:07:29, 2.81s/it] 5%|▌ | 620/12313 [28:10<9:52:38, 3.04s/it] {'loss': 0.7443, 'grad_norm': 7.453069836355382, 'learning_rate': 4.994596111363448e-06, 'epoch': 0.05} 5%|▌ | 620/12313 [28:10<9:52:38, 3.04s/it] 5%|▌ | 621/12313 [28:13<9:33:12, 2.94s/it] {'loss': 0.5906, 'grad_norm': 7.747733385771382, 'learning_rate': 4.994552809529573e-06, 'epoch': 0.05} 5%|▌ | 621/12313 [28:13<9:33:12, 2.94s/it] 5%|▌ | 622/12313 [28:16<9:26:41, 2.91s/it] {'loss': 0.591, 'grad_norm': 9.01741469170529, 'learning_rate': 4.994509335085847e-06, 'epoch': 0.05} 5%|▌ | 622/12313 [28:16<9:26:41, 2.91s/it] 5%|▌ | 623/12313 [28:19<9:29:25, 2.92s/it] {'loss': 0.9078, 'grad_norm': 6.323489129605991, 'learning_rate': 4.994465688035276e-06, 'epoch': 0.05} 5%|▌ | 623/12313 [28:19<9:29:25, 2.92s/it] 5%|▌ | 624/12313 [28:21<9:17:25, 2.86s/it] {'loss': 0.525, 'grad_norm': 13.299317733180853, 'learning_rate': 4.994421868380881e-06, 'epoch': 0.05} 5%|▌ | 624/12313 [28:21<9:17:25, 2.86s/it] 5%|▌ | 625/12313 [28:24<9:10:53, 2.83s/it] {'loss': 0.5857, 'grad_norm': 4.296125414620111, 'learning_rate': 4.994377876125695e-06, 'epoch': 0.05} 5%|▌ | 625/12313 [28:24<9:10:53, 2.83s/it] 5%|▌ | 626/12313 [28:27<9:01:10, 2.78s/it] {'loss': 0.6115, 'grad_norm': 3.877492783088978, 'learning_rate': 4.994333711272761e-06, 'epoch': 0.05} 5%|▌ | 626/12313 [28:27<9:01:10, 2.78s/it] 5%|▌ | 627/12313 [28:30<9:13:35, 2.84s/it] {'loss': 0.5045, 
'grad_norm': 7.011247326243034, 'learning_rate': 4.9942893738251355e-06, 'epoch': 0.05} 5%|▌ | 627/12313 [28:30<9:13:35, 2.84s/it] 5%|▌ | 628/12313 [28:32<8:49:27, 2.72s/it] {'loss': 0.6668, 'grad_norm': 7.046298816782971, 'learning_rate': 4.994244863785887e-06, 'epoch': 0.05} 5%|▌ | 628/12313 [28:32<8:49:27, 2.72s/it] 5%|▌ | 629/12313 [28:35<8:47:47, 2.71s/it] {'loss': 0.5775, 'grad_norm': 13.568359594461246, 'learning_rate': 4.994200181158093e-06, 'epoch': 0.05} 5%|▌ | 629/12313 [28:35<8:47:47, 2.71s/it] 5%|▌ | 630/12313 [28:38<8:40:05, 2.67s/it] {'loss': 0.5511, 'grad_norm': 7.5605209242829545, 'learning_rate': 4.9941553259448475e-06, 'epoch': 0.05} 5%|▌ | 630/12313 [28:38<8:40:05, 2.67s/it] 5%|▌ | 631/12313 [28:41<8:57:31, 2.76s/it] {'loss': 0.6272, 'grad_norm': 7.085727266285509, 'learning_rate': 4.994110298149253e-06, 'epoch': 0.05} 5%|▌ | 631/12313 [28:41<8:57:31, 2.76s/it] 5%|▌ | 632/12313 [28:43<8:43:11, 2.69s/it] {'loss': 0.6843, 'grad_norm': 4.890560092384863, 'learning_rate': 4.994065097774426e-06, 'epoch': 0.05} 5%|▌ | 632/12313 [28:43<8:43:11, 2.69s/it] 5%|▌ | 633/12313 [28:45<8:25:51, 2.60s/it] {'loss': 0.9322, 'grad_norm': 5.665397086701724, 'learning_rate': 4.994019724823495e-06, 'epoch': 0.05} 5%|▌ | 633/12313 [28:45<8:25:51, 2.60s/it] 5%|▌ | 634/12313 [28:48<8:25:09, 2.60s/it] {'loss': 0.9683, 'grad_norm': 12.520866191934164, 'learning_rate': 4.993974179299597e-06, 'epoch': 0.05} 5%|▌ | 634/12313 [28:48<8:25:09, 2.60s/it] 5%|▌ | 635/12313 [28:51<8:25:56, 2.60s/it] {'loss': 0.7176, 'grad_norm': 6.353771093218972, 'learning_rate': 4.993928461205885e-06, 'epoch': 0.05} 5%|▌ | 635/12313 [28:51<8:25:56, 2.60s/it] 5%|▌ | 636/12313 [28:53<8:33:18, 2.64s/it] {'loss': 0.6833, 'grad_norm': 6.398842006345174, 'learning_rate': 4.993882570545523e-06, 'epoch': 0.05} 5%|▌ | 636/12313 [28:53<8:33:18, 2.64s/it] 5%|▌ | 637/12313 [28:56<8:31:51, 2.63s/it] {'loss': 0.554, 'grad_norm': 4.720771899259599, 'learning_rate': 4.993836507321686e-06, 'epoch': 0.05} 5%|▌ | 
637/12313 [28:56<8:31:51, 2.63s/it] 5%|▌ | 638/12313 [28:59<8:42:03, 2.68s/it] {'loss': 0.8592, 'grad_norm': 6.5529107721965945, 'learning_rate': 4.9937902715375605e-06, 'epoch': 0.05} 5%|▌ | 638/12313 [28:59<8:42:03, 2.68s/it] 5%|▌ | 639/12313 [29:01<8:39:03, 2.67s/it] {'loss': 0.6642, 'grad_norm': 6.335582258551991, 'learning_rate': 4.993743863196348e-06, 'epoch': 0.05} 5%|▌ | 639/12313 [29:01<8:39:03, 2.67s/it] 5%|▌ | 640/12313 [29:04<8:39:49, 2.67s/it] {'loss': 0.6354, 'grad_norm': 5.488427009047834, 'learning_rate': 4.993697282301256e-06, 'epoch': 0.05} 5%|▌ | 640/12313 [29:04<8:39:49, 2.67s/it] 5%|▌ | 641/12313 [29:07<8:41:49, 2.68s/it] {'loss': 0.6361, 'grad_norm': 4.435230831471148, 'learning_rate': 4.99365052885551e-06, 'epoch': 0.05} 5%|▌ | 641/12313 [29:07<8:41:49, 2.68s/it] 5%|▌ | 642/12313 [29:10<8:56:26, 2.76s/it] {'loss': 0.6415, 'grad_norm': 3.973892603431311, 'learning_rate': 4.9936036028623465e-06, 'epoch': 0.05} 5%|▌ | 642/12313 [29:10<8:56:26, 2.76s/it] 5%|▌ | 643/12313 [29:12<8:48:47, 2.72s/it] {'loss': 0.6895, 'grad_norm': 11.239148633823167, 'learning_rate': 4.99355650432501e-06, 'epoch': 0.05} 5%|▌ | 643/12313 [29:12<8:48:47, 2.72s/it] 5%|▌ | 644/12313 [29:16<9:11:42, 2.84s/it] {'loss': 0.5007, 'grad_norm': 4.274914346041189, 'learning_rate': 4.993509233246761e-06, 'epoch': 0.05} 5%|▌ | 644/12313 [29:16<9:11:42, 2.84s/it] 5%|▌ | 645/12313 [29:18<9:08:03, 2.82s/it] {'loss': 0.6882, 'grad_norm': 6.237193126781422, 'learning_rate': 4.9934617896308675e-06, 'epoch': 0.05} 5%|▌ | 645/12313 [29:18<9:08:03, 2.82s/it] 5%|▌ | 646/12313 [29:21<9:00:43, 2.78s/it] {'loss': 0.6089, 'grad_norm': 3.98788426140263, 'learning_rate': 4.993414173480617e-06, 'epoch': 0.05} 5%|▌ | 646/12313 [29:21<9:00:43, 2.78s/it] 5%|▌ | 647/12313 [29:24<8:57:04, 2.76s/it] {'loss': 0.7351, 'grad_norm': 14.335138449996226, 'learning_rate': 4.9933663847993005e-06, 'epoch': 0.05} 5%|▌ | 647/12313 [29:24<8:57:04, 2.76s/it] 5%|▌ | 648/12313 [29:26<8:47:30, 2.71s/it] {'loss': 0.7582, 
'grad_norm': 7.149335425005816, 'learning_rate': 4.9933184235902275e-06, 'epoch': 0.05} 5%|▌ | 648/12313 [29:26<8:47:30, 2.71s/it] 5%|▌ | 649/12313 [29:29<8:46:33, 2.71s/it] {'loss': 0.5349, 'grad_norm': 4.593224187186037, 'learning_rate': 4.993270289856714e-06, 'epoch': 0.05} 5%|▌ | 649/12313 [29:29<8:46:33, 2.71s/it] 5%|▌ | 650/12313 [29:32<8:39:13, 2.67s/it] {'loss': 0.5782, 'grad_norm': 11.086443268562299, 'learning_rate': 4.993221983602093e-06, 'epoch': 0.05} 5%|▌ | 650/12313 [29:32<8:39:13, 2.67s/it] 5%|▌ | 651/12313 [29:34<8:35:51, 2.65s/it] {'loss': 0.5413, 'grad_norm': 4.566408185176231, 'learning_rate': 4.993173504829705e-06, 'epoch': 0.05} 5%|▌ | 651/12313 [29:34<8:35:51, 2.65s/it] 5%|▌ | 652/12313 [29:37<8:28:37, 2.62s/it] {'loss': 0.7764, 'grad_norm': 4.3786114129219165, 'learning_rate': 4.993124853542906e-06, 'epoch': 0.05} 5%|▌ | 652/12313 [29:37<8:28:37, 2.62s/it] 5%|▌ | 653/12313 [29:40<8:38:33, 2.67s/it] {'loss': 0.5315, 'grad_norm': 4.4667797149749955, 'learning_rate': 4.993076029745061e-06, 'epoch': 0.05} 5%|▌ | 653/12313 [29:40<8:38:33, 2.67s/it] 5%|▌ | 654/12313 [29:42<8:26:09, 2.60s/it] {'loss': 0.6889, 'grad_norm': 4.402424708224538, 'learning_rate': 4.99302703343955e-06, 'epoch': 0.05} 5%|▌ | 654/12313 [29:42<8:26:09, 2.60s/it] 5%|▌ | 655/12313 [29:45<8:38:08, 2.67s/it] {'loss': 0.7257, 'grad_norm': 5.124489236968013, 'learning_rate': 4.992977864629762e-06, 'epoch': 0.05} 5%|▌ | 655/12313 [29:45<8:38:08, 2.67s/it] 5%|▌ | 656/12313 [29:47<8:29:32, 2.62s/it] {'loss': 0.6547, 'grad_norm': 5.821195473769332, 'learning_rate': 4.9929285233191005e-06, 'epoch': 0.05} 5%|▌ | 656/12313 [29:47<8:29:32, 2.62s/it] 5%|▌ | 657/12313 [29:50<8:28:58, 2.62s/it] {'loss': 0.5126, 'grad_norm': 6.888571998941501, 'learning_rate': 4.992879009510978e-06, 'epoch': 0.05} 5%|▌ | 657/12313 [29:50<8:28:58, 2.62s/it] 5%|▌ | 658/12313 [29:53<8:29:45, 2.62s/it] {'loss': 0.6526, 'grad_norm': 5.619280693409271, 'learning_rate': 4.992829323208822e-06, 'epoch': 0.05} 5%|▌ | 
658/12313 [29:53<8:29:45, 2.62s/it] 5%|▌ | 659/12313 [29:55<8:26:51, 2.61s/it] {'loss': 0.596, 'grad_norm': 6.967418216632179, 'learning_rate': 4.992779464416069e-06, 'epoch': 0.05} 5%|▌ | 659/12313 [29:55<8:26:51, 2.61s/it] 5%|▌ | 660/12313 [29:58<8:52:13, 2.74s/it] {'loss': 0.6319, 'grad_norm': 4.9434745741217645, 'learning_rate': 4.992729433136171e-06, 'epoch': 0.05} 5%|▌ | 660/12313 [29:58<8:52:13, 2.74s/it] 5%|▌ | 661/12313 [30:01<8:49:10, 2.72s/it] {'loss': 0.5502, 'grad_norm': 8.648403799491454, 'learning_rate': 4.992679229372588e-06, 'epoch': 0.05} 5%|▌ | 661/12313 [30:01<8:49:10, 2.72s/it] 5%|▌ | 662/12313 [30:04<8:49:11, 2.73s/it] {'loss': 0.6938, 'grad_norm': 11.185178358108589, 'learning_rate': 4.9926288531287946e-06, 'epoch': 0.05} 5%|▌ | 662/12313 [30:04<8:49:11, 2.73s/it] 5%|▌ | 663/12313 [30:06<8:39:14, 2.67s/it] {'loss': 0.6992, 'grad_norm': 5.6934860327185115, 'learning_rate': 4.992578304408278e-06, 'epoch': 0.05} 5%|▌ | 663/12313 [30:06<8:39:14, 2.67s/it] 5%|▌ | 664/12313 [30:09<8:31:01, 2.63s/it] {'loss': 0.738, 'grad_norm': 5.313881451433935, 'learning_rate': 4.992527583214533e-06, 'epoch': 0.05} 5%|▌ | 664/12313 [30:09<8:31:01, 2.63s/it] 5%|▌ | 665/12313 [30:11<8:28:40, 2.62s/it] {'loss': 0.6326, 'grad_norm': 6.054202350907215, 'learning_rate': 4.992476689551071e-06, 'epoch': 0.05} 5%|▌ | 665/12313 [30:11<8:28:40, 2.62s/it] 5%|▌ | 666/12313 [30:14<8:35:51, 2.66s/it] {'loss': 0.7014, 'grad_norm': 7.4133417376434725, 'learning_rate': 4.992425623421414e-06, 'epoch': 0.05} 5%|▌ | 666/12313 [30:14<8:35:51, 2.66s/it] 5%|▌ | 667/12313 [30:17<8:26:58, 2.61s/it] {'loss': 0.7564, 'grad_norm': 4.221490444985979, 'learning_rate': 4.992374384829094e-06, 'epoch': 0.05} 5%|▌ | 667/12313 [30:17<8:26:58, 2.61s/it] 5%|▌ | 668/12313 [30:19<8:47:06, 2.72s/it] {'loss': 0.6686, 'grad_norm': 10.257715188715267, 'learning_rate': 4.992322973777658e-06, 'epoch': 0.05} 5%|▌ | 668/12313 [30:19<8:47:06, 2.72s/it] 5%|▌ | 669/12313 [30:22<8:44:18, 2.70s/it] {'loss': 0.5668, 
'grad_norm': 4.701020797968543, 'learning_rate': 4.992271390270662e-06, 'epoch': 0.05} 5%|▌ | 669/12313 [30:22<8:44:18, 2.70s/it] 5%|▌ | 670/12313 [30:25<8:46:06, 2.71s/it] {'loss': 0.563, 'grad_norm': 5.172562987556883, 'learning_rate': 4.992219634311677e-06, 'epoch': 0.05} 5%|▌ | 670/12313 [30:25<8:46:06, 2.71s/it] 5%|▌ | 671/12313 [30:28<8:45:45, 2.71s/it] {'loss': 0.694, 'grad_norm': 8.921514201488444, 'learning_rate': 4.992167705904282e-06, 'epoch': 0.05} 5%|▌ | 671/12313 [30:28<8:45:45, 2.71s/it] 5%|▌ | 672/12313 [30:30<8:43:55, 2.70s/it] {'loss': 0.5912, 'grad_norm': 3.807857859349466, 'learning_rate': 4.992115605052072e-06, 'epoch': 0.05} 5%|▌ | 672/12313 [30:30<8:43:55, 2.70s/it] 5%|▌ | 673/12313 [30:33<8:56:59, 2.77s/it] {'loss': 0.4573, 'grad_norm': 3.2898787814816357, 'learning_rate': 4.992063331758651e-06, 'epoch': 0.05} 5%|▌ | 673/12313 [30:33<8:56:59, 2.77s/it] 5%|▌ | 674/12313 [30:36<8:58:48, 2.78s/it] {'loss': 0.6525, 'grad_norm': 3.4263761182268526, 'learning_rate': 4.9920108860276375e-06, 'epoch': 0.05} 5%|▌ | 674/12313 [30:36<8:58:48, 2.78s/it] 5%|▌ | 675/12313 [30:39<8:53:23, 2.75s/it] {'loss': 0.51, 'grad_norm': 4.749436044976401, 'learning_rate': 4.991958267862659e-06, 'epoch': 0.05} 5%|▌ | 675/12313 [30:39<8:53:23, 2.75s/it] 5%|▌ | 676/12313 [30:41<8:50:57, 2.74s/it] {'loss': 0.6024, 'grad_norm': 5.875542464144077, 'learning_rate': 4.991905477267356e-06, 'epoch': 0.05} 5%|▌ | 676/12313 [30:41<8:50:57, 2.74s/it] 5%|▌ | 677/12313 [30:44<8:53:26, 2.75s/it] {'loss': 0.6975, 'grad_norm': 5.5252942081115695, 'learning_rate': 4.991852514245384e-06, 'epoch': 0.05} 5%|▌ | 677/12313 [30:44<8:53:26, 2.75s/it] 6%|▌ | 678/12313 [30:47<9:03:09, 2.80s/it] {'loss': 0.7191, 'grad_norm': 3.5503049002474336, 'learning_rate': 4.991799378800404e-06, 'epoch': 0.06} 6%|▌ | 678/12313 [30:47<9:03:09, 2.80s/it] 6%|▌ | 679/12313 [30:50<9:01:18, 2.79s/it] {'loss': 0.6743, 'grad_norm': 3.9287521299019947, 'learning_rate': 4.9917460709360955e-06, 'epoch': 0.06} 6%|▌ | 
679/12313 [30:50<9:01:18, 2.79s/it] 6%|▌ | 680/12313 [30:53<8:53:25, 2.75s/it] {'loss': 0.6277, 'grad_norm': 7.458997142187155, 'learning_rate': 4.991692590656146e-06, 'epoch': 0.06} 6%|▌ | 680/12313 [30:53<8:53:25, 2.75s/it] 6%|▌ | 681/12313 [30:55<8:32:57, 2.65s/it] {'loss': 0.5941, 'grad_norm': 5.196449898312582, 'learning_rate': 4.991638937964257e-06, 'epoch': 0.06} 6%|▌ | 681/12313 [30:55<8:32:57, 2.65s/it] 6%|▌ | 682/12313 [30:57<8:29:35, 2.63s/it] {'loss': 0.4626, 'grad_norm': 7.808673495814518, 'learning_rate': 4.9915851128641405e-06, 'epoch': 0.06} 6%|▌ | 682/12313 [30:57<8:29:35, 2.63s/it] 6%|▌ | 683/12313 [31:00<8:31:20, 2.64s/it] {'loss': 0.6285, 'grad_norm': 10.233712958106928, 'learning_rate': 4.991531115359519e-06, 'epoch': 0.06} 6%|▌ | 683/12313 [31:00<8:31:20, 2.64s/it] 6%|▌ | 684/12313 [31:03<8:32:12, 2.64s/it] {'loss': 0.5607, 'grad_norm': 4.467794770096275, 'learning_rate': 4.991476945454133e-06, 'epoch': 0.06} 6%|▌ | 684/12313 [31:03<8:32:12, 2.64s/it] 6%|▌ | 685/12313 [31:06<8:46:19, 2.72s/it] {'loss': 0.5919, 'grad_norm': 5.521835573749263, 'learning_rate': 4.991422603151727e-06, 'epoch': 0.06} 6%|▌ | 685/12313 [31:06<8:46:19, 2.72s/it] 6%|▌ | 686/12313 [31:08<8:37:00, 2.67s/it] {'loss': 0.5745, 'grad_norm': 5.041631495422818, 'learning_rate': 4.991368088456062e-06, 'epoch': 0.06} 6%|▌ | 686/12313 [31:08<8:37:00, 2.67s/it] 6%|▌ | 687/12313 [31:11<8:36:54, 2.67s/it] {'loss': 0.7395, 'grad_norm': 4.822878210770882, 'learning_rate': 4.99131340137091e-06, 'epoch': 0.06} 6%|▌ | 687/12313 [31:11<8:36:54, 2.67s/it] 6%|▌ | 688/12313 [31:13<8:22:46, 2.60s/it] {'loss': 0.587, 'grad_norm': 8.016561518905647, 'learning_rate': 4.991258541900058e-06, 'epoch': 0.06} 6%|▌ | 688/12313 [31:13<8:22:46, 2.60s/it] 6%|▌ | 689/12313 [31:16<8:29:39, 2.63s/it] {'loss': 0.7209, 'grad_norm': 8.66516164876571, 'learning_rate': 4.991203510047299e-06, 'epoch': 0.06} 6%|▌ | 689/12313 [31:16<8:29:39, 2.63s/it] 6%|▌ | 690/12313 [31:19<8:45:29, 2.71s/it] {'loss': 0.5705, 
'grad_norm': 4.158292161529495, 'learning_rate': 4.991148305816441e-06, 'epoch': 0.06} 6%|▌ | 690/12313 [31:19<8:45:29, 2.71s/it] 6%|▌ | 691/12313 [31:22<8:37:13, 2.67s/it] {'loss': 0.6297, 'grad_norm': 8.004317430680866, 'learning_rate': 4.991092929211305e-06, 'epoch': 0.06} 6%|▌ | 691/12313 [31:22<8:37:13, 2.67s/it] 6%|▌ | 692/12313 [31:24<8:34:46, 2.66s/it] {'loss': 0.5579, 'grad_norm': 7.148231924748558, 'learning_rate': 4.9910373802357214e-06, 'epoch': 0.06} 6%|▌ | 692/12313 [31:24<8:34:46, 2.66s/it] 6%|▌ | 693/12313 [31:28<9:20:28, 2.89s/it] {'loss': 0.5622, 'grad_norm': 4.294720627427241, 'learning_rate': 4.990981658893535e-06, 'epoch': 0.06} 6%|▌ | 693/12313 [31:28<9:20:28, 2.89s/it] 6%|▌ | 694/12313 [31:30<8:59:53, 2.79s/it] {'loss': 0.8874, 'grad_norm': 7.230336818000405, 'learning_rate': 4.990925765188602e-06, 'epoch': 0.06} 6%|▌ | 694/12313 [31:30<8:59:53, 2.79s/it] 6%|▌ | 695/12313 [31:33<8:54:06, 2.76s/it] {'loss': 0.6678, 'grad_norm': 3.9400202583073662, 'learning_rate': 4.9908696991247885e-06, 'epoch': 0.06} 6%|▌ | 695/12313 [31:33<8:54:06, 2.76s/it] 6%|▌ | 696/12313 [31:36<8:51:17, 2.74s/it] {'loss': 0.6598, 'grad_norm': 4.144892106492086, 'learning_rate': 4.990813460705975e-06, 'epoch': 0.06} 6%|▌ | 696/12313 [31:36<8:51:17, 2.74s/it] 6%|▌ | 697/12313 [31:38<8:45:28, 2.71s/it] {'loss': 0.5704, 'grad_norm': 5.029300780914533, 'learning_rate': 4.990757049936051e-06, 'epoch': 0.06} 6%|▌ | 697/12313 [31:38<8:45:28, 2.71s/it] 6%|▌ | 698/12313 [31:41<8:51:04, 2.74s/it] {'loss': 0.4935, 'grad_norm': 5.440381358402857, 'learning_rate': 4.990700466818923e-06, 'epoch': 0.06} 6%|▌ | 698/12313 [31:41<8:51:04, 2.74s/it] 6%|▌ | 699/12313 [31:44<8:40:12, 2.69s/it] {'loss': 0.6541, 'grad_norm': 64.88422852309522, 'learning_rate': 4.990643711358504e-06, 'epoch': 0.06} 6%|▌ | 699/12313 [31:44<8:40:12, 2.69s/it] 6%|▌ | 700/12313 [31:46<8:40:14, 2.69s/it] {'loss': 0.6356, 'grad_norm': 9.761223335464411, 'learning_rate': 4.990586783558722e-06, 'epoch': 0.06} 6%|▌ | 
700/12313 [31:46<8:40:14, 2.69s/it] 6%|▌ | 701/12313 [31:49<8:44:39, 2.71s/it] {'loss': 0.6032, 'grad_norm': 8.471075449596462, 'learning_rate': 4.990529683423515e-06, 'epoch': 0.06} 6%|▌ | 701/12313 [31:49<8:44:39, 2.71s/it] 6%|▌ | 702/12313 [31:52<8:54:52, 2.76s/it] {'loss': 0.6641, 'grad_norm': 8.610029295506125, 'learning_rate': 4.990472410956835e-06, 'epoch': 0.06} 6%|▌ | 702/12313 [31:52<8:54:52, 2.76s/it] 6%|▌ | 703/12313 [31:55<8:49:24, 2.74s/it] {'loss': 0.7349, 'grad_norm': 8.074345676422041, 'learning_rate': 4.9904149661626456e-06, 'epoch': 0.06} 6%|▌ | 703/12313 [31:55<8:49:24, 2.74s/it] 6%|▌ | 704/12313 [31:57<8:27:42, 2.62s/it] {'loss': 0.601, 'grad_norm': 6.475287058051118, 'learning_rate': 4.99035734904492e-06, 'epoch': 0.06} 6%|▌ | 704/12313 [31:57<8:27:42, 2.62s/it] 6%|▌ | 705/12313 [32:00<8:25:02, 2.61s/it] {'loss': 0.6079, 'grad_norm': 5.1928926651854335, 'learning_rate': 4.990299559607646e-06, 'epoch': 0.06} 6%|▌ | 705/12313 [32:00<8:25:02, 2.61s/it] 6%|▌ | 706/12313 [32:02<8:35:30, 2.66s/it] {'loss': 0.7949, 'grad_norm': 5.520091117650318, 'learning_rate': 4.990241597854822e-06, 'epoch': 0.06} 6%|▌ | 706/12313 [32:02<8:35:30, 2.66s/it] 6%|▌ | 707/12313 [32:05<8:23:58, 2.61s/it] {'loss': 0.5618, 'grad_norm': 6.225321905485309, 'learning_rate': 4.99018346379046e-06, 'epoch': 0.06} 6%|▌ | 707/12313 [32:05<8:23:58, 2.61s/it] 6%|▌ | 708/12313 [32:07<8:18:08, 2.58s/it] {'loss': 0.5987, 'grad_norm': 4.444351207370442, 'learning_rate': 4.99012515741858e-06, 'epoch': 0.06} 6%|▌ | 708/12313 [32:07<8:18:08, 2.58s/it] 6%|▌ | 709/12313 [32:10<8:22:49, 2.60s/it] {'loss': 0.5409, 'grad_norm': 11.383212330047579, 'learning_rate': 4.990066678743219e-06, 'epoch': 0.06} 6%|▌ | 709/12313 [32:10<8:22:49, 2.60s/it] 6%|▌ | 710/12313 [32:13<8:26:10, 2.62s/it] {'loss': 0.8077, 'grad_norm': 5.0174724436449765, 'learning_rate': 4.9900080277684224e-06, 'epoch': 0.06} 6%|▌ | 710/12313 [32:13<8:26:10, 2.62s/it] 6%|▌ | 711/12313 [32:15<8:33:38, 2.66s/it] {'loss': 0.7131, 
'grad_norm': 6.82758331108689, 'learning_rate': 4.989949204498248e-06, 'epoch': 0.06} 6%|▌ | 711/12313 [32:15<8:33:38, 2.66s/it] 6%|▌ | 712/12313 [32:18<8:27:50, 2.63s/it] {'loss': 0.7143, 'grad_norm': 4.599805153792023, 'learning_rate': 4.989890208936767e-06, 'epoch': 0.06} 6%|▌ | 712/12313 [32:18<8:27:50, 2.63s/it] 6%|▌ | 713/12313 [32:21<8:36:43, 2.67s/it] {'loss': 0.4989, 'grad_norm': 5.615806754693292, 'learning_rate': 4.98983104108806e-06, 'epoch': 0.06} 6%|▌ | 713/12313 [32:21<8:36:43, 2.67s/it] 6%|▌ | 714/12313 [32:23<8:32:11, 2.65s/it] {'loss': 0.6002, 'grad_norm': 5.944167262991299, 'learning_rate': 4.989771700956223e-06, 'epoch': 0.06} 6%|▌ | 714/12313 [32:23<8:32:11, 2.65s/it] 6%|▌ | 715/12313 [32:26<8:33:40, 2.66s/it] {'loss': 0.5249, 'grad_norm': 5.363075574799061, 'learning_rate': 4.989712188545362e-06, 'epoch': 0.06} 6%|▌ | 715/12313 [32:26<8:33:40, 2.66s/it] 6%|▌ | 716/12313 [32:29<8:51:23, 2.75s/it] {'loss': 0.6282, 'grad_norm': 5.71188897749546, 'learning_rate': 4.989652503859592e-06, 'epoch': 0.06} 6%|▌ | 716/12313 [32:29<8:51:23, 2.75s/it] 6%|▌ | 717/12313 [32:32<8:43:26, 2.71s/it] {'loss': 0.8009, 'grad_norm': 6.4060919441849835, 'learning_rate': 4.989592646903047e-06, 'epoch': 0.06} 6%|▌ | 717/12313 [32:32<8:43:26, 2.71s/it] 6%|▌ | 718/12313 [32:34<8:29:40, 2.64s/it] {'loss': 0.5443, 'grad_norm': 5.806526365359627, 'learning_rate': 4.989532617679866e-06, 'epoch': 0.06} 6%|▌ | 718/12313 [32:34<8:29:40, 2.64s/it] 6%|▌ | 719/12313 [32:37<8:32:48, 2.65s/it] {'loss': 0.5308, 'grad_norm': 5.459430108939258, 'learning_rate': 4.989472416194204e-06, 'epoch': 0.06} 6%|▌ | 719/12313 [32:37<8:32:48, 2.65s/it] 6%|▌ | 720/12313 [32:40<8:46:53, 2.73s/it] {'loss': 0.6284, 'grad_norm': 5.688152424438068, 'learning_rate': 4.9894120424502254e-06, 'epoch': 0.06} 6%|▌ | 720/12313 [32:40<8:46:53, 2.73s/it] 6%|▌ | 721/12313 [32:42<8:35:23, 2.67s/it] {'loss': 0.5055, 'grad_norm': 7.784647717876923, 'learning_rate': 4.989351496452109e-06, 'epoch': 0.06} 6%|▌ | 
721/12313 [32:42<8:35:23, 2.67s/it] 6%|▌ | 722/12313 [32:45<8:30:02, 2.64s/it] {'loss': 0.6513, 'grad_norm': 4.6787370535165245, 'learning_rate': 4.9892907782040435e-06, 'epoch': 0.06} 6%|▌ | 722/12313 [32:45<8:30:02, 2.64s/it] 6%|▌ | 723/12313 [32:48<8:46:28, 2.73s/it] {'loss': 0.7293, 'grad_norm': 6.799832486685082, 'learning_rate': 4.9892298877102305e-06, 'epoch': 0.06} 6%|▌ | 723/12313 [32:48<8:46:28, 2.73s/it] 6%|▌ | 724/12313 [32:50<8:46:19, 2.72s/it] {'loss': 0.5597, 'grad_norm': 5.66494166268401, 'learning_rate': 4.989168824974884e-06, 'epoch': 0.06} 6%|▌ | 724/12313 [32:50<8:46:19, 2.72s/it] 6%|▌ | 725/12313 [32:53<8:45:31, 2.72s/it] {'loss': 0.6171, 'grad_norm': 6.3608187122909206, 'learning_rate': 4.989107590002228e-06, 'epoch': 0.06} 6%|▌ | 725/12313 [32:53<8:45:31, 2.72s/it] 6%|▌ | 726/12313 [32:56<8:42:27, 2.71s/it] {'loss': 0.5414, 'grad_norm': 6.134461895108829, 'learning_rate': 4.989046182796501e-06, 'epoch': 0.06} 6%|▌ | 726/12313 [32:56<8:42:27, 2.71s/it] 6%|▌ | 727/12313 [32:59<8:54:08, 2.77s/it] {'loss': 0.6406, 'grad_norm': 7.100189240185222, 'learning_rate': 4.988984603361949e-06, 'epoch': 0.06} 6%|▌ | 727/12313 [32:59<8:54:08, 2.77s/it] 6%|▌ | 728/12313 [33:01<8:44:05, 2.71s/it] {'loss': 0.6249, 'grad_norm': 5.1992788570507535, 'learning_rate': 4.988922851702837e-06, 'epoch': 0.06} 6%|▌ | 728/12313 [33:01<8:44:05, 2.71s/it] 6%|▌ | 729/12313 [33:04<8:42:17, 2.71s/it] {'loss': 0.8036, 'grad_norm': 4.3545253550275635, 'learning_rate': 4.988860927823436e-06, 'epoch': 0.06} 6%|▌ | 729/12313 [33:04<8:42:17, 2.71s/it] 6%|▌ | 730/12313 [33:07<8:43:25, 2.71s/it] {'loss': 0.5943, 'grad_norm': 5.3206883020912, 'learning_rate': 4.988798831728031e-06, 'epoch': 0.06} 6%|▌ | 730/12313 [33:07<8:43:25, 2.71s/it] 6%|▌ | 731/12313 [33:09<8:39:59, 2.69s/it] {'loss': 0.7094, 'grad_norm': 6.089917627663601, 'learning_rate': 4.9887365634209186e-06, 'epoch': 0.06} 6%|▌ | 731/12313 [33:09<8:39:59, 2.69s/it] 6%|▌ | 732/12313 [33:12<8:41:34, 2.70s/it] {'loss': 0.5626, 
'grad_norm': 3.956275748674879, 'learning_rate': 4.9886741229064075e-06, 'epoch': 0.06} 6%|▌ | 732/12313 [33:12<8:41:34, 2.70s/it] 6%|▌ | 733/12313 [33:15<8:30:31, 2.65s/it] {'loss': 0.5764, 'grad_norm': 4.511627354156277, 'learning_rate': 4.988611510188818e-06, 'epoch': 0.06} 6%|▌ | 733/12313 [33:15<8:30:31, 2.65s/it] 6%|▌ | 734/12313 [33:17<8:28:26, 2.63s/it] {'loss': 0.5982, 'grad_norm': 6.4116489379834025, 'learning_rate': 4.988548725272482e-06, 'epoch': 0.06} 6%|▌ | 734/12313 [33:17<8:28:26, 2.63s/it] 6%|▌ | 735/12313 [33:20<8:18:01, 2.58s/it] {'loss': 0.5339, 'grad_norm': 6.311885375734862, 'learning_rate': 4.988485768161746e-06, 'epoch': 0.06} 6%|▌ | 735/12313 [33:20<8:18:01, 2.58s/it] 6%|▌ | 736/12313 [33:22<8:28:26, 2.64s/it] {'loss': 0.584, 'grad_norm': 3.9308120481243902, 'learning_rate': 4.988422638860964e-06, 'epoch': 0.06} 6%|▌ | 736/12313 [33:22<8:28:26, 2.64s/it] 6%|▌ | 737/12313 [33:25<8:58:04, 2.79s/it] {'loss': 0.5078, 'grad_norm': 4.880711425301184, 'learning_rate': 4.988359337374505e-06, 'epoch': 0.06} 6%|▌ | 737/12313 [33:25<8:58:04, 2.79s/it] 6%|▌ | 738/12313 [33:28<8:54:17, 2.77s/it] {'loss': 0.5754, 'grad_norm': 5.281870836321762, 'learning_rate': 4.988295863706751e-06, 'epoch': 0.06} 6%|▌ | 738/12313 [33:28<8:54:17, 2.77s/it] 6%|▌ | 739/12313 [33:31<8:41:46, 2.70s/it] {'loss': 0.6391, 'grad_norm': 5.0230810112537485, 'learning_rate': 4.988232217862091e-06, 'epoch': 0.06} 6%|▌ | 739/12313 [33:31<8:41:46, 2.70s/it] 6%|▌ | 740/12313 [33:33<8:41:48, 2.71s/it] {'loss': 0.5283, 'grad_norm': 6.955537743254657, 'learning_rate': 4.988168399844931e-06, 'epoch': 0.06} 6%|▌ | 740/12313 [33:33<8:41:48, 2.71s/it] 6%|▌ | 741/12313 [33:36<8:39:04, 2.69s/it] {'loss': 0.653, 'grad_norm': 4.147704339162062, 'learning_rate': 4.988104409659685e-06, 'epoch': 0.06} 6%|▌ | 741/12313 [33:36<8:39:04, 2.69s/it] 6%|▌ | 742/12313 [33:39<8:39:40, 2.69s/it] {'loss': 0.6525, 'grad_norm': 7.537477012145328, 'learning_rate': 4.988040247310783e-06, 'epoch': 0.06} 6%|▌ | 
742/12313 [33:39<8:39:40, 2.69s/it] 6%|▌ | 743/12313 [33:42<8:42:35, 2.71s/it] {'loss': 0.5931, 'grad_norm': 4.863974888473347, 'learning_rate': 4.987975912802663e-06, 'epoch': 0.06} 6%|▌ | 743/12313 [33:42<8:42:35, 2.71s/it] 6%|▌ | 744/12313 [33:44<8:36:11, 2.68s/it] {'loss': 0.4838, 'grad_norm': 5.366327422788624, 'learning_rate': 4.9879114061397784e-06, 'epoch': 0.06} 6%|▌ | 744/12313 [33:44<8:36:11, 2.68s/it] 6%|▌ | 745/12313 [33:47<8:21:24, 2.60s/it] {'loss': 0.681, 'grad_norm': 4.656417177398882, 'learning_rate': 4.987846727326591e-06, 'epoch': 0.06} 6%|▌ | 745/12313 [33:47<8:21:24, 2.60s/it] 6%|▌ | 746/12313 [33:49<8:17:01, 2.58s/it] {'loss': 0.6331, 'grad_norm': 20.788942947113796, 'learning_rate': 4.987781876367576e-06, 'epoch': 0.06} 6%|▌ | 746/12313 [33:49<8:17:01, 2.58s/it] 6%|▌ | 747/12313 [33:52<8:10:56, 2.55s/it] {'loss': 0.5166, 'grad_norm': 7.459258545982763, 'learning_rate': 4.987716853267222e-06, 'epoch': 0.06} 6%|▌ | 747/12313 [33:52<8:10:56, 2.55s/it] 6%|▌ | 748/12313 [33:54<8:23:58, 2.61s/it] {'loss': 0.4729, 'grad_norm': 7.51821304798715, 'learning_rate': 4.9876516580300285e-06, 'epoch': 0.06} 6%|▌ | 748/12313 [33:54<8:23:58, 2.61s/it] 6%|▌ | 749/12313 [33:57<8:22:58, 2.61s/it] {'loss': 0.5615, 'grad_norm': 6.15090790030722, 'learning_rate': 4.987586290660506e-06, 'epoch': 0.06} 6%|▌ | 749/12313 [33:57<8:22:58, 2.61s/it] 6%|▌ | 750/12313 [33:59<8:17:12, 2.58s/it] {'loss': 0.4638, 'grad_norm': 5.226446559415324, 'learning_rate': 4.987520751163176e-06, 'epoch': 0.06} 6%|▌ | 750/12313 [33:59<8:17:12, 2.58s/it] 6%|▌ | 751/12313 [34:03<8:50:17, 2.75s/it] {'loss': 0.5201, 'grad_norm': 6.752685506869791, 'learning_rate': 4.9874550395425764e-06, 'epoch': 0.06} 6%|▌ | 751/12313 [34:03<8:50:17, 2.75s/it] 6%|▌ | 752/12313 [34:06<9:05:55, 2.83s/it] {'loss': 0.7494, 'grad_norm': 4.989408238882795, 'learning_rate': 4.987389155803252e-06, 'epoch': 0.06} 6%|▌ | 752/12313 [34:06<9:05:55, 2.83s/it] 6%|▌ | 753/12313 [34:08<8:59:14, 2.80s/it] {'loss': 0.65, 
'grad_norm': 6.750783797733722, 'learning_rate': 4.987323099949763e-06, 'epoch': 0.06} 6%|▌ | 753/12313 [34:08<8:59:14, 2.80s/it] 6%|▌ | 754/12313 [34:11<8:51:20, 2.76s/it] {'loss': 0.6736, 'grad_norm': 7.609585837462416, 'learning_rate': 4.9872568719866795e-06, 'epoch': 0.06} 6%|▌ | 754/12313 [34:11<8:51:20, 2.76s/it] 6%|▌ | 755/12313 [34:14<8:49:21, 2.75s/it] {'loss': 0.5907, 'grad_norm': 4.452864412048146, 'learning_rate': 4.987190471918584e-06, 'epoch': 0.06} 6%|▌ | 755/12313 [34:14<8:49:21, 2.75s/it] 6%|▌ | 756/12313 [34:16<8:37:11, 2.69s/it] {'loss': 0.653, 'grad_norm': 4.734835124703949, 'learning_rate': 4.98712389975007e-06, 'epoch': 0.06} 6%|▌ | 756/12313 [34:16<8:37:11, 2.69s/it] 6%|▌ | 757/12313 [34:19<8:36:15, 2.68s/it] {'loss': 0.7772, 'grad_norm': 5.928107826920242, 'learning_rate': 4.987057155485746e-06, 'epoch': 0.06} 6%|▌ | 757/12313 [34:19<8:36:15, 2.68s/it] 6%|▌ | 758/12313 [34:21<8:22:44, 2.61s/it] {'loss': 0.5529, 'grad_norm': 9.884828624607355, 'learning_rate': 4.98699023913023e-06, 'epoch': 0.06} 6%|▌ | 758/12313 [34:21<8:22:44, 2.61s/it] 6%|▌ | 759/12313 [34:24<8:16:34, 2.58s/it] {'loss': 0.6011, 'grad_norm': 4.2764549097941, 'learning_rate': 4.986923150688151e-06, 'epoch': 0.06} 6%|▌ | 759/12313 [34:24<8:16:34, 2.58s/it] 6%|▌ | 760/12313 [34:26<8:13:17, 2.56s/it] {'loss': 0.6427, 'grad_norm': 7.233144313683277, 'learning_rate': 4.986855890164152e-06, 'epoch': 0.06} 6%|▌ | 760/12313 [34:26<8:13:17, 2.56s/it] 6%|▌ | 761/12313 [34:29<8:08:26, 2.54s/it] {'loss': 0.6915, 'grad_norm': 5.536503055820394, 'learning_rate': 4.986788457562887e-06, 'epoch': 0.06} 6%|▌ | 761/12313 [34:29<8:08:26, 2.54s/it] 6%|▌ | 762/12313 [34:32<8:14:58, 2.57s/it] {'loss': 0.5991, 'grad_norm': 5.309270582449383, 'learning_rate': 4.986720852889021e-06, 'epoch': 0.06} 6%|▌ | 762/12313 [34:32<8:14:58, 2.57s/it] 6%|▌ | 763/12313 [34:34<8:20:15, 2.60s/it] {'loss': 0.8062, 'grad_norm': 4.284356637879479, 'learning_rate': 4.9866530761472335e-06, 'epoch': 0.06} 6%|▌ | 
763/12313 [34:34<8:20:15, 2.60s/it] 6%|▌ | 764/12313 [34:37<8:27:07, 2.63s/it] {'loss': 0.7215, 'grad_norm': 5.643793776418088, 'learning_rate': 4.986585127342214e-06, 'epoch': 0.06} 6%|▌ | 764/12313 [34:37<8:27:07, 2.63s/it] 6%|▌ | 765/12313 [34:40<8:24:48, 2.62s/it] {'loss': 0.4926, 'grad_norm': 4.295093117802471, 'learning_rate': 4.986517006478663e-06, 'epoch': 0.06} 6%|▌ | 765/12313 [34:40<8:24:48, 2.62s/it] 6%|▌ | 766/12313 [34:42<8:21:12, 2.60s/it] {'loss': 0.5777, 'grad_norm': 5.881778143353826, 'learning_rate': 4.986448713561295e-06, 'epoch': 0.06} 6%|▌ | 766/12313 [34:42<8:21:12, 2.60s/it] 6%|▌ | 767/12313 [34:45<8:39:11, 2.70s/it] {'loss': 0.8141, 'grad_norm': 3.5376491877561533, 'learning_rate': 4.986380248594835e-06, 'epoch': 0.06} 6%|▌ | 767/12313 [34:45<8:39:11, 2.70s/it] 6%|▌ | 768/12313 [34:48<8:36:17, 2.68s/it] {'loss': 0.6347, 'grad_norm': 15.197506269463709, 'learning_rate': 4.9863116115840215e-06, 'epoch': 0.06} 6%|▌ | 768/12313 [34:48<8:36:17, 2.68s/it] 6%|▌ | 769/12313 [34:50<8:31:17, 2.66s/it] {'loss': 0.7381, 'grad_norm': 5.5635913852411765, 'learning_rate': 4.986242802533603e-06, 'epoch': 0.06} 6%|▌ | 769/12313 [34:50<8:31:17, 2.66s/it] 6%|▋ | 770/12313 [34:53<8:31:19, 2.66s/it] {'loss': 0.5331, 'grad_norm': 4.921640572698924, 'learning_rate': 4.986173821448341e-06, 'epoch': 0.06} 6%|▋ | 770/12313 [34:53<8:31:19, 2.66s/it] 6%|▋ | 771/12313 [34:56<8:52:05, 2.77s/it] {'loss': 0.7414, 'grad_norm': 4.243356448186626, 'learning_rate': 4.9861046683330085e-06, 'epoch': 0.06} 6%|▋ | 771/12313 [34:56<8:52:05, 2.77s/it] 6%|▋ | 772/12313 [34:59<8:42:38, 2.72s/it] {'loss': 0.7152, 'grad_norm': 4.7676286369533685, 'learning_rate': 4.986035343192389e-06, 'epoch': 0.06} 6%|▋ | 772/12313 [34:59<8:42:38, 2.72s/it] 6%|▋ | 773/12313 [35:01<8:25:57, 2.63s/it] {'loss': 0.5741, 'grad_norm': 5.052963380949943, 'learning_rate': 4.985965846031283e-06, 'epoch': 0.06} 6%|▋ | 773/12313 [35:01<8:25:57, 2.63s/it] 6%|▋ | 774/12313 [35:04<8:31:17, 2.66s/it] {'loss': 
0.5442, 'grad_norm': 5.830703109939469, 'learning_rate': 4.985896176854496e-06, 'epoch': 0.06} 6%|▋ | 774/12313 [35:04<8:31:17, 2.66s/it] 6%|▋ | 775/12313 [35:06<8:34:35, 2.68s/it] {'loss': 0.6674, 'grad_norm': 28.565367004080375, 'learning_rate': 4.9858263356668505e-06, 'epoch': 0.06} 6%|▋ | 775/12313 [35:06<8:34:35, 2.68s/it] 6%|▋ | 776/12313 [35:09<8:20:29, 2.60s/it] {'loss': 0.5452, 'grad_norm': 5.2243621774870475, 'learning_rate': 4.985756322473178e-06, 'epoch': 0.06} 6%|▋ | 776/12313 [35:09<8:20:29, 2.60s/it] 6%|▋ | 777/12313 [35:12<8:22:52, 2.62s/it] {'loss': 0.5991, 'grad_norm': 5.380977330836419, 'learning_rate': 4.9856861372783236e-06, 'epoch': 0.06} 6%|▋ | 777/12313 [35:12<8:22:52, 2.62s/it] 6%|▋ | 778/12313 [35:14<8:30:49, 2.66s/it] {'loss': 0.661, 'grad_norm': 6.151320235729201, 'learning_rate': 4.9856157800871455e-06, 'epoch': 0.06} 6%|▋ | 778/12313 [35:14<8:30:49, 2.66s/it] 6%|▋ | 779/12313 [35:17<8:27:12, 2.64s/it] {'loss': 0.5931, 'grad_norm': 4.520649099692102, 'learning_rate': 4.985545250904509e-06, 'epoch': 0.06} 6%|▋ | 779/12313 [35:17<8:27:12, 2.64s/it] 6%|▋ | 780/12313 [35:20<8:34:07, 2.67s/it] {'loss': 0.6474, 'grad_norm': 3.8505363861712336, 'learning_rate': 4.985474549735296e-06, 'epoch': 0.06} 6%|▋ | 780/12313 [35:20<8:34:07, 2.67s/it] 6%|▋ | 781/12313 [35:22<8:35:34, 2.68s/it] {'loss': 0.7205, 'grad_norm': 4.383396583650249, 'learning_rate': 4.985403676584397e-06, 'epoch': 0.06} 6%|▋ | 781/12313 [35:22<8:35:34, 2.68s/it] 6%|▋ | 782/12313 [35:25<8:21:42, 2.61s/it] {'loss': 0.6348, 'grad_norm': 5.432225271309565, 'learning_rate': 4.985332631456719e-06, 'epoch': 0.06} 6%|▋ | 782/12313 [35:25<8:21:42, 2.61s/it] 6%|▋ | 783/12313 [35:27<8:24:29, 2.63s/it] {'loss': 0.6672, 'grad_norm': 8.317084663016479, 'learning_rate': 4.9852614143571755e-06, 'epoch': 0.06} 6%|▋ | 783/12313 [35:27<8:24:29, 2.63s/it] 6%|▋ | 784/12313 [35:30<8:31:01, 2.66s/it] {'loss': 0.5816, 'grad_norm': 6.050699233785058, 'learning_rate': 4.985190025290696e-06, 'epoch': 
0.06} 6%|▋ | 784/12313 [35:30<8:31:01, 2.66s/it] 6%|▋ | 785/12313 [35:33<8:30:59, 2.66s/it] {'loss': 0.5229, 'grad_norm': 4.054546774991104, 'learning_rate': 4.985118464262219e-06, 'epoch': 0.06} 6%|▋ | 785/12313 [35:33<8:30:59, 2.66s/it] 6%|▋ | 786/12313 [35:35<8:20:24, 2.60s/it] {'loss': 0.6044, 'grad_norm': 6.331916697325334, 'learning_rate': 4.985046731276697e-06, 'epoch': 0.06} 6%|▋ | 786/12313 [35:35<8:20:24, 2.60s/it] 6%|▋ | 787/12313 [35:38<8:22:38, 2.62s/it] {'loss': 0.6758, 'grad_norm': 5.448136315607667, 'learning_rate': 4.984974826339093e-06, 'epoch': 0.06} 6%|▋ | 787/12313 [35:38<8:22:38, 2.62s/it] 6%|▋ | 788/12313 [35:41<8:24:03, 2.62s/it] {'loss': 0.5999, 'grad_norm': 3.0701662409934065, 'learning_rate': 4.984902749454382e-06, 'epoch': 0.06} 6%|▋ | 788/12313 [35:41<8:24:03, 2.62s/it] 6%|▋ | 789/12313 [35:43<8:28:13, 2.65s/it] {'loss': 0.6773, 'grad_norm': 4.6802757341120556, 'learning_rate': 4.9848305006275525e-06, 'epoch': 0.06} 6%|▋ | 789/12313 [35:43<8:28:13, 2.65s/it] 6%|▋ | 790/12313 [35:46<8:21:15, 2.61s/it] {'loss': 0.5526, 'grad_norm': 5.59540917186297, 'learning_rate': 4.984758079863603e-06, 'epoch': 0.06} 6%|▋ | 790/12313 [35:46<8:21:15, 2.61s/it] 6%|▋ | 791/12313 [35:48<8:17:25, 2.59s/it] {'loss': 0.4746, 'grad_norm': 4.319537345372907, 'learning_rate': 4.984685487167544e-06, 'epoch': 0.06} 6%|▋ | 791/12313 [35:48<8:17:25, 2.59s/it] 6%|▋ | 792/12313 [35:51<8:24:55, 2.63s/it] {'loss': 0.5985, 'grad_norm': 4.812410544795939, 'learning_rate': 4.9846127225444e-06, 'epoch': 0.06} 6%|▋ | 792/12313 [35:51<8:24:55, 2.63s/it] 6%|▋ | 793/12313 [35:54<8:29:45, 2.65s/it] {'loss': 0.8711, 'grad_norm': 3.8747262789638, 'learning_rate': 4.984539785999205e-06, 'epoch': 0.06} 6%|▋ | 793/12313 [35:54<8:29:45, 2.65s/it] 6%|▋ | 794/12313 [35:56<8:21:50, 2.61s/it] {'loss': 0.6323, 'grad_norm': 9.168257138239873, 'learning_rate': 4.984466677537007e-06, 'epoch': 0.06} 6%|▋ | 794/12313 [35:56<8:21:50, 2.61s/it] 6%|▋ | 795/12313 [35:59<8:11:11, 2.56s/it] {'loss': 
0.6117, 'grad_norm': 9.621282105988284, 'learning_rate': 4.984393397162862e-06, 'epoch': 0.06} 6%|▋ | 795/12313 [35:59<8:11:11, 2.56s/it] 6%|▋ | 796/12313 [36:01<8:13:47, 2.57s/it] {'loss': 0.6991, 'grad_norm': 4.3783026160070895, 'learning_rate': 4.984319944881844e-06, 'epoch': 0.06} 6%|▋ | 796/12313 [36:01<8:13:47, 2.57s/it] 6%|▋ | 797/12313 [36:04<8:10:50, 2.56s/it] {'loss': 0.7034, 'grad_norm': 4.35590663167429, 'learning_rate': 4.984246320699033e-06, 'epoch': 0.06} 6%|▋ | 797/12313 [36:04<8:10:50, 2.56s/it] 6%|▋ | 798/12313 [36:06<8:12:24, 2.57s/it] {'loss': 0.6776, 'grad_norm': 4.746827833135994, 'learning_rate': 4.984172524619525e-06, 'epoch': 0.06} 6%|▋ | 798/12313 [36:06<8:12:24, 2.57s/it] 6%|▋ | 799/12313 [36:09<8:15:46, 2.58s/it] {'loss': 0.5483, 'grad_norm': 5.905725252167514, 'learning_rate': 4.984098556648425e-06, 'epoch': 0.06} 6%|▋ | 799/12313 [36:09<8:15:46, 2.58s/it] 6%|▋ | 800/12313 [36:12<8:21:51, 2.62s/it] {'loss': 0.688, 'grad_norm': 4.596246589585779, 'learning_rate': 4.984024416790852e-06, 'epoch': 0.06} 6%|▋ | 800/12313 [36:12<8:21:51, 2.62s/it] 7%|▋ | 801/12313 [36:15<8:40:15, 2.71s/it] {'loss': 0.6562, 'grad_norm': 4.9534579923413595, 'learning_rate': 4.983950105051936e-06, 'epoch': 0.07} 7%|▋ | 801/12313 [36:15<8:40:15, 2.71s/it] 7%|▋ | 802/12313 [36:17<8:39:33, 2.71s/it] {'loss': 0.6707, 'grad_norm': 5.123985851109989, 'learning_rate': 4.9838756214368185e-06, 'epoch': 0.07} 7%|▋ | 802/12313 [36:17<8:39:33, 2.71s/it] 7%|▋ | 803/12313 [36:20<8:35:49, 2.69s/it] {'loss': 0.6378, 'grad_norm': 5.402799215185337, 'learning_rate': 4.9838009659506535e-06, 'epoch': 0.07} 7%|▋ | 803/12313 [36:20<8:35:49, 2.69s/it] 7%|▋ | 804/12313 [36:23<8:36:43, 2.69s/it] {'loss': 0.5916, 'grad_norm': 4.492757760289716, 'learning_rate': 4.983726138598608e-06, 'epoch': 0.07} 7%|▋ | 804/12313 [36:23<8:36:43, 2.69s/it] 7%|▋ | 805/12313 [36:25<8:36:29, 2.69s/it] {'loss': 0.5023, 'grad_norm': 3.95999131962753, 'learning_rate': 4.9836511393858575e-06, 'epoch': 0.07} 
7%|▋ | 805/12313 [36:25<8:36:29, 2.69s/it] 7%|▋ | 806/12313 [36:28<8:41:33, 2.72s/it] {'loss': 0.7991, 'grad_norm': 3.920207373866997, 'learning_rate': 4.983575968317593e-06, 'epoch': 0.07} 7%|▋ | 806/12313 [36:28<8:41:33, 2.72s/it] 7%|▋ | 807/12313 [36:31<8:25:58, 2.64s/it] {'loss': 0.5301, 'grad_norm': 7.315668961691789, 'learning_rate': 4.983500625399017e-06, 'epoch': 0.07} 7%|▋ | 807/12313 [36:31<8:25:58, 2.64s/it] 7%|▋ | 808/12313 [36:33<8:22:12, 2.62s/it] {'loss': 0.7372, 'grad_norm': 3.912121127147346, 'learning_rate': 4.98342511063534e-06, 'epoch': 0.07} 7%|▋ | 808/12313 [36:33<8:22:12, 2.62s/it] 7%|▋ | 809/12313 [36:36<8:25:48, 2.64s/it] {'loss': 0.5946, 'grad_norm': 7.552130637632774, 'learning_rate': 4.983349424031789e-06, 'epoch': 0.07} 7%|▋ | 809/12313 [36:36<8:25:48, 2.64s/it] 7%|▋ | 810/12313 [36:39<8:34:17, 2.68s/it] {'loss': 0.642, 'grad_norm': 6.533691238009157, 'learning_rate': 4.983273565593601e-06, 'epoch': 0.07} 7%|▋ | 810/12313 [36:39<8:34:17, 2.68s/it] 7%|▋ | 811/12313 [36:41<8:32:51, 2.68s/it] {'loss': 0.6117, 'grad_norm': 9.916532299872616, 'learning_rate': 4.983197535326024e-06, 'epoch': 0.07} 7%|▋ | 811/12313 [36:41<8:32:51, 2.68s/it] 7%|▋ | 812/12313 [36:44<8:30:31, 2.66s/it] {'loss': 0.576, 'grad_norm': 5.29822144361094, 'learning_rate': 4.983121333234321e-06, 'epoch': 0.07} 7%|▋ | 812/12313 [36:44<8:30:31, 2.66s/it] 7%|▋ | 813/12313 [36:47<8:30:20, 2.66s/it] {'loss': 0.6305, 'grad_norm': 7.011546578891193, 'learning_rate': 4.983044959323763e-06, 'epoch': 0.07} 7%|▋ | 813/12313 [36:47<8:30:20, 2.66s/it] 7%|▋ | 814/12313 [36:49<8:24:18, 2.63s/it] {'loss': 0.5282, 'grad_norm': 7.6729419826090695, 'learning_rate': 4.982968413599635e-06, 'epoch': 0.07} 7%|▋ | 814/12313 [36:49<8:24:18, 2.63s/it] 7%|▋ | 815/12313 [36:52<8:15:41, 2.59s/it] {'loss': 0.5766, 'grad_norm': 3.568801069591388, 'learning_rate': 4.982891696067234e-06, 'epoch': 0.07} 7%|▋ | 815/12313 [36:52<8:15:41, 2.59s/it] 7%|▋ | 816/12313 [36:54<8:18:22, 2.60s/it] {'loss': 0.647, 
'grad_norm': 3.822836645624077, 'learning_rate': 4.9828148067318675e-06, 'epoch': 0.07} 7%|▋ | 816/12313 [36:54<8:18:22, 2.60s/it] 7%|▋ | 817/12313 [36:57<8:18:01, 2.60s/it] {'loss': 0.8134, 'grad_norm': 4.578694026237904, 'learning_rate': 4.982737745598857e-06, 'epoch': 0.07} 7%|▋ | 817/12313 [36:57<8:18:01, 2.60s/it] 7%|▋ | 818/12313 [36:59<8:15:03, 2.58s/it] {'loss': 0.6404, 'grad_norm': 6.7533794019944065, 'learning_rate': 4.982660512673534e-06, 'epoch': 0.07} 7%|▋ | 818/12313 [36:59<8:15:03, 2.58s/it] 7%|▋ | 819/12313 [37:02<8:15:12, 2.59s/it] {'loss': 0.5909, 'grad_norm': 5.791281720391061, 'learning_rate': 4.982583107961243e-06, 'epoch': 0.07} 7%|▋ | 819/12313 [37:02<8:15:12, 2.59s/it] 7%|▋ | 820/12313 [37:05<8:33:26, 2.68s/it] {'loss': 0.5977, 'grad_norm': 4.161592546084501, 'learning_rate': 4.982505531467339e-06, 'epoch': 0.07} 7%|▋ | 820/12313 [37:05<8:33:26, 2.68s/it] 7%|▋ | 821/12313 [37:08<8:35:22, 2.69s/it] {'loss': 0.5928, 'grad_norm': 6.394531347329611, 'learning_rate': 4.982427783197191e-06, 'epoch': 0.07} 7%|▋ | 821/12313 [37:08<8:35:22, 2.69s/it] 7%|▋ | 822/12313 [37:10<8:39:55, 2.71s/it] {'loss': 0.7508, 'grad_norm': 8.488158051535715, 'learning_rate': 4.982349863156179e-06, 'epoch': 0.07} 7%|▋ | 822/12313 [37:10<8:39:55, 2.71s/it] 7%|▋ | 823/12313 [37:13<8:28:17, 2.65s/it] {'loss': 0.5392, 'grad_norm': 5.428324952116931, 'learning_rate': 4.982271771349694e-06, 'epoch': 0.07} 7%|▋ | 823/12313 [37:13<8:28:17, 2.65s/it] 7%|▋ | 824/12313 [37:16<8:33:37, 2.68s/it] {'loss': 0.4991, 'grad_norm': 4.025855738456331, 'learning_rate': 4.98219350778314e-06, 'epoch': 0.07} 7%|▋ | 824/12313 [37:16<8:33:37, 2.68s/it] 7%|▋ | 825/12313 [37:18<8:34:26, 2.69s/it] {'loss': 0.522, 'grad_norm': 5.231237358948681, 'learning_rate': 4.982115072461932e-06, 'epoch': 0.07} 7%|▋ | 825/12313 [37:18<8:34:26, 2.69s/it] 7%|▋ | 826/12313 [37:21<8:37:27, 2.70s/it] {'loss': 0.6092, 'grad_norm': 3.4702942997245616, 'learning_rate': 4.9820364653914964e-06, 'epoch': 0.07} 7%|▋ | 
826/12313 [37:21<8:37:27, 2.70s/it] 7%|▋ | 827/12313 [37:24<8:25:25, 2.64s/it] {'loss': 0.8485, 'grad_norm': 8.455649789106133, 'learning_rate': 4.981957686577275e-06, 'epoch': 0.07} 7%|▋ | 827/12313 [37:24<8:25:25, 2.64s/it] 7%|▋ | 828/12313 [37:26<8:30:14, 2.67s/it] {'loss': 0.5669, 'grad_norm': 6.0384535186038955, 'learning_rate': 4.981878736024716e-06, 'epoch': 0.07} 7%|▋ | 828/12313 [37:26<8:30:14, 2.67s/it] 7%|▋ | 829/12313 [37:29<8:28:17, 2.66s/it] {'loss': 0.5404, 'grad_norm': 3.4751635293718732, 'learning_rate': 4.981799613739284e-06, 'epoch': 0.07} 7%|▋ | 829/12313 [37:29<8:28:17, 2.66s/it] 7%|▋ | 830/12313 [37:32<8:29:28, 2.66s/it] {'loss': 0.512, 'grad_norm': 4.516175343460089, 'learning_rate': 4.981720319726453e-06, 'epoch': 0.07} 7%|▋ | 830/12313 [37:32<8:29:28, 2.66s/it] 7%|▋ | 831/12313 [37:34<8:29:24, 2.66s/it] {'loss': 0.7, 'grad_norm': 6.121636308285373, 'learning_rate': 4.981640853991712e-06, 'epoch': 0.07} 7%|▋ | 831/12313 [37:34<8:29:24, 2.66s/it] 7%|▋ | 832/12313 [37:37<8:32:39, 2.68s/it] {'loss': 0.6203, 'grad_norm': 4.916730473198547, 'learning_rate': 4.981561216540556e-06, 'epoch': 0.07} 7%|▋ | 832/12313 [37:37<8:32:39, 2.68s/it] 7%|▋ | 833/12313 [37:40<8:32:48, 2.68s/it] {'loss': 0.6689, 'grad_norm': 6.140736960651221, 'learning_rate': 4.981481407378498e-06, 'epoch': 0.07} 7%|▋ | 833/12313 [37:40<8:32:48, 2.68s/it] 7%|▋ | 834/12313 [37:42<8:26:57, 2.65s/it] {'loss': 0.6711, 'grad_norm': 3.651255962877865, 'learning_rate': 4.981401426511059e-06, 'epoch': 0.07} 7%|▋ | 834/12313 [37:42<8:26:57, 2.65s/it] 7%|▋ | 835/12313 [37:45<8:39:33, 2.72s/it] {'loss': 0.5962, 'grad_norm': 4.433646055135197, 'learning_rate': 4.981321273943775e-06, 'epoch': 0.07} 7%|▋ | 835/12313 [37:45<8:39:33, 2.72s/it] 7%|▋ | 836/12313 [37:48<8:25:18, 2.64s/it] {'loss': 0.6696, 'grad_norm': 4.214671257503501, 'learning_rate': 4.98124094968219e-06, 'epoch': 0.07} 7%|▋ | 836/12313 [37:48<8:25:18, 2.64s/it] 7%|▋ | 837/12313 [37:50<8:28:52, 2.66s/it] {'loss': 0.5597, 
'grad_norm': 8.938425585700115, 'learning_rate': 4.981160453731864e-06, 'epoch': 0.07} 7%|▋ | 837/12313 [37:50<8:28:52, 2.66s/it] 7%|▋ | 838/12313 [37:53<8:45:40, 2.75s/it] {'loss': 0.6733, 'grad_norm': 5.5327509204918215, 'learning_rate': 4.981079786098365e-06, 'epoch': 0.07} 7%|▋ | 838/12313 [37:53<8:45:40, 2.75s/it] 7%|▋ | 839/12313 [37:56<8:43:09, 2.74s/it] {'loss': 0.6717, 'grad_norm': 4.203367930591512, 'learning_rate': 4.980998946787276e-06, 'epoch': 0.07} 7%|▋ | 839/12313 [37:56<8:43:09, 2.74s/it] 7%|▋ | 840/12313 [37:59<8:39:34, 2.72s/it] {'loss': 0.6467, 'grad_norm': 4.743933277670276, 'learning_rate': 4.98091793580419e-06, 'epoch': 0.07} 7%|▋ | 840/12313 [37:59<8:39:34, 2.72s/it] 7%|▋ | 841/12313 [38:01<8:19:39, 2.61s/it] {'loss': 0.5875, 'grad_norm': 5.519327116976564, 'learning_rate': 4.9808367531547144e-06, 'epoch': 0.07} 7%|▋ | 841/12313 [38:01<8:19:39, 2.61s/it] 7%|▋ | 842/12313 [38:04<8:22:11, 2.63s/it] {'loss': 0.5742, 'grad_norm': 5.421051052782078, 'learning_rate': 4.980755398844464e-06, 'epoch': 0.07} 7%|▋ | 842/12313 [38:04<8:22:11, 2.63s/it] 7%|▋ | 843/12313 [38:06<8:20:43, 2.62s/it] {'loss': 0.734, 'grad_norm': 5.205786851548457, 'learning_rate': 4.980673872879069e-06, 'epoch': 0.07} 7%|▋ | 843/12313 [38:06<8:20:43, 2.62s/it] 7%|▋ | 844/12313 [38:09<8:24:05, 2.64s/it] {'loss': 0.6009, 'grad_norm': 7.795985083428892, 'learning_rate': 4.980592175264172e-06, 'epoch': 0.07} 7%|▋ | 844/12313 [38:09<8:24:05, 2.64s/it] 7%|▋ | 845/12313 [38:12<8:31:19, 2.68s/it] {'loss': 0.5548, 'grad_norm': 4.005038090148275, 'learning_rate': 4.9805103060054235e-06, 'epoch': 0.07} 7%|▋ | 845/12313 [38:12<8:31:19, 2.68s/it] 7%|▋ | 846/12313 [38:14<8:28:21, 2.66s/it] {'loss': 0.5983, 'grad_norm': 5.440366650852374, 'learning_rate': 4.980428265108491e-06, 'epoch': 0.07} 7%|▋ | 846/12313 [38:14<8:28:21, 2.66s/it] 7%|▋ | 847/12313 [38:18<8:54:43, 2.80s/it] {'loss': 0.5767, 'grad_norm': 6.265163613283854, 'learning_rate': 4.980346052579049e-06, 'epoch': 0.07} 7%|▋ | 
847/12313 [38:18<8:54:43, 2.80s/it] 7%|▋ | 848/12313 [38:20<9:04:27, 2.85s/it] {'loss': 0.6295, 'grad_norm': 4.18497253484256, 'learning_rate': 4.9802636684227875e-06, 'epoch': 0.07} 7%|▋ | 848/12313 [38:20<9:04:27, 2.85s/it] 7%|▋ | 849/12313 [38:23<9:09:31, 2.88s/it] {'loss': 0.5392, 'grad_norm': 36.607837712422636, 'learning_rate': 4.980181112645407e-06, 'epoch': 0.07} 7%|▋ | 849/12313 [38:23<9:09:31, 2.88s/it] 7%|▋ | 850/12313 [38:26<9:01:01, 2.83s/it] {'loss': 0.7019, 'grad_norm': 4.250875496889967, 'learning_rate': 4.9800983852526195e-06, 'epoch': 0.07} 7%|▋ | 850/12313 [38:26<9:01:01, 2.83s/it] 7%|▋ | 851/12313 [38:29<8:41:09, 2.73s/it] {'loss': 0.6024, 'grad_norm': 4.20052946210975, 'learning_rate': 4.980015486250149e-06, 'epoch': 0.07} 7%|▋ | 851/12313 [38:29<8:41:09, 2.73s/it] 7%|▋ | 852/12313 [38:31<8:41:08, 2.73s/it] {'loss': 0.505, 'grad_norm': 8.36262034310868, 'learning_rate': 4.979932415643733e-06, 'epoch': 0.07} 7%|▋ | 852/12313 [38:31<8:41:08, 2.73s/it] 7%|▋ | 853/12313 [38:34<8:39:23, 2.72s/it] {'loss': 0.6134, 'grad_norm': 3.9569054991402206, 'learning_rate': 4.9798491734391185e-06, 'epoch': 0.07} 7%|▋ | 853/12313 [38:34<8:39:23, 2.72s/it] 7%|▋ | 854/12313 [38:37<8:44:17, 2.75s/it] {'loss': 0.6008, 'grad_norm': 8.784347823002246, 'learning_rate': 4.9797657596420655e-06, 'epoch': 0.07} 7%|▋ | 854/12313 [38:37<8:44:17, 2.75s/it] 7%|▋ | 855/12313 [38:39<8:36:16, 2.70s/it] {'loss': 0.5597, 'grad_norm': 6.296297313209, 'learning_rate': 4.979682174258346e-06, 'epoch': 0.07} 7%|▋ | 855/12313 [38:39<8:36:16, 2.70s/it] 7%|▋ | 856/12313 [38:42<8:31:44, 2.68s/it] {'loss': 0.7964, 'grad_norm': 5.72675894986214, 'learning_rate': 4.979598417293743e-06, 'epoch': 0.07} 7%|▋ | 856/12313 [38:42<8:31:44, 2.68s/it] 7%|▋ | 857/12313 [38:45<8:23:44, 2.64s/it] {'loss': 0.6276, 'grad_norm': 9.153121757631224, 'learning_rate': 4.979514488754053e-06, 'epoch': 0.07} 7%|▋ | 857/12313 [38:45<8:23:44, 2.64s/it] 7%|▋ | 858/12313 [38:47<8:30:16, 2.67s/it] {'loss': 0.6616, 
'grad_norm': 3.4432015000564156, 'learning_rate': 4.979430388645083e-06, 'epoch': 0.07} 7%|▋ | 858/12313 [38:47<8:30:16, 2.67s/it] 7%|▋ | 859/12313 [38:50<8:31:56, 2.68s/it] {'loss': 0.6686, 'grad_norm': 4.626309196229603, 'learning_rate': 4.979346116972653e-06, 'epoch': 0.07} 7%|▋ | 859/12313 [38:50<8:31:56, 2.68s/it] 7%|▋ | 860/12313 [38:53<8:45:16, 2.75s/it] {'loss': 0.7034, 'grad_norm': 4.3625603630162555, 'learning_rate': 4.979261673742592e-06, 'epoch': 0.07} 7%|▋ | 860/12313 [38:53<8:45:16, 2.75s/it] 7%|▋ | 861/12313 [38:56<8:57:19, 2.82s/it] {'loss': 0.7321, 'grad_norm': 3.987951551641592, 'learning_rate': 4.9791770589607455e-06, 'epoch': 0.07} 7%|▋ | 861/12313 [38:56<8:57:19, 2.82s/it] 7%|▋ | 862/12313 [38:59<8:54:58, 2.80s/it] {'loss': 0.6409, 'grad_norm': 4.806090340673884, 'learning_rate': 4.979092272632968e-06, 'epoch': 0.07} 7%|▋ | 862/12313 [38:59<8:54:58, 2.80s/it] 7%|▋ | 863/12313 [39:02<9:08:41, 2.88s/it] {'loss': 0.6183, 'grad_norm': 8.252678583793005, 'learning_rate': 4.979007314765124e-06, 'epoch': 0.07} 7%|▋ | 863/12313 [39:02<9:08:41, 2.88s/it] 7%|▋ | 864/12313 [39:05<9:01:58, 2.84s/it] {'loss': 0.5649, 'grad_norm': 7.048613756795895, 'learning_rate': 4.978922185363095e-06, 'epoch': 0.07} 7%|▋ | 864/12313 [39:05<9:01:58, 2.84s/it] 7%|▋ | 865/12313 [39:07<9:02:20, 2.84s/it] {'loss': 0.7172, 'grad_norm': 7.107709811977146, 'learning_rate': 4.97883688443277e-06, 'epoch': 0.07} 7%|▋ | 865/12313 [39:07<9:02:20, 2.84s/it] 7%|▋ | 866/12313 [39:10<8:54:00, 2.80s/it] {'loss': 0.756, 'grad_norm': 5.254387891978319, 'learning_rate': 4.9787514119800515e-06, 'epoch': 0.07} 7%|▋ | 866/12313 [39:10<8:54:00, 2.80s/it] 7%|▋ | 867/12313 [39:13<8:39:50, 2.73s/it] {'loss': 0.4973, 'grad_norm': 4.158496862178936, 'learning_rate': 4.9786657680108545e-06, 'epoch': 0.07} 7%|▋ | 867/12313 [39:13<8:39:50, 2.73s/it] 7%|▋ | 868/12313 [39:15<8:40:51, 2.73s/it] {'loss': 0.7624, 'grad_norm': 4.086601599664647, 'learning_rate': 4.978579952531104e-06, 'epoch': 0.07} 7%|▋ | 
868/12313 [39:15<8:40:51, 2.73s/it] 7%|▋ | 869/12313 [39:18<8:36:52, 2.71s/it] {'loss': 0.4797, 'grad_norm': 4.066778591214372, 'learning_rate': 4.978493965546738e-06, 'epoch': 0.07} 7%|▋ | 869/12313 [39:18<8:36:52, 2.71s/it] 7%|▋ | 870/12313 [39:21<8:33:40, 2.69s/it] {'loss': 0.7739, 'grad_norm': 4.458730486420523, 'learning_rate': 4.9784078070637076e-06, 'epoch': 0.07} 7%|▋ | 870/12313 [39:21<8:33:40, 2.69s/it] 7%|▋ | 871/12313 [39:23<8:31:33, 2.68s/it] {'loss': 0.5737, 'grad_norm': 4.2724723822092425, 'learning_rate': 4.978321477087974e-06, 'epoch': 0.07} 7%|▋ | 871/12313 [39:23<8:31:33, 2.68s/it] 7%|▋ | 872/12313 [39:26<8:40:56, 2.73s/it] {'loss': 0.5447, 'grad_norm': 4.54951709242148, 'learning_rate': 4.97823497562551e-06, 'epoch': 0.07} 7%|▋ | 872/12313 [39:26<8:40:56, 2.73s/it] 7%|▋ | 873/12313 [39:29<8:31:31, 2.68s/it] {'loss': 0.7224, 'grad_norm': 4.839083574408698, 'learning_rate': 4.978148302682301e-06, 'epoch': 0.07} 7%|▋ | 873/12313 [39:29<8:31:31, 2.68s/it] 7%|▋ | 874/12313 [39:32<8:49:30, 2.78s/it] {'loss': 0.6231, 'grad_norm': 5.90958403636814, 'learning_rate': 4.978061458264346e-06, 'epoch': 0.07} 7%|▋ | 874/12313 [39:32<8:49:30, 2.78s/it] 7%|▋ | 875/12313 [39:35<9:01:59, 2.84s/it] {'loss': 0.5298, 'grad_norm': 4.950954034585027, 'learning_rate': 4.977974442377652e-06, 'epoch': 0.07} 7%|▋ | 875/12313 [39:35<9:01:59, 2.84s/it] 7%|▋ | 876/12313 [39:37<8:35:59, 2.71s/it] {'loss': 0.5893, 'grad_norm': 4.705158462536222, 'learning_rate': 4.977887255028241e-06, 'epoch': 0.07} 7%|▋ | 876/12313 [39:37<8:35:59, 2.71s/it] 7%|▋ | 877/12313 [39:40<8:43:02, 2.74s/it] {'loss': 0.6342, 'grad_norm': 6.237199358592749, 'learning_rate': 4.977799896222148e-06, 'epoch': 0.07} 7%|▋ | 877/12313 [39:40<8:43:02, 2.74s/it] 7%|▋ | 878/12313 [39:43<8:29:04, 2.67s/it] {'loss': 0.6228, 'grad_norm': 5.842398936618045, 'learning_rate': 4.977712365965414e-06, 'epoch': 0.07} 7%|▋ | 878/12313 [39:43<8:29:04, 2.67s/it] 7%|▋ | 879/12313 [39:45<8:38:39, 2.72s/it] {'loss': 0.639, 
'grad_norm': 4.674400039131552, 'learning_rate': 4.9776246642640965e-06, 'epoch': 0.07} 7%|▋ | 879/12313 [39:45<8:38:39, 2.72s/it] 7%|▋ | 880/12313 [39:48<8:26:31, 2.66s/it] {'loss': 0.6026, 'grad_norm': 6.64025918584984, 'learning_rate': 4.977536791124267e-06, 'epoch': 0.07} 7%|▋ | 880/12313 [39:48<8:26:31, 2.66s/it] 7%|▋ | 881/12313 [39:50<8:24:06, 2.65s/it] {'loss': 0.8198, 'grad_norm': 6.036395710793269, 'learning_rate': 4.9774487465520025e-06, 'epoch': 0.07} 7%|▋ | 881/12313 [39:50<8:24:06, 2.65s/it] 7%|▋ | 882/12313 [39:53<8:17:43, 2.61s/it] {'loss': 0.5251, 'grad_norm': 4.761294703683568, 'learning_rate': 4.977360530553397e-06, 'epoch': 0.07} 7%|▋ | 882/12313 [39:53<8:17:43, 2.61s/it] 7%|▋ | 883/12313 [39:56<8:15:16, 2.60s/it] {'loss': 0.6486, 'grad_norm': 4.922109235469366, 'learning_rate': 4.977272143134554e-06, 'epoch': 0.07} 7%|▋ | 883/12313 [39:56<8:15:16, 2.60s/it] 7%|▋ | 884/12313 [39:58<8:17:07, 2.61s/it] {'loss': 0.6457, 'grad_norm': 4.672737077545084, 'learning_rate': 4.97718358430159e-06, 'epoch': 0.07} 7%|▋ | 884/12313 [39:58<8:17:07, 2.61s/it] 7%|▋ | 885/12313 [40:01<8:30:29, 2.68s/it] {'loss': 0.6234, 'grad_norm': 8.289465047137561, 'learning_rate': 4.977094854060631e-06, 'epoch': 0.07} 7%|▋ | 885/12313 [40:01<8:30:29, 2.68s/it] 7%|▋ | 886/12313 [40:04<8:27:16, 2.66s/it] {'loss': 0.6662, 'grad_norm': 5.587306517848276, 'learning_rate': 4.977005952417818e-06, 'epoch': 0.07} 7%|▋ | 886/12313 [40:04<8:27:16, 2.66s/it] 7%|▋ | 887/12313 [40:06<8:26:25, 2.66s/it] {'loss': 0.7314, 'grad_norm': 3.711622831241775, 'learning_rate': 4.9769168793793036e-06, 'epoch': 0.07} 7%|▋ | 887/12313 [40:06<8:26:25, 2.66s/it] 7%|▋ | 888/12313 [40:09<8:39:08, 2.73s/it] {'loss': 0.6133, 'grad_norm': 7.5781498502969145, 'learning_rate': 4.976827634951249e-06, 'epoch': 0.07} 7%|▋ | 888/12313 [40:09<8:39:08, 2.73s/it] 7%|▋ | 889/12313 [40:12<8:55:27, 2.81s/it] {'loss': 0.6408, 'grad_norm': 4.823514400373474, 'learning_rate': 4.976738219139831e-06, 'epoch': 0.07} 7%|▋ | 
889/12313 [40:12<8:55:27, 2.81s/it] 7%|▋ | 890/12313 [40:15<8:48:56, 2.78s/it] {'loss': 0.6452, 'grad_norm': 8.761667850604423, 'learning_rate': 4.976648631951236e-06, 'epoch': 0.07} 7%|▋ | 890/12313 [40:15<8:48:56, 2.78s/it] 7%|▋ | 891/12313 [40:18<8:43:43, 2.75s/it] {'loss': 0.6598, 'grad_norm': 3.571671073730977, 'learning_rate': 4.976558873391663e-06, 'epoch': 0.07} 7%|▋ | 891/12313 [40:18<8:43:43, 2.75s/it] 7%|▋ | 892/12313 [40:20<8:36:43, 2.71s/it] {'loss': 0.5386, 'grad_norm': 5.322524151802511, 'learning_rate': 4.976468943467323e-06, 'epoch': 0.07} 7%|▋ | 892/12313 [40:20<8:36:43, 2.71s/it] 7%|▋ | 893/12313 [40:23<8:34:39, 2.70s/it] {'loss': 0.5516, 'grad_norm': 5.66163748773043, 'learning_rate': 4.976378842184439e-06, 'epoch': 0.07} 7%|▋ | 893/12313 [40:23<8:34:39, 2.70s/it] 7%|▋ | 894/12313 [40:26<8:32:58, 2.70s/it] {'loss': 0.6012, 'grad_norm': 5.353033651730894, 'learning_rate': 4.9762885695492454e-06, 'epoch': 0.07} 7%|▋ | 894/12313 [40:26<8:32:58, 2.70s/it] 7%|▋ | 895/12313 [40:28<8:30:08, 2.68s/it] {'loss': 0.6303, 'grad_norm': 6.855863397519137, 'learning_rate': 4.976198125567988e-06, 'epoch': 0.07} 7%|▋ | 895/12313 [40:28<8:30:08, 2.68s/it] 7%|▋ | 896/12313 [40:31<8:25:01, 2.65s/it] {'loss': 0.528, 'grad_norm': 4.84169944440175, 'learning_rate': 4.976107510246925e-06, 'epoch': 0.07} 7%|▋ | 896/12313 [40:31<8:25:01, 2.65s/it] 7%|▋ | 897/12313 [40:33<8:23:40, 2.65s/it] {'loss': 0.5648, 'grad_norm': 5.092992144693499, 'learning_rate': 4.976016723592328e-06, 'epoch': 0.07} 7%|▋ | 897/12313 [40:33<8:23:40, 2.65s/it] 7%|▋ | 898/12313 [40:36<8:21:53, 2.64s/it] {'loss': 0.5921, 'grad_norm': 4.858637678276697, 'learning_rate': 4.975925765610476e-06, 'epoch': 0.07} 7%|▋ | 898/12313 [40:36<8:21:53, 2.64s/it] 7%|▋ | 899/12313 [40:39<8:28:44, 2.67s/it] {'loss': 0.6145, 'grad_norm': 4.995201775650152, 'learning_rate': 4.975834636307667e-06, 'epoch': 0.07} 7%|▋ | 899/12313 [40:39<8:28:44, 2.67s/it] 7%|▋ | 900/12313 [40:42<8:32:09, 2.69s/it] {'loss': 0.5049, 
'grad_norm': 6.2150093032360365, 'learning_rate': 4.975743335690203e-06, 'epoch': 0.07} 7%|▋ | 900/12313 [40:42<8:32:09, 2.69s/it] 7%|▋ | 901/12313 [40:44<8:23:49, 2.65s/it] {'loss': 0.5949, 'grad_norm': 5.737916586640195, 'learning_rate': 4.975651863764403e-06, 'epoch': 0.07} 7%|▋ | 901/12313 [40:44<8:23:49, 2.65s/it] 7%|▋ | 902/12313 [40:47<8:25:07, 2.66s/it] {'loss': 0.8498, 'grad_norm': 5.21474627504135, 'learning_rate': 4.975560220536596e-06, 'epoch': 0.07} 7%|▋ | 902/12313 [40:47<8:25:07, 2.66s/it] 7%|▋ | 903/12313 [40:50<8:41:41, 2.74s/it] {'loss': 0.6854, 'grad_norm': 3.9495934594747877, 'learning_rate': 4.975468406013124e-06, 'epoch': 0.07} 7%|▋ | 903/12313 [40:50<8:41:41, 2.74s/it] 7%|▋ | 904/12313 [40:52<8:39:23, 2.73s/it] {'loss': 0.585, 'grad_norm': 7.580396023436531, 'learning_rate': 4.97537642020034e-06, 'epoch': 0.07} 7%|▋ | 904/12313 [40:52<8:39:23, 2.73s/it] 7%|▋ | 905/12313 [40:55<8:38:56, 2.73s/it] {'loss': 0.5681, 'grad_norm': 6.081348176726435, 'learning_rate': 4.9752842631046075e-06, 'epoch': 0.07} 7%|▋ | 905/12313 [40:55<8:38:56, 2.73s/it] 7%|▋ | 906/12313 [40:58<8:32:42, 2.70s/it] {'loss': 0.5283, 'grad_norm': 7.6219497191818695, 'learning_rate': 4.975191934732304e-06, 'epoch': 0.07} 7%|▋ | 906/12313 [40:58<8:32:42, 2.70s/it] 7%|▋ | 907/12313 [41:00<8:22:29, 2.64s/it] {'loss': 0.544, 'grad_norm': 7.644719180949304, 'learning_rate': 4.975099435089819e-06, 'epoch': 0.07} 7%|▋ | 907/12313 [41:00<8:22:29, 2.64s/it] 7%|▋ | 908/12313 [41:03<8:25:41, 2.66s/it] {'loss': 0.6976, 'grad_norm': 4.967279914364186, 'learning_rate': 4.975006764183552e-06, 'epoch': 0.07} 7%|▋ | 908/12313 [41:03<8:25:41, 2.66s/it] 7%|▋ | 909/12313 [41:06<8:16:23, 2.61s/it] {'loss': 0.6466, 'grad_norm': 4.727348872362556, 'learning_rate': 4.974913922019916e-06, 'epoch': 0.07} 7%|▋ | 909/12313 [41:06<8:16:23, 2.61s/it] 7%|▋ | 910/12313 [41:08<8:15:19, 2.61s/it] {'loss': 0.5407, 'grad_norm': 6.6416921364354895, 'learning_rate': 4.974820908605336e-06, 'epoch': 0.07} 7%|▋ | 
910/12313 [41:08<8:15:19, 2.61s/it] 7%|▋ | 911/12313 [41:11<8:25:02, 2.66s/it] {'loss': 0.6653, 'grad_norm': 4.934581673705169, 'learning_rate': 4.974727723946245e-06, 'epoch': 0.07} 7%|▋ | 911/12313 [41:11<8:25:02, 2.66s/it] 7%|▋ | 912/12313 [41:14<8:58:51, 2.84s/it] {'loss': 0.5007, 'grad_norm': 3.2488987014947286, 'learning_rate': 4.974634368049094e-06, 'epoch': 0.07} 7%|▋ | 912/12313 [41:14<8:58:51, 2.84s/it] 7%|▋ | 913/12313 [41:17<8:50:20, 2.79s/it] {'loss': 0.5501, 'grad_norm': 6.227255018738214, 'learning_rate': 4.974540840920341e-06, 'epoch': 0.07} 7%|▋ | 913/12313 [41:17<8:50:20, 2.79s/it] 7%|▋ | 914/12313 [41:19<8:36:50, 2.72s/it] {'loss': 0.7246, 'grad_norm': 4.977844807059578, 'learning_rate': 4.974447142566458e-06, 'epoch': 0.07} 7%|▋ | 914/12313 [41:19<8:36:50, 2.72s/it] 7%|▋ | 915/12313 [41:22<8:50:16, 2.79s/it] {'loss': 0.6714, 'grad_norm': 4.9582732140375105, 'learning_rate': 4.974353272993929e-06, 'epoch': 0.07} 7%|▋ | 915/12313 [41:22<8:50:16, 2.79s/it] 7%|▋ | 916/12313 [41:25<8:56:24, 2.82s/it] {'loss': 0.7354, 'grad_norm': 5.123191354056708, 'learning_rate': 4.974259232209249e-06, 'epoch': 0.07} 7%|▋ | 916/12313 [41:25<8:56:24, 2.82s/it] 7%|▋ | 917/12313 [41:28<8:50:35, 2.79s/it] {'loss': 0.6421, 'grad_norm': 5.206554108430837, 'learning_rate': 4.9741650202189245e-06, 'epoch': 0.07} 7%|▋ | 917/12313 [41:28<8:50:35, 2.79s/it] 7%|▋ | 918/12313 [41:31<8:38:39, 2.73s/it] {'loss': 0.8359, 'grad_norm': 8.459450047833085, 'learning_rate': 4.9740706370294755e-06, 'epoch': 0.07} 7%|▋ | 918/12313 [41:31<8:38:39, 2.73s/it] 7%|▋ | 919/12313 [41:33<8:22:42, 2.65s/it] {'loss': 0.6941, 'grad_norm': 4.78007740463755, 'learning_rate': 4.973976082647432e-06, 'epoch': 0.07} 7%|▋ | 919/12313 [41:33<8:22:42, 2.65s/it] 7%|▋ | 920/12313 [41:36<8:18:16, 2.62s/it] {'loss': 0.6078, 'grad_norm': 4.06407753833687, 'learning_rate': 4.9738813570793365e-06, 'epoch': 0.07} 7%|▋ | 920/12313 [41:36<8:18:16, 2.62s/it] 7%|▋ | 921/12313 [41:39<8:42:18, 2.75s/it] {'loss': 0.6072, 
'grad_norm': 6.4634082754205915, 'learning_rate': 4.973786460331744e-06, 'epoch': 0.07} 7%|▋ | 921/12313 [41:39<8:42:18, 2.75s/it] 7%|▋ | 922/12313 [41:41<8:49:09, 2.79s/it] {'loss': 0.6489, 'grad_norm': 4.794171216936538, 'learning_rate': 4.973691392411221e-06, 'epoch': 0.07} 7%|▋ | 922/12313 [41:41<8:49:09, 2.79s/it] 7%|▋ | 923/12313 [41:44<8:54:10, 2.81s/it] {'loss': 0.6415, 'grad_norm': 4.048880670381879, 'learning_rate': 4.973596153324346e-06, 'epoch': 0.07} 7%|▋ | 923/12313 [41:44<8:54:10, 2.81s/it] 8%|▊ | 924/12313 [41:47<8:43:19, 2.76s/it] {'loss': 0.5398, 'grad_norm': 4.260077508939967, 'learning_rate': 4.973500743077707e-06, 'epoch': 0.08} 8%|▊ | 924/12313 [41:47<8:43:19, 2.76s/it] 8%|▊ | 925/12313 [41:50<8:45:14, 2.77s/it] {'loss': 0.5155, 'grad_norm': 4.630184924560269, 'learning_rate': 4.9734051616779085e-06, 'epoch': 0.08} 8%|▊ | 925/12313 [41:50<8:45:14, 2.77s/it] 8%|▊ | 926/12313 [41:52<8:34:30, 2.71s/it] {'loss': 0.5784, 'grad_norm': 6.107378791436437, 'learning_rate': 4.973309409131564e-06, 'epoch': 0.08} 8%|▊ | 926/12313 [41:52<8:34:30, 2.71s/it] 8%|▊ | 927/12313 [41:55<8:21:22, 2.64s/it] {'loss': 0.6463, 'grad_norm': 4.991440001564157, 'learning_rate': 4.973213485445298e-06, 'epoch': 0.08} 8%|▊ | 927/12313 [41:55<8:21:22, 2.64s/it] 8%|▊ | 928/12313 [41:58<8:27:40, 2.68s/it] {'loss': 0.5694, 'grad_norm': 3.9596877989124017, 'learning_rate': 4.973117390625746e-06, 'epoch': 0.08} 8%|▊ | 928/12313 [41:58<8:27:40, 2.68s/it] 8%|▊ | 929/12313 [42:00<8:12:25, 2.60s/it] {'loss': 0.5044, 'grad_norm': 5.9210563584390385, 'learning_rate': 4.9730211246795614e-06, 'epoch': 0.08} 8%|▊ | 929/12313 [42:00<8:12:25, 2.60s/it] 8%|▊ | 930/12313 [42:03<8:16:35, 2.62s/it] {'loss': 0.5711, 'grad_norm': 6.400935356399938, 'learning_rate': 4.9729246876134015e-06, 'epoch': 0.08} 8%|▊ | 930/12313 [42:03<8:16:35, 2.62s/it] 8%|▊ | 931/12313 [42:05<8:12:01, 2.59s/it] {'loss': 0.7171, 'grad_norm': 6.074570162257794, 'learning_rate': 4.9728280794339426e-06, 'epoch': 0.08} 8%|▊ 
| 931/12313 [42:05<8:12:01, 2.59s/it] 8%|▊ | 932/12313 [42:08<8:25:36, 2.67s/it] {'loss': 0.5338, 'grad_norm': 7.525931570888455, 'learning_rate': 4.972731300147867e-06, 'epoch': 0.08} 8%|▊ | 932/12313 [42:08<8:25:36, 2.67s/it] 8%|▊ | 933/12313 [42:10<8:11:04, 2.59s/it] {'loss': 0.5591, 'grad_norm': 5.22558586071865, 'learning_rate': 4.972634349761873e-06, 'epoch': 0.08} 8%|▊ | 933/12313 [42:10<8:11:04, 2.59s/it] 8%|▊ | 934/12313 [42:13<8:07:23, 2.57s/it] {'loss': 0.6477, 'grad_norm': 10.047768244682082, 'learning_rate': 4.972537228282668e-06, 'epoch': 0.08} 8%|▊ | 934/12313 [42:13<8:07:23, 2.57s/it] 8%|▊ | 935/12313 [42:16<8:11:02, 2.59s/it] {'loss': 0.5482, 'grad_norm': 6.876714921725991, 'learning_rate': 4.972439935716972e-06, 'epoch': 0.08} 8%|▊ | 935/12313 [42:16<8:11:02, 2.59s/it] 8%|▊ | 936/12313 [42:18<8:08:50, 2.58s/it] {'loss': 0.5923, 'grad_norm': 6.550177057547305, 'learning_rate': 4.972342472071518e-06, 'epoch': 0.08} 8%|▊ | 936/12313 [42:18<8:08:50, 2.58s/it] 8%|▊ | 937/12313 [42:21<8:01:14, 2.54s/it] {'loss': 0.6293, 'grad_norm': 5.709520994341491, 'learning_rate': 4.97224483735305e-06, 'epoch': 0.08} 8%|▊ | 937/12313 [42:21<8:01:14, 2.54s/it] 8%|▊ | 938/12313 [42:23<8:12:47, 2.60s/it] {'loss': 0.5793, 'grad_norm': 4.727570596744493, 'learning_rate': 4.972147031568322e-06, 'epoch': 0.08} 8%|▊ | 938/12313 [42:23<8:12:47, 2.60s/it] 8%|▊ | 939/12313 [42:26<8:22:50, 2.65s/it] {'loss': 0.7869, 'grad_norm': 5.251994860118902, 'learning_rate': 4.972049054724104e-06, 'epoch': 0.08} 8%|▊ | 939/12313 [42:26<8:22:50, 2.65s/it] 8%|▊ | 940/12313 [42:29<8:24:40, 2.66s/it] {'loss': 0.5164, 'grad_norm': 7.152014425413055, 'learning_rate': 4.9719509068271755e-06, 'epoch': 0.08} 8%|▊ | 940/12313 [42:29<8:24:40, 2.66s/it] 8%|▊ | 941/12313 [42:31<8:20:51, 2.64s/it] {'loss': 0.5625, 'grad_norm': 9.90794133427569, 'learning_rate': 4.971852587884325e-06, 'epoch': 0.08} 8%|▊ | 941/12313 [42:31<8:20:51, 2.64s/it] 8%|▊ | 942/12313 [42:34<8:18:12, 2.63s/it] {'loss': 0.6737, 
'grad_norm': 5.980631957938455, 'learning_rate': 4.97175409790236e-06, 'epoch': 0.08} 8%|▊ | 942/12313 [42:34<8:18:12, 2.63s/it] 8%|▊ | 943/12313 [42:37<8:33:33, 2.71s/it] {'loss': 0.6511, 'grad_norm': 11.229665313768436, 'learning_rate': 4.97165543688809e-06, 'epoch': 0.08} 8%|▊ | 943/12313 [42:37<8:33:33, 2.71s/it] 8%|▊ | 944/12313 [42:40<8:28:35, 2.68s/it] {'loss': 0.595, 'grad_norm': 15.507076859457626, 'learning_rate': 4.971556604848346e-06, 'epoch': 0.08} 8%|▊ | 944/12313 [42:40<8:28:35, 2.68s/it] 8%|▊ | 945/12313 [42:42<8:34:43, 2.72s/it] {'loss': 0.5992, 'grad_norm': 4.875894631175207, 'learning_rate': 4.971457601789966e-06, 'epoch': 0.08} 8%|▊ | 945/12313 [42:42<8:34:43, 2.72s/it] 8%|▊ | 946/12313 [42:45<8:35:17, 2.72s/it] {'loss': 0.5121, 'grad_norm': 4.406952012061629, 'learning_rate': 4.9713584277198e-06, 'epoch': 0.08} 8%|▊ | 946/12313 [42:45<8:35:17, 2.72s/it] 8%|▊ | 947/12313 [42:48<8:26:06, 2.67s/it] {'loss': 0.521, 'grad_norm': 5.211822267384949, 'learning_rate': 4.97125908264471e-06, 'epoch': 0.08} 8%|▊ | 947/12313 [42:48<8:26:06, 2.67s/it] 8%|▊ | 948/12313 [42:50<8:29:55, 2.69s/it] {'loss': 0.5977, 'grad_norm': 6.646319909846443, 'learning_rate': 4.97115956657157e-06, 'epoch': 0.08} 8%|▊ | 948/12313 [42:50<8:29:55, 2.69s/it] 8%|▊ | 949/12313 [42:53<8:32:41, 2.71s/it] {'loss': 0.6193, 'grad_norm': 5.951258269254235, 'learning_rate': 4.971059879507268e-06, 'epoch': 0.08} 8%|▊ | 949/12313 [42:53<8:32:41, 2.71s/it] 8%|▊ | 950/12313 [42:56<8:18:39, 2.63s/it] {'loss': 0.7991, 'grad_norm': 15.849437156314497, 'learning_rate': 4.970960021458699e-06, 'epoch': 0.08} 8%|▊ | 950/12313 [42:56<8:18:39, 2.63s/it] 8%|▊ | 951/12313 [42:58<8:15:53, 2.62s/it] {'loss': 0.5992, 'grad_norm': 4.470056142523615, 'learning_rate': 4.9708599924327735e-06, 'epoch': 0.08} 8%|▊ | 951/12313 [42:58<8:15:53, 2.62s/it] 8%|▊ | 952/12313 [43:01<8:21:33, 2.65s/it] {'loss': 0.574, 'grad_norm': 7.357239226247135, 'learning_rate': 4.970759792436414e-06, 'epoch': 0.08} 8%|▊ | 952/12313 
[43:01<8:21:33, 2.65s/it] 8%|▊ | 953/12313 [43:03<8:14:25, 2.61s/it] {'loss': 0.6871, 'grad_norm': 6.193524739615741, 'learning_rate': 4.970659421476553e-06, 'epoch': 0.08} 8%|▊ | 953/12313 [43:03<8:14:25, 2.61s/it] 8%|▊ | 954/12313 [43:06<8:14:51, 2.61s/it] {'loss': 0.6069, 'grad_norm': 6.192316599355833, 'learning_rate': 4.970558879560137e-06, 'epoch': 0.08} 8%|▊ | 954/12313 [43:06<8:14:51, 2.61s/it] 8%|▊ | 955/12313 [43:09<8:15:53, 2.62s/it] {'loss': 0.45, 'grad_norm': 7.025324708867777, 'learning_rate': 4.97045816669412e-06, 'epoch': 0.08} 8%|▊ | 955/12313 [43:09<8:15:53, 2.62s/it] 8%|▊ | 956/12313 [43:11<8:29:26, 2.69s/it] {'loss': 0.7007, 'grad_norm': 5.808792647688984, 'learning_rate': 4.970357282885473e-06, 'epoch': 0.08} 8%|▊ | 956/12313 [43:11<8:29:26, 2.69s/it] 8%|▊ | 957/12313 [43:14<8:46:18, 2.78s/it] {'loss': 0.5379, 'grad_norm': 5.800444379893061, 'learning_rate': 4.970256228141177e-06, 'epoch': 0.08} 8%|▊ | 957/12313 [43:14<8:46:18, 2.78s/it] 8%|▊ | 958/12313 [43:17<8:36:41, 2.73s/it] {'loss': 0.6805, 'grad_norm': 4.764091635977128, 'learning_rate': 4.970155002468223e-06, 'epoch': 0.08} 8%|▊ | 958/12313 [43:17<8:36:41, 2.73s/it] 8%|▊ | 959/12313 [43:20<8:22:49, 2.66s/it] {'loss': 0.6757, 'grad_norm': 5.223528426735964, 'learning_rate': 4.970053605873616e-06, 'epoch': 0.08} 8%|▊ | 959/12313 [43:20<8:22:49, 2.66s/it] 8%|▊ | 960/12313 [43:23<8:39:29, 2.75s/it] {'loss': 0.5716, 'grad_norm': 7.117293048534837, 'learning_rate': 4.969952038364372e-06, 'epoch': 0.08} 8%|▊ | 960/12313 [43:23<8:39:29, 2.75s/it] 8%|▊ | 961/12313 [43:25<8:34:51, 2.72s/it] {'loss': 0.7158, 'grad_norm': 8.849272822881796, 'learning_rate': 4.96985029994752e-06, 'epoch': 0.08} 8%|▊ | 961/12313 [43:25<8:34:51, 2.72s/it] 8%|▊ | 962/12313 [43:28<8:31:52, 2.71s/it] {'loss': 0.7014, 'grad_norm': 4.323247038552405, 'learning_rate': 4.969748390630097e-06, 'epoch': 0.08} 8%|▊ | 962/12313 [43:28<8:31:52, 2.71s/it] 8%|▊ | 963/12313 [43:30<8:26:16, 2.68s/it] {'loss': 0.6552, 'grad_norm': 
3.15078482423997, 'learning_rate': 4.969646310419157e-06, 'epoch': 0.08} 8%|▊ | 963/12313 [43:30<8:26:16, 2.68s/it] 8%|▊ | 964/12313 [43:33<8:32:23, 2.71s/it] {'loss': 0.5731, 'grad_norm': 5.142051674034742, 'learning_rate': 4.9695440593217635e-06, 'epoch': 0.08} 8%|▊ | 964/12313 [43:33<8:32:23, 2.71s/it] 8%|▊ | 965/12313 [43:36<8:30:46, 2.70s/it] {'loss': 0.4648, 'grad_norm': 4.530754006814891, 'learning_rate': 4.96944163734499e-06, 'epoch': 0.08} 8%|▊ | 965/12313 [43:36<8:30:46, 2.70s/it] 8%|▊ | 966/12313 [43:38<8:22:10, 2.66s/it] {'loss': 0.6981, 'grad_norm': 5.999396192869684, 'learning_rate': 4.969339044495925e-06, 'epoch': 0.08} 8%|▊ | 966/12313 [43:38<8:22:10, 2.66s/it] 8%|▊ | 967/12313 [43:41<8:23:38, 2.66s/it] {'loss': 0.5834, 'grad_norm': 4.570316628434694, 'learning_rate': 4.969236280781667e-06, 'epoch': 0.08} 8%|▊ | 967/12313 [43:41<8:23:38, 2.66s/it] 8%|▊ | 968/12313 [43:44<8:19:23, 2.64s/it] {'loss': 0.6442, 'grad_norm': 29.36059594145154, 'learning_rate': 4.9691333462093264e-06, 'epoch': 0.08} 8%|▊ | 968/12313 [43:44<8:19:23, 2.64s/it] 8%|▊ | 969/12313 [43:46<8:10:40, 2.60s/it] {'loss': 0.4758, 'grad_norm': 4.95470560540118, 'learning_rate': 4.969030240786026e-06, 'epoch': 0.08} 8%|▊ | 969/12313 [43:46<8:10:40, 2.60s/it] 8%|▊ | 970/12313 [43:49<8:15:25, 2.62s/it] {'loss': 0.6544, 'grad_norm': 3.893402872277116, 'learning_rate': 4.9689269645189e-06, 'epoch': 0.08} 8%|▊ | 970/12313 [43:49<8:15:25, 2.62s/it] 8%|▊ | 971/12313 [43:52<8:15:03, 2.62s/it] {'loss': 0.611, 'grad_norm': 5.3148301362547885, 'learning_rate': 4.968823517415095e-06, 'epoch': 0.08} 8%|▊ | 971/12313 [43:52<8:15:03, 2.62s/it] 8%|▊ | 972/12313 [43:54<8:22:16, 2.66s/it] {'loss': 0.6703, 'grad_norm': 5.03652657763463, 'learning_rate': 4.9687198994817685e-06, 'epoch': 0.08} 8%|▊ | 972/12313 [43:54<8:22:16, 2.66s/it] 8%|▊ | 973/12313 [43:57<8:33:33, 2.72s/it] {'loss': 0.5816, 'grad_norm': 4.100966531743051, 'learning_rate': 4.9686161107260906e-06, 'epoch': 0.08} 8%|▊ | 973/12313 
[43:57<8:33:33, 2.72s/it] 8%|▊ | 974/12313 [44:00<8:44:19, 2.77s/it] {'loss': 0.5917, 'grad_norm': 5.116827473240173, 'learning_rate': 4.968512151155242e-06, 'epoch': 0.08} 8%|▊ | 974/12313 [44:00<8:44:19, 2.77s/it] 8%|▊ | 975/12313 [44:03<8:37:36, 2.74s/it] {'loss': 0.5538, 'grad_norm': 3.2335435030359596, 'learning_rate': 4.968408020776419e-06, 'epoch': 0.08} 8%|▊ | 975/12313 [44:03<8:37:36, 2.74s/it] 8%|▊ | 976/12313 [44:06<8:44:33, 2.78s/it] {'loss': 0.461, 'grad_norm': 7.5591035161901035, 'learning_rate': 4.968303719596823e-06, 'epoch': 0.08} 8%|▊ | 976/12313 [44:06<8:44:33, 2.78s/it] 8%|▊ | 977/12313 [44:08<8:29:25, 2.70s/it] {'loss': 0.6354, 'grad_norm': 10.075832303176476, 'learning_rate': 4.9681992476236725e-06, 'epoch': 0.08} 8%|▊ | 977/12313 [44:08<8:29:25, 2.70s/it] 8%|▊ | 978/12313 [44:11<8:26:18, 2.68s/it] {'loss': 0.5383, 'grad_norm': 5.561625086524589, 'learning_rate': 4.968094604864198e-06, 'epoch': 0.08} 8%|▊ | 978/12313 [44:11<8:26:18, 2.68s/it] 8%|▊ | 979/12313 [44:13<8:28:44, 2.69s/it] {'loss': 0.6593, 'grad_norm': 5.6568874477353095, 'learning_rate': 4.967989791325639e-06, 'epoch': 0.08} 8%|▊ | 979/12313 [44:13<8:28:44, 2.69s/it] 8%|▊ | 980/12313 [44:16<8:39:55, 2.75s/it] {'loss': 0.833, 'grad_norm': 6.416186383122935, 'learning_rate': 4.967884807015247e-06, 'epoch': 0.08} 8%|▊ | 980/12313 [44:16<8:39:55, 2.75s/it] 8%|▊ | 981/12313 [44:19<8:29:12, 2.70s/it] {'loss': 0.7025, 'grad_norm': 5.7530564697103355, 'learning_rate': 4.967779651940289e-06, 'epoch': 0.08} 8%|▊ | 981/12313 [44:19<8:29:12, 2.70s/it] 8%|▊ | 982/12313 [44:22<8:28:39, 2.69s/it] {'loss': 0.5582, 'grad_norm': 6.850642228085424, 'learning_rate': 4.967674326108039e-06, 'epoch': 0.08} 8%|▊ | 982/12313 [44:22<8:28:39, 2.69s/it] 8%|▊ | 983/12313 [44:24<8:21:25, 2.66s/it] {'loss': 0.514, 'grad_norm': 4.208095970275836, 'learning_rate': 4.9675688295257855e-06, 'epoch': 0.08} 8%|▊ | 983/12313 [44:24<8:21:25, 2.66s/it] 8%|▊ | 984/12313 [44:27<8:34:08, 2.72s/it] {'loss': 0.6708, 
'grad_norm': 5.1148065579680075, 'learning_rate': 4.967463162200828e-06, 'epoch': 0.08} 8%|▊ | 984/12313 [44:27<8:34:08, 2.72s/it] 8%|▊ | 985/12313 [44:30<8:20:34, 2.65s/it] {'loss': 0.639, 'grad_norm': 7.806584044546556, 'learning_rate': 4.967357324140479e-06, 'epoch': 0.08} 8%|▊ | 985/12313 [44:30<8:20:34, 2.65s/it] 8%|▊ | 986/12313 [44:32<8:07:45, 2.58s/it] {'loss': 0.7296, 'grad_norm': 7.822904916898733, 'learning_rate': 4.967251315352062e-06, 'epoch': 0.08} 8%|▊ | 986/12313 [44:32<8:07:45, 2.58s/it] 8%|▊ | 987/12313 [44:34<8:04:55, 2.57s/it] {'loss': 0.6169, 'grad_norm': 4.251509260453048, 'learning_rate': 4.9671451358429115e-06, 'epoch': 0.08} 8%|▊ | 987/12313 [44:34<8:04:55, 2.57s/it] 8%|▊ | 988/12313 [44:37<7:59:28, 2.54s/it] {'loss': 0.6307, 'grad_norm': 3.9712450920352014, 'learning_rate': 4.967038785620374e-06, 'epoch': 0.08} 8%|▊ | 988/12313 [44:37<7:59:28, 2.54s/it] 8%|▊ | 989/12313 [44:40<8:02:07, 2.55s/it] {'loss': 0.6896, 'grad_norm': 5.013282143950785, 'learning_rate': 4.96693226469181e-06, 'epoch': 0.08} 8%|▊ | 989/12313 [44:40<8:02:07, 2.55s/it] 8%|▊ | 990/12313 [44:42<8:09:48, 2.60s/it] {'loss': 0.5816, 'grad_norm': 6.309239309855388, 'learning_rate': 4.966825573064589e-06, 'epoch': 0.08} 8%|▊ | 990/12313 [44:42<8:09:48, 2.60s/it] 8%|▊ | 991/12313 [44:45<8:10:08, 2.60s/it] {'loss': 0.7024, 'grad_norm': 17.18761480554951, 'learning_rate': 4.9667187107460934e-06, 'epoch': 0.08} 8%|▊ | 991/12313 [44:45<8:10:08, 2.60s/it] 8%|▊ | 992/12313 [44:47<8:10:57, 2.60s/it] {'loss': 0.6545, 'grad_norm': 8.511668880504516, 'learning_rate': 4.966611677743719e-06, 'epoch': 0.08} 8%|▊ | 992/12313 [44:47<8:10:57, 2.60s/it] 8%|▊ | 993/12313 [44:50<8:12:33, 2.61s/it] {'loss': 0.6241, 'grad_norm': 7.07740712784662, 'learning_rate': 4.96650447406487e-06, 'epoch': 0.08} 8%|▊ | 993/12313 [44:50<8:12:33, 2.61s/it] 8%|▊ | 994/12313 [44:53<8:15:23, 2.63s/it] {'loss': 0.6096, 'grad_norm': 6.4092146264144505, 'learning_rate': 4.966397099716965e-06, 'epoch': 0.08} 8%|▊ | 
994/12313 [44:53<8:15:23, 2.63s/it] 8%|▊ | 995/12313 [44:56<8:26:17, 2.68s/it] {'loss': 0.6807, 'grad_norm': 3.371066265022369, 'learning_rate': 4.9662895547074345e-06, 'epoch': 0.08} 8%|▊ | 995/12313 [44:56<8:26:17, 2.68s/it] 8%|▊ | 996/12313 [44:58<8:25:51, 2.68s/it] {'loss': 0.69, 'grad_norm': 11.15283861894202, 'learning_rate': 4.96618183904372e-06, 'epoch': 0.08} 8%|▊ | 996/12313 [44:58<8:25:51, 2.68s/it] 8%|▊ | 997/12313 [45:01<8:17:41, 2.64s/it] {'loss': 0.6545, 'grad_norm': 5.584728402833689, 'learning_rate': 4.966073952733273e-06, 'epoch': 0.08} 8%|▊ | 997/12313 [45:01<8:17:41, 2.64s/it] 8%|▊ | 998/12313 [45:03<8:09:09, 2.59s/it] {'loss': 0.7363, 'grad_norm': 4.928404998187488, 'learning_rate': 4.965965895783561e-06, 'epoch': 0.08} 8%|▊ | 998/12313 [45:03<8:09:09, 2.59s/it] 8%|▊ | 999/12313 [45:06<8:01:13, 2.55s/it] {'loss': 0.7405, 'grad_norm': 5.18162526836767, 'learning_rate': 4.96585766820206e-06, 'epoch': 0.08} 8%|▊ | 999/12313 [45:06<8:01:13, 2.55s/it] 8%|▊ | 1000/12313 [45:09<8:22:41, 2.67s/it] {'loss': 0.4666, 'grad_norm': 5.108535543984558, 'learning_rate': 4.965749269996258e-06, 'epoch': 0.08} 8%|▊ | 1000/12313 [45:09<8:22:41, 2.67s/it] 8%|▊ | 1001/12313 [45:11<8:32:40, 2.72s/it] {'loss': 0.6248, 'grad_norm': 4.081215207318422, 'learning_rate': 4.965640701173657e-06, 'epoch': 0.08} 8%|▊ | 1001/12313 [45:11<8:32:40, 2.72s/it] 8%|▊ | 1002/12313 [45:14<8:25:50, 2.68s/it] {'loss': 0.6029, 'grad_norm': 4.819602648972772, 'learning_rate': 4.9655319617417674e-06, 'epoch': 0.08} 8%|▊ | 1002/12313 [45:14<8:25:50, 2.68s/it] 8%|▊ | 1003/12313 [45:17<8:43:08, 2.78s/it] {'loss': 0.7782, 'grad_norm': 3.672836794950573, 'learning_rate': 4.965423051708116e-06, 'epoch': 0.08} 8%|▊ | 1003/12313 [45:17<8:43:08, 2.78s/it] 8%|▊ | 1004/12313 [45:20<8:44:54, 2.78s/it] {'loss': 0.5324, 'grad_norm': 4.726346322374398, 'learning_rate': 4.965313971080237e-06, 'epoch': 0.08} 8%|▊ | 1004/12313 [45:20<8:44:54, 2.78s/it] 8%|▊ | 1005/12313 [45:22<8:34:16, 2.73s/it] {'loss': 
0.666, 'grad_norm': 5.155470667426197, 'learning_rate': 4.96520471986568e-06, 'epoch': 0.08} 8%|▊ | 1005/12313 [45:22<8:34:16, 2.73s/it] 8%|▊ | 1006/12313 [45:25<8:46:50, 2.80s/it] {'loss': 0.5542, 'grad_norm': 4.923143643344893, 'learning_rate': 4.965095298072001e-06, 'epoch': 0.08} 8%|▊ | 1006/12313 [45:25<8:46:50, 2.80s/it] 8%|▊ | 1007/12313 [45:28<8:41:32, 2.77s/it] {'loss': 0.5911, 'grad_norm': 4.1341959962667305, 'learning_rate': 4.964985705706775e-06, 'epoch': 0.08} 8%|▊ | 1007/12313 [45:28<8:41:32, 2.77s/it] 8%|▊ | 1008/12313 [45:31<8:32:24, 2.72s/it] {'loss': 0.7514, 'grad_norm': 5.790946529810634, 'learning_rate': 4.964875942777584e-06, 'epoch': 0.08} 8%|▊ | 1008/12313 [45:31<8:32:24, 2.72s/it] 8%|▊ | 1009/12313 [45:34<8:36:34, 2.74s/it] {'loss': 0.704, 'grad_norm': 5.188744184884793, 'learning_rate': 4.964766009292022e-06, 'epoch': 0.08} 8%|▊ | 1009/12313 [45:34<8:36:34, 2.74s/it] 8%|▊ | 1010/12313 [45:36<8:31:27, 2.72s/it] {'loss': 0.6622, 'grad_norm': 9.318471720754397, 'learning_rate': 4.9646559052576985e-06, 'epoch': 0.08} 8%|▊ | 1010/12313 [45:36<8:31:27, 2.72s/it] 8%|▊ | 1011/12313 [45:39<8:22:01, 2.67s/it] {'loss': 0.5447, 'grad_norm': 4.902314685975934, 'learning_rate': 4.9645456306822285e-06, 'epoch': 0.08} 8%|▊ | 1011/12313 [45:39<8:22:01, 2.67s/it] 8%|▊ | 1012/12313 [45:41<8:19:25, 2.65s/it] {'loss': 0.5288, 'grad_norm': 5.828651116573771, 'learning_rate': 4.964435185573245e-06, 'epoch': 0.08} 8%|▊ | 1012/12313 [45:41<8:19:25, 2.65s/it] 8%|▊ | 1013/12313 [45:44<8:28:04, 2.70s/it] {'loss': 0.6403, 'grad_norm': 2.864291283393185, 'learning_rate': 4.96432456993839e-06, 'epoch': 0.08} 8%|▊ | 1013/12313 [45:44<8:28:04, 2.70s/it] 8%|▊ | 1014/12313 [45:47<8:10:01, 2.60s/it] {'loss': 0.6291, 'grad_norm': 5.284219787159016, 'learning_rate': 4.964213783785317e-06, 'epoch': 0.08} 8%|▊ | 1014/12313 [45:47<8:10:01, 2.60s/it] 8%|▊ | 1015/12313 [45:49<8:07:34, 2.59s/it] {'loss': 0.8372, 'grad_norm': 4.43415065344841, 'learning_rate': 4.9641028271216905e-06, 
'epoch': 0.08} 8%|▊ | 1015/12313 [45:49<8:07:34, 2.59s/it] 8%|▊ | 1016/12313 [45:52<8:10:01, 2.60s/it] {'loss': 0.5823, 'grad_norm': 4.37746143116903, 'learning_rate': 4.9639916999551905e-06, 'epoch': 0.08} 8%|▊ | 1016/12313 [45:52<8:10:01, 2.60s/it] 8%|▊ | 1017/12313 [45:54<8:17:53, 2.64s/it] {'loss': 0.6104, 'grad_norm': 15.987254695136924, 'learning_rate': 4.963880402293506e-06, 'epoch': 0.08} 8%|▊ | 1017/12313 [45:54<8:17:53, 2.64s/it] 8%|▊ | 1018/12313 [45:57<8:13:02, 2.62s/it] {'loss': 0.7552, 'grad_norm': 5.7244481019012925, 'learning_rate': 4.963768934144336e-06, 'epoch': 0.08} 8%|▊ | 1018/12313 [45:57<8:13:02, 2.62s/it] 8%|▊ | 1019/12313 [46:00<8:15:01, 2.63s/it] {'loss': 0.6159, 'grad_norm': 4.217271904051636, 'learning_rate': 4.963657295515396e-06, 'epoch': 0.08} 8%|▊ | 1019/12313 [46:00<8:15:01, 2.63s/it] 8%|▊ | 1020/12313 [46:02<8:22:44, 2.67s/it] {'loss': 0.6277, 'grad_norm': 7.382613542004634, 'learning_rate': 4.963545486414411e-06, 'epoch': 0.08} 8%|▊ | 1020/12313 [46:02<8:22:44, 2.67s/it] 8%|▊ | 1021/12313 [46:05<8:14:21, 2.63s/it] {'loss': 0.738, 'grad_norm': 5.353844268940195, 'learning_rate': 4.963433506849115e-06, 'epoch': 0.08} 8%|▊ | 1021/12313 [46:05<8:14:21, 2.63s/it] 8%|▊ | 1022/12313 [46:08<8:16:09, 2.64s/it] {'loss': 0.6173, 'grad_norm': 10.18166117587427, 'learning_rate': 4.963321356827258e-06, 'epoch': 0.08} 8%|▊ | 1022/12313 [46:08<8:16:09, 2.64s/it] 8%|▊ | 1023/12313 [46:10<8:12:27, 2.62s/it] {'loss': 0.6377, 'grad_norm': 7.319414111585189, 'learning_rate': 4.9632090363565995e-06, 'epoch': 0.08} 8%|▊ | 1023/12313 [46:10<8:12:27, 2.62s/it] 8%|▊ | 1024/12313 [46:13<8:16:07, 2.64s/it] {'loss': 0.5095, 'grad_norm': 6.252416872046548, 'learning_rate': 4.963096545444913e-06, 'epoch': 0.08} 8%|▊ | 1024/12313 [46:13<8:16:07, 2.64s/it] 8%|▊ | 1025/12313 [46:16<8:36:13, 2.74s/it] {'loss': 0.5225, 'grad_norm': 4.976076953773711, 'learning_rate': 4.962983884099981e-06, 'epoch': 0.08} 8%|▊ | 1025/12313 [46:16<8:36:13, 2.74s/it] 8%|▊ | 1026/12313 
[46:18<8:20:47, 2.66s/it] {'loss': 0.5425, 'grad_norm': 6.865653287550914, 'learning_rate': 4.9628710523296e-06, 'epoch': 0.08} 8%|▊ | 1026/12313 [46:18<8:20:47, 2.66s/it] 8%|▊ | 1027/12313 [46:21<8:20:19, 2.66s/it] {'loss': 0.5492, 'grad_norm': 5.6514607039180005, 'learning_rate': 4.962758050141576e-06, 'epoch': 0.08} 8%|▊ | 1027/12313 [46:21<8:20:19, 2.66s/it] 8%|▊ | 1028/12313 [46:24<8:24:23, 2.68s/it] {'loss': 0.5683, 'grad_norm': 8.200634823957119, 'learning_rate': 4.962644877543729e-06, 'epoch': 0.08} 8%|▊ | 1028/12313 [46:24<8:24:23, 2.68s/it] 8%|▊ | 1029/12313 [46:26<8:22:23, 2.67s/it] {'loss': 0.7186, 'grad_norm': 4.089877382417978, 'learning_rate': 4.96253153454389e-06, 'epoch': 0.08} 8%|▊ | 1029/12313 [46:26<8:22:23, 2.67s/it] 8%|▊ | 1030/12313 [46:29<8:18:16, 2.65s/it] {'loss': 0.4817, 'grad_norm': 4.918100288171265, 'learning_rate': 4.9624180211499004e-06, 'epoch': 0.08} 8%|▊ | 1030/12313 [46:29<8:18:16, 2.65s/it] 8%|▊ | 1031/12313 [46:32<8:21:08, 2.67s/it] {'loss': 0.575, 'grad_norm': 5.979359137308161, 'learning_rate': 4.962304337369618e-06, 'epoch': 0.08} 8%|▊ | 1031/12313 [46:32<8:21:08, 2.67s/it] 8%|▊ | 1032/12313 [46:34<8:20:57, 2.66s/it] {'loss': 0.5979, 'grad_norm': 5.9147018999524645, 'learning_rate': 4.962190483210906e-06, 'epoch': 0.08} 8%|▊ | 1032/12313 [46:34<8:20:57, 2.66s/it] 8%|▊ | 1033/12313 [46:37<8:40:07, 2.77s/it] {'loss': 0.5231, 'grad_norm': 3.8882956927363614, 'learning_rate': 4.962076458681642e-06, 'epoch': 0.08} 8%|▊ | 1033/12313 [46:37<8:40:07, 2.77s/it] 8%|▊ | 1034/12313 [46:40<8:31:06, 2.72s/it] {'loss': 0.6006, 'grad_norm': 8.61999834460005, 'learning_rate': 4.96196226378972e-06, 'epoch': 0.08} 8%|▊ | 1034/12313 [46:40<8:31:06, 2.72s/it] 8%|▊ | 1035/12313 [46:43<8:27:45, 2.70s/it] {'loss': 0.7872, 'grad_norm': 4.813530840618122, 'learning_rate': 4.961847898543038e-06, 'epoch': 0.08} 8%|▊ | 1035/12313 [46:43<8:27:45, 2.70s/it] 8%|▊ | 1036/12313 [46:46<8:38:39, 2.76s/it] {'loss': 0.6882, 'grad_norm': 6.218523263233032, 
'learning_rate': 4.96173336294951e-06, 'epoch': 0.08} 8%|▊ | 1036/12313 [46:46<8:38:39, 2.76s/it] 8%|▊ | 1037/12313 [46:48<8:42:04, 2.78s/it] {'loss': 0.828, 'grad_norm': 5.495336585299689, 'learning_rate': 4.961618657017063e-06, 'epoch': 0.08} 8%|▊ | 1037/12313 [46:48<8:42:04, 2.78s/it] 8%|▊ | 1038/12313 [46:51<8:48:59, 2.82s/it] {'loss': 0.8362, 'grad_norm': 6.714525494166789, 'learning_rate': 4.961503780753633e-06, 'epoch': 0.08} 8%|▊ | 1038/12313 [46:51<8:48:59, 2.82s/it] 8%|▊ | 1039/12313 [46:54<8:46:19, 2.80s/it] {'loss': 0.4224, 'grad_norm': 3.212772209979488, 'learning_rate': 4.9613887341671675e-06, 'epoch': 0.08} 8%|▊ | 1039/12313 [46:54<8:46:19, 2.80s/it] 8%|▊ | 1040/12313 [46:56<8:23:56, 2.68s/it] {'loss': 0.5646, 'grad_norm': 7.183836025951974, 'learning_rate': 4.961273517265629e-06, 'epoch': 0.08} 8%|▊ | 1040/12313 [46:56<8:23:56, 2.68s/it] 8%|▊ | 1041/12313 [46:59<8:16:20, 2.64s/it] {'loss': 0.5711, 'grad_norm': 6.286950241807605, 'learning_rate': 4.961158130056989e-06, 'epoch': 0.08} 8%|▊ | 1041/12313 [46:59<8:16:20, 2.64s/it] 8%|▊ | 1042/12313 [47:02<8:14:06, 2.63s/it] {'loss': 0.4819, 'grad_norm': 5.85967031646826, 'learning_rate': 4.961042572549232e-06, 'epoch': 0.08} 8%|▊ | 1042/12313 [47:02<8:14:06, 2.63s/it] 8%|▊ | 1043/12313 [47:04<8:16:52, 2.65s/it] {'loss': 0.5843, 'grad_norm': 4.788980234170179, 'learning_rate': 4.960926844750353e-06, 'epoch': 0.08} 8%|▊ | 1043/12313 [47:04<8:16:52, 2.65s/it] 8%|▊ | 1044/12313 [47:07<8:11:39, 2.62s/it] {'loss': 0.5618, 'grad_norm': 6.955485093136964, 'learning_rate': 4.960810946668362e-06, 'epoch': 0.08} 8%|▊ | 1044/12313 [47:07<8:11:39, 2.62s/it] 8%|▊ | 1045/12313 [47:09<8:11:48, 2.62s/it] {'loss': 0.6486, 'grad_norm': 5.484890003960757, 'learning_rate': 4.960694878311276e-06, 'epoch': 0.08} 8%|▊ | 1045/12313 [47:09<8:11:48, 2.62s/it] 8%|▊ | 1046/12313 [47:12<8:26:53, 2.70s/it] {'loss': 0.4978, 'grad_norm': 5.542258816933063, 'learning_rate': 4.960578639687129e-06, 'epoch': 0.08} 8%|▊ | 1046/12313 
[47:12<8:26:53, 2.70s/it] 9%|▊ | 1047/12313 [47:15<8:36:21, 2.75s/it] {'loss': 0.6843, 'grad_norm': 5.931712433067937, 'learning_rate': 4.960462230803961e-06, 'epoch': 0.09} 9%|▊ | 1047/12313 [47:15<8:36:21, 2.75s/it] 9%|▊ | 1048/12313 [47:18<8:41:46, 2.78s/it] {'loss': 0.5678, 'grad_norm': 5.759837139562491, 'learning_rate': 4.960345651669829e-06, 'epoch': 0.09} 9%|▊ | 1048/12313 [47:18<8:41:46, 2.78s/it] 9%|▊ | 1049/12313 [47:21<8:38:30, 2.76s/it] {'loss': 0.5629, 'grad_norm': 8.019301092937635, 'learning_rate': 4.960228902292799e-06, 'epoch': 0.09} 9%|▊ | 1049/12313 [47:21<8:38:30, 2.76s/it] 9%|▊ | 1050/12313 [47:23<8:29:16, 2.71s/it] {'loss': 0.4966, 'grad_norm': 4.806150258046436, 'learning_rate': 4.96011198268095e-06, 'epoch': 0.09} 9%|▊ | 1050/12313 [47:23<8:29:16, 2.71s/it] 9%|▊ | 1051/12313 [47:26<8:20:23, 2.67s/it] {'loss': 0.6009, 'grad_norm': 6.278721005688989, 'learning_rate': 4.959994892842371e-06, 'epoch': 0.09} 9%|▊ | 1051/12313 [47:26<8:20:23, 2.67s/it] 9%|▊ | 1052/12313 [47:28<8:08:10, 2.60s/it] {'loss': 0.538, 'grad_norm': 4.006330485950786, 'learning_rate': 4.959877632785166e-06, 'epoch': 0.09} 9%|▊ | 1052/12313 [47:28<8:08:10, 2.60s/it] 9%|▊ | 1053/12313 [47:31<8:10:35, 2.61s/it] {'loss': 0.5693, 'grad_norm': 6.0015668910114, 'learning_rate': 4.959760202517446e-06, 'epoch': 0.09} 9%|▊ | 1053/12313 [47:31<8:10:35, 2.61s/it] 9%|▊ | 1054/12313 [47:34<8:05:01, 2.58s/it] {'loss': 0.5231, 'grad_norm': 17.347793241402837, 'learning_rate': 4.959642602047339e-06, 'epoch': 0.09} 9%|▊ | 1054/12313 [47:34<8:05:01, 2.58s/it] 9%|▊ | 1055/12313 [47:36<7:53:56, 2.53s/it] {'loss': 0.6282, 'grad_norm': 4.788609767567765, 'learning_rate': 4.959524831382981e-06, 'epoch': 0.09} 9%|▊ | 1055/12313 [47:36<7:53:56, 2.53s/it] 9%|▊ | 1056/12313 [47:39<8:04:17, 2.58s/it] {'loss': 0.7099, 'grad_norm': 10.272501989404129, 'learning_rate': 4.9594068905325225e-06, 'epoch': 0.09} 9%|▊ | 1056/12313 [47:39<8:04:17, 2.58s/it] 9%|▊ | 1057/12313 [47:41<7:57:17, 2.54s/it] {'loss': 
0.5964, 'grad_norm': 5.360284497181169, 'learning_rate': 4.959288779504122e-06, 'epoch': 0.09} 9%|▊ | 1057/12313 [47:41<7:57:17, 2.54s/it] 9%|▊ | 1058/12313 [47:44<8:18:40, 2.66s/it] {'loss': 0.6034, 'grad_norm': 4.376816971311538, 'learning_rate': 4.959170498305955e-06, 'epoch': 0.09} 9%|▊ | 1058/12313 [47:44<8:18:40, 2.66s/it] 9%|▊ | 1059/12313 [47:47<8:37:10, 2.76s/it] {'loss': 0.5496, 'grad_norm': 7.410246905461928, 'learning_rate': 4.959052046946203e-06, 'epoch': 0.09} 9%|▊ | 1059/12313 [47:47<8:37:10, 2.76s/it] 9%|▊ | 1060/12313 [47:50<8:58:53, 2.87s/it] {'loss': 0.6964, 'grad_norm': 3.455502795284755, 'learning_rate': 4.958933425433065e-06, 'epoch': 0.09} 9%|▊ | 1060/12313 [47:50<8:58:53, 2.87s/it] 9%|▊ | 1061/12313 [47:53<8:44:33, 2.80s/it] {'loss': 0.5759, 'grad_norm': 5.754779871120552, 'learning_rate': 4.958814633774747e-06, 'epoch': 0.09} 9%|▊ | 1061/12313 [47:53<8:44:33, 2.80s/it] 9%|▊ | 1062/12313 [47:55<8:18:55, 2.66s/it] {'loss': 0.807, 'grad_norm': 5.304809027548407, 'learning_rate': 4.95869567197947e-06, 'epoch': 0.09} 9%|▊ | 1062/12313 [47:55<8:18:55, 2.66s/it] 9%|▊ | 1063/12313 [47:58<8:23:15, 2.68s/it] {'loss': 0.6127, 'grad_norm': 6.315153689798026, 'learning_rate': 4.958576540055464e-06, 'epoch': 0.09} 9%|▊ | 1063/12313 [47:58<8:23:15, 2.68s/it] 9%|▊ | 1064/12313 [48:00<8:14:35, 2.64s/it] {'loss': 0.5677, 'grad_norm': 5.159058407796672, 'learning_rate': 4.958457238010974e-06, 'epoch': 0.09} 9%|▊ | 1064/12313 [48:00<8:14:35, 2.64s/it] 9%|▊ | 1065/12313 [48:03<8:13:30, 2.63s/it] {'loss': 0.6875, 'grad_norm': 6.440705718742274, 'learning_rate': 4.958337765854254e-06, 'epoch': 0.09} 9%|▊ | 1065/12313 [48:03<8:13:30, 2.63s/it] 9%|▊ | 1066/12313 [48:06<8:11:42, 2.62s/it] {'loss': 0.7366, 'grad_norm': 5.084236342123366, 'learning_rate': 4.958218123593572e-06, 'epoch': 0.09} 9%|▊ | 1066/12313 [48:06<8:11:42, 2.62s/it] 9%|▊ | 1067/12313 [48:08<8:20:04, 2.67s/it] {'loss': 0.674, 'grad_norm': 3.7240825747904447, 'learning_rate': 4.958098311237205e-06, 
'epoch': 0.09} 9%|▊ | 1067/12313 [48:08<8:20:04, 2.67s/it] 9%|▊ | 1068/12313 [48:11<8:14:33, 2.64s/it] {'loss': 0.6569, 'grad_norm': 5.058020669894491, 'learning_rate': 4.9579783287934445e-06, 'epoch': 0.09} 9%|▊ | 1068/12313 [48:11<8:14:33, 2.64s/it] 9%|▊ | 1069/12313 [48:13<8:08:39, 2.61s/it] {'loss': 0.6855, 'grad_norm': 4.0274556261885515, 'learning_rate': 4.957858176270591e-06, 'epoch': 0.09} 9%|▊ | 1069/12313 [48:13<8:08:39, 2.61s/it] 9%|▊ | 1070/12313 [48:16<8:15:38, 2.65s/it] {'loss': 0.58, 'grad_norm': 7.179073356157699, 'learning_rate': 4.957737853676961e-06, 'epoch': 0.09} 9%|▊ | 1070/12313 [48:16<8:15:38, 2.65s/it] 9%|▊ | 1071/12313 [48:19<8:07:06, 2.60s/it] {'loss': 0.515, 'grad_norm': 4.03973215146239, 'learning_rate': 4.957617361020879e-06, 'epoch': 0.09} 9%|▊ | 1071/12313 [48:19<8:07:06, 2.60s/it] 9%|▊ | 1072/12313 [48:21<8:11:01, 2.62s/it] {'loss': 0.7152, 'grad_norm': 7.191602127334415, 'learning_rate': 4.9574966983106824e-06, 'epoch': 0.09} 9%|▊ | 1072/12313 [48:21<8:11:01, 2.62s/it] 9%|▊ | 1073/12313 [48:24<8:16:39, 2.65s/it] {'loss': 0.6789, 'grad_norm': 4.274773215099363, 'learning_rate': 4.95737586555472e-06, 'epoch': 0.09} 9%|▊ | 1073/12313 [48:24<8:16:39, 2.65s/it] 9%|▊ | 1074/12313 [48:27<8:25:16, 2.70s/it] {'loss': 0.7913, 'grad_norm': 4.114285344199829, 'learning_rate': 4.957254862761354e-06, 'epoch': 0.09} 9%|▊ | 1074/12313 [48:27<8:25:16, 2.70s/it] 9%|▊ | 1075/12313 [48:29<8:15:16, 2.64s/it] {'loss': 0.7393, 'grad_norm': 6.861324021554996, 'learning_rate': 4.957133689938955e-06, 'epoch': 0.09} 9%|▊ | 1075/12313 [48:29<8:15:16, 2.64s/it] 9%|▊ | 1076/12313 [48:32<8:14:29, 2.64s/it] {'loss': 0.6303, 'grad_norm': 6.978671101537034, 'learning_rate': 4.95701234709591e-06, 'epoch': 0.09} 9%|▊ | 1076/12313 [48:32<8:14:29, 2.64s/it] 9%|▊ | 1077/12313 [48:34<7:59:32, 2.56s/it] {'loss': 0.6739, 'grad_norm': 9.578370134137916, 'learning_rate': 4.956890834240613e-06, 'epoch': 0.09} 9%|▊ | 1077/12313 [48:34<7:59:32, 2.56s/it] 9%|▉ | 1078/12313 
[48:37<8:08:58, 2.61s/it] {'loss': 0.6609, 'grad_norm': 4.953780236460816, 'learning_rate': 4.956769151381474e-06, 'epoch': 0.09} 9%|▉ | 1078/12313 [48:37<8:08:58, 2.61s/it] 9%|▉ | 1079/12313 [48:40<8:09:03, 2.61s/it] {'loss': 0.5512, 'grad_norm': 6.102476761076953, 'learning_rate': 4.9566472985269125e-06, 'epoch': 0.09} 9%|▉ | 1079/12313 [48:40<8:09:03, 2.61s/it] 9%|▉ | 1080/12313 [48:42<8:10:18, 2.62s/it] {'loss': 0.5459, 'grad_norm': 4.373415052928376, 'learning_rate': 4.956525275685358e-06, 'epoch': 0.09} 9%|▉ | 1080/12313 [48:42<8:10:18, 2.62s/it] 9%|▉ | 1081/12313 [48:45<8:12:10, 2.63s/it] {'loss': 0.6867, 'grad_norm': 5.532793766061284, 'learning_rate': 4.9564030828652565e-06, 'epoch': 0.09} 9%|▉ | 1081/12313 [48:45<8:12:10, 2.63s/it] 9%|▉ | 1082/12313 [48:48<8:11:26, 2.63s/it] {'loss': 0.7538, 'grad_norm': 3.8285573439072254, 'learning_rate': 4.956280720075062e-06, 'epoch': 0.09} 9%|▉ | 1082/12313 [48:48<8:11:26, 2.63s/it] 9%|▉ | 1083/12313 [48:50<8:09:27, 2.62s/it] {'loss': 0.5405, 'grad_norm': 6.492766812771604, 'learning_rate': 4.9561581873232415e-06, 'epoch': 0.09} 9%|▉ | 1083/12313 [48:50<8:09:27, 2.62s/it] 9%|▉ | 1084/12313 [48:53<8:24:01, 2.69s/it] {'loss': 0.5867, 'grad_norm': 12.542445075286674, 'learning_rate': 4.956035484618272e-06, 'epoch': 0.09} 9%|▉ | 1084/12313 [48:53<8:24:01, 2.69s/it] 9%|▉ | 1085/12313 [48:56<8:20:15, 2.67s/it] {'loss': 0.5364, 'grad_norm': 5.8788039197140005, 'learning_rate': 4.955912611968648e-06, 'epoch': 0.09} 9%|▉ | 1085/12313 [48:56<8:20:15, 2.67s/it] 9%|▉ | 1086/12313 [48:58<8:17:46, 2.66s/it] {'loss': 0.5554, 'grad_norm': 6.820020186044584, 'learning_rate': 4.955789569382866e-06, 'epoch': 0.09} 9%|▉ | 1086/12313 [48:58<8:17:46, 2.66s/it] 9%|▉ | 1087/12313 [49:01<8:05:00, 2.59s/it] {'loss': 0.6424, 'grad_norm': 5.44097898708547, 'learning_rate': 4.955666356869445e-06, 'epoch': 0.09} 9%|▉ | 1087/12313 [49:01<8:05:00, 2.59s/it] 9%|▉ | 1088/12313 [49:04<8:10:14, 2.62s/it] {'loss': 0.6376, 'grad_norm': 6.451838410522676, 
'learning_rate': 4.955542974436908e-06, 'epoch': 0.09} 9%|▉ | 1088/12313 [49:04<8:10:14, 2.62s/it] 9%|▉ | 1089/12313 [49:06<8:11:34, 2.63s/it] {'loss': 0.5683, 'grad_norm': 16.847875335305353, 'learning_rate': 4.955419422093792e-06, 'epoch': 0.09} 9%|▉ | 1089/12313 [49:06<8:11:34, 2.63s/it] 9%|▉ | 1090/12313 [49:09<8:14:15, 2.64s/it] {'loss': 0.5726, 'grad_norm': 5.250679262222032, 'learning_rate': 4.955295699848649e-06, 'epoch': 0.09} 9%|▉ | 1090/12313 [49:09<8:14:15, 2.64s/it] 9%|▉ | 1091/12313 [49:12<8:36:41, 2.76s/it] {'loss': 0.664, 'grad_norm': 3.708739891574932, 'learning_rate': 4.955171807710037e-06, 'epoch': 0.09} 9%|▉ | 1091/12313 [49:12<8:36:41, 2.76s/it] 9%|▉ | 1092/12313 [49:15<8:35:15, 2.76s/it] {'loss': 0.5727, 'grad_norm': 6.291250688681625, 'learning_rate': 4.955047745686529e-06, 'epoch': 0.09} 9%|▉ | 1092/12313 [49:15<8:35:15, 2.76s/it] 9%|▉ | 1093/12313 [49:17<8:27:10, 2.71s/it] {'loss': 0.6444, 'grad_norm': 9.32506667375153, 'learning_rate': 4.954923513786711e-06, 'epoch': 0.09} 9%|▉ | 1093/12313 [49:17<8:27:10, 2.71s/it] 9%|▉ | 1094/12313 [49:20<8:21:30, 2.68s/it] {'loss': 0.6713, 'grad_norm': 5.9130602901334735, 'learning_rate': 4.954799112019178e-06, 'epoch': 0.09} 9%|▉ | 1094/12313 [49:20<8:21:30, 2.68s/it] 9%|▉ | 1095/12313 [49:22<8:06:13, 2.60s/it] {'loss': 0.598, 'grad_norm': 7.27169259231534, 'learning_rate': 4.9546745403925385e-06, 'epoch': 0.09} 9%|▉ | 1095/12313 [49:22<8:06:13, 2.60s/it] 9%|▉ | 1096/12313 [49:25<8:08:06, 2.61s/it] {'loss': 0.5987, 'grad_norm': 4.39443693321864, 'learning_rate': 4.954549798915412e-06, 'epoch': 0.09} 9%|▉ | 1096/12313 [49:25<8:08:06, 2.61s/it] 9%|▉ | 1097/12313 [49:28<8:19:12, 2.67s/it] {'loss': 0.5794, 'grad_norm': 5.108191362159325, 'learning_rate': 4.95442488759643e-06, 'epoch': 0.09} 9%|▉ | 1097/12313 [49:28<8:19:12, 2.67s/it] 9%|▉ | 1098/12313 [49:30<8:18:09, 2.67s/it] {'loss': 0.6292, 'grad_norm': 4.394532094744748, 'learning_rate': 4.954299806444236e-06, 'epoch': 0.09} 9%|▉ | 1098/12313 
[49:30<8:18:09, 2.67s/it] 9%|▉ | 1099/12313 [49:33<8:29:05, 2.72s/it] {'loss': 0.7976, 'grad_norm': 5.785973974044764, 'learning_rate': 4.954174555467484e-06, 'epoch': 0.09} 9%|▉ | 1099/12313 [49:33<8:29:05, 2.72s/it] 9%|▉ | 1100/12313 [49:36<8:24:35, 2.70s/it] {'loss': 0.4992, 'grad_norm': 8.854881338118322, 'learning_rate': 4.954049134674842e-06, 'epoch': 0.09} 9%|▉ | 1100/12313 [49:36<8:24:35, 2.70s/it] 9%|▉ | 1101/12313 [49:39<8:41:50, 2.79s/it] {'loss': 0.612, 'grad_norm': 10.211232540471157, 'learning_rate': 4.953923544074987e-06, 'epoch': 0.09} 9%|▉ | 1101/12313 [49:39<8:41:50, 2.79s/it] 9%|▉ | 1102/12313 [49:41<8:27:40, 2.72s/it] {'loss': 0.7663, 'grad_norm': 3.8402408710819724, 'learning_rate': 4.953797783676611e-06, 'epoch': 0.09} 9%|▉ | 1102/12313 [49:41<8:27:40, 2.72s/it] 9%|▉ | 1103/12313 [49:44<8:26:33, 2.71s/it] {'loss': 0.5334, 'grad_norm': 4.979965541200029, 'learning_rate': 4.9536718534884136e-06, 'epoch': 0.09} 9%|▉ | 1103/12313 [49:44<8:26:33, 2.71s/it] 9%|▉ | 1104/12313 [49:47<8:27:41, 2.72s/it] {'loss': 0.7872, 'grad_norm': 5.601550825200515, 'learning_rate': 4.9535457535191104e-06, 'epoch': 0.09} 9%|▉ | 1104/12313 [49:47<8:27:41, 2.72s/it] 9%|▉ | 1105/12313 [49:49<8:18:22, 2.67s/it] {'loss': 0.5551, 'grad_norm': 7.030742327798493, 'learning_rate': 4.953419483777427e-06, 'epoch': 0.09} 9%|▉ | 1105/12313 [49:49<8:18:22, 2.67s/it] 9%|▉ | 1106/12313 [49:52<8:21:14, 2.68s/it] {'loss': 0.5917, 'grad_norm': 11.753328029891325, 'learning_rate': 4.953293044272099e-06, 'epoch': 0.09} 9%|▉ | 1106/12313 [49:52<8:21:14, 2.68s/it] 9%|▉ | 1107/12313 [49:55<8:15:15, 2.65s/it] {'loss': 0.7211, 'grad_norm': 4.23164973424406, 'learning_rate': 4.953166435011876e-06, 'epoch': 0.09} 9%|▉ | 1107/12313 [49:55<8:15:15, 2.65s/it] 9%|▉ | 1108/12313 [49:58<8:30:10, 2.73s/it] {'loss': 0.725, 'grad_norm': 4.427467426240099, 'learning_rate': 4.953039656005519e-06, 'epoch': 0.09} 9%|▉ | 1108/12313 [49:58<8:30:10, 2.73s/it] 9%|▉ | 1109/12313 [50:01<8:42:13, 2.80s/it] 
{'loss': 0.7282, 'grad_norm': 6.656857308336458, 'learning_rate': 4.9529127072618e-06, 'epoch': 0.09} 9%|▉ | 1109/12313 [50:01<8:42:13, 2.80s/it] 9%|▉ | 1110/12313 [50:03<8:35:58, 2.76s/it] {'loss': 0.6008, 'grad_norm': 4.606494946768081, 'learning_rate': 4.952785588789504e-06, 'epoch': 0.09} 9%|▉ | 1110/12313 [50:03<8:35:58, 2.76s/it] 9%|▉ | 1111/12313 [50:06<8:34:07, 2.75s/it] {'loss': 0.7185, 'grad_norm': 4.861377291580884, 'learning_rate': 4.9526583005974275e-06, 'epoch': 0.09} 9%|▉ | 1111/12313 [50:06<8:34:07, 2.75s/it] 9%|▉ | 1112/12313 [50:09<8:27:27, 2.72s/it] {'loss': 0.451, 'grad_norm': 5.1172810841957785, 'learning_rate': 4.952530842694375e-06, 'epoch': 0.09} 9%|▉ | 1112/12313 [50:09<8:27:27, 2.72s/it] 9%|▉ | 1113/12313 [50:11<8:26:23, 2.71s/it] {'loss': 0.658, 'grad_norm': 5.226532353483265, 'learning_rate': 4.95240321508917e-06, 'epoch': 0.09} 9%|▉ | 1113/12313 [50:11<8:26:23, 2.71s/it] 9%|▉ | 1114/12313 [50:14<8:15:52, 2.66s/it] {'loss': 0.6415, 'grad_norm': 5.970819410829736, 'learning_rate': 4.952275417790641e-06, 'epoch': 0.09} 9%|▉ | 1114/12313 [50:14<8:15:52, 2.66s/it] 9%|▉ | 1115/12313 [50:16<8:15:54, 2.66s/it] {'loss': 0.5784, 'grad_norm': 5.235598768790702, 'learning_rate': 4.95214745080763e-06, 'epoch': 0.09} 9%|▉ | 1115/12313 [50:16<8:15:54, 2.66s/it] 9%|▉ | 1116/12313 [50:19<8:20:01, 2.68s/it] {'loss': 0.6458, 'grad_norm': 4.456877671742799, 'learning_rate': 4.952019314148995e-06, 'epoch': 0.09} 9%|▉ | 1116/12313 [50:19<8:20:01, 2.68s/it] 9%|▉ | 1117/12313 [50:22<8:14:56, 2.65s/it] {'loss': 0.7039, 'grad_norm': 5.531350494822164, 'learning_rate': 4.951891007823601e-06, 'epoch': 0.09} 9%|▉ | 1117/12313 [50:22<8:14:56, 2.65s/it] 9%|▉ | 1118/12313 [50:25<8:22:10, 2.69s/it] {'loss': 0.6661, 'grad_norm': 3.056795429652102, 'learning_rate': 4.951762531840325e-06, 'epoch': 0.09} 9%|▉ | 1118/12313 [50:25<8:22:10, 2.69s/it] 9%|▉ | 1119/12313 [50:27<8:14:56, 2.65s/it] {'loss': 0.591, 'grad_norm': 5.721632966434159, 'learning_rate': 
4.951633886208057e-06, 'epoch': 0.09} 9%|▉ | 1119/12313 [50:27<8:14:56, 2.65s/it] 9%|▉ | 1120/12313 [50:30<8:14:31, 2.65s/it] {'loss': 0.6245, 'grad_norm': 8.349255186085697, 'learning_rate': 4.951505070935699e-06, 'epoch': 0.09} 9%|▉ | 1120/12313 [50:30<8:14:31, 2.65s/it] 9%|▉ | 1121/12313 [50:33<8:18:11, 2.67s/it] {'loss': 0.7326, 'grad_norm': 4.572535564011483, 'learning_rate': 4.951376086032166e-06, 'epoch': 0.09} 9%|▉ | 1121/12313 [50:33<8:18:11, 2.67s/it] 9%|▉ | 1122/12313 [50:35<8:03:35, 2.59s/it] {'loss': 0.6501, 'grad_norm': 4.793988248982188, 'learning_rate': 4.95124693150638e-06, 'epoch': 0.09} 9%|▉ | 1122/12313 [50:35<8:03:35, 2.59s/it] 9%|▉ | 1123/12313 [50:38<8:10:44, 2.63s/it] {'loss': 0.507, 'grad_norm': 4.303548964129903, 'learning_rate': 4.951117607367281e-06, 'epoch': 0.09} 9%|▉ | 1123/12313 [50:38<8:10:44, 2.63s/it] 9%|▉ | 1124/12313 [50:40<8:06:45, 2.61s/it] {'loss': 0.647, 'grad_norm': 3.9997657093256613, 'learning_rate': 4.9509881136238144e-06, 'epoch': 0.09} 9%|▉ | 1124/12313 [50:40<8:06:45, 2.61s/it] 9%|▉ | 1125/12313 [50:43<7:58:07, 2.56s/it] {'loss': 0.7264, 'grad_norm': 6.458649539085403, 'learning_rate': 4.950858450284943e-06, 'epoch': 0.09} 9%|▉ | 1125/12313 [50:43<7:58:07, 2.56s/it] 9%|▉ | 1126/12313 [50:45<8:05:02, 2.60s/it] {'loss': 0.7207, 'grad_norm': 8.005259138006465, 'learning_rate': 4.950728617359637e-06, 'epoch': 0.09} 9%|▉ | 1126/12313 [50:45<8:05:02, 2.60s/it] 9%|▉ | 1127/12313 [50:48<8:30:18, 2.74s/it] {'loss': 0.7295, 'grad_norm': 10.787406113255107, 'learning_rate': 4.950598614856882e-06, 'epoch': 0.09} 9%|▉ | 1127/12313 [50:48<8:30:18, 2.74s/it] 9%|▉ | 1128/12313 [50:51<8:29:05, 2.73s/it] {'loss': 0.7061, 'grad_norm': 7.384428417264019, 'learning_rate': 4.950468442785672e-06, 'epoch': 0.09} 9%|▉ | 1128/12313 [50:51<8:29:05, 2.73s/it] 9%|▉ | 1129/12313 [50:54<8:13:53, 2.65s/it] {'loss': 0.5785, 'grad_norm': 6.688743726208685, 'learning_rate': 4.9503381011550145e-06, 'epoch': 0.09} 9%|▉ | 1129/12313 [50:54<8:13:53, 
2.65s/it] 9%|▉ | 1130/12313 [50:56<8:28:27, 2.73s/it] {'loss': 0.6196, 'grad_norm': 7.409666261351277, 'learning_rate': 4.950207589973929e-06, 'epoch': 0.09} 9%|▉ | 1130/12313 [50:56<8:28:27, 2.73s/it] 9%|▉ | 1131/12313 [51:00<8:48:26, 2.84s/it] {'loss': 0.6793, 'grad_norm': 4.151052125921888, 'learning_rate': 4.950076909251445e-06, 'epoch': 0.09} 9%|▉ | 1131/12313 [51:00<8:48:26, 2.84s/it] 9%|▉ | 1132/12313 [51:02<8:38:32, 2.78s/it] {'loss': 0.7033, 'grad_norm': 6.945429669541111, 'learning_rate': 4.949946058996606e-06, 'epoch': 0.09} 9%|▉ | 1132/12313 [51:02<8:38:32, 2.78s/it] 9%|▉ | 1133/12313 [51:05<8:32:37, 2.75s/it] {'loss': 0.673, 'grad_norm': 5.827624293232605, 'learning_rate': 4.949815039218467e-06, 'epoch': 0.09} 9%|▉ | 1133/12313 [51:05<8:32:37, 2.75s/it] 9%|▉ | 1134/12313 [51:08<8:29:07, 2.73s/it] {'loss': 0.5155, 'grad_norm': 7.170155218321833, 'learning_rate': 4.949683849926092e-06, 'epoch': 0.09} 9%|▉ | 1134/12313 [51:08<8:29:07, 2.73s/it] 9%|▉ | 1135/12313 [51:10<8:31:14, 2.74s/it] {'loss': 0.4783, 'grad_norm': 4.143518961770841, 'learning_rate': 4.949552491128559e-06, 'epoch': 0.09} 9%|▉ | 1135/12313 [51:10<8:31:14, 2.74s/it] 9%|▉ | 1136/12313 [51:13<8:15:02, 2.66s/it] {'loss': 0.7102, 'grad_norm': 4.759778607114406, 'learning_rate': 4.9494209628349585e-06, 'epoch': 0.09} 9%|▉ | 1136/12313 [51:13<8:15:02, 2.66s/it] 9%|▉ | 1137/12313 [51:16<8:30:30, 2.74s/it] {'loss': 0.7196, 'grad_norm': 4.218386626424805, 'learning_rate': 4.94928926505439e-06, 'epoch': 0.09} 9%|▉ | 1137/12313 [51:16<8:30:30, 2.74s/it] 9%|▉ | 1138/12313 [51:18<8:28:04, 2.73s/it] {'loss': 0.5415, 'grad_norm': 3.168695853330223, 'learning_rate': 4.949157397795967e-06, 'epoch': 0.09} 9%|▉ | 1138/12313 [51:18<8:28:04, 2.73s/it] 9%|▉ | 1139/12313 [51:21<8:19:56, 2.68s/it] {'loss': 0.7162, 'grad_norm': 3.826938431817392, 'learning_rate': 4.949025361068814e-06, 'epoch': 0.09} 9%|▉ | 1139/12313 [51:21<8:19:56, 2.68s/it] 9%|▉ | 1140/12313 [51:24<8:12:24, 2.64s/it] {'loss': 0.5282, 
'grad_norm': 6.290926748077408, 'learning_rate': 4.9488931548820685e-06, 'epoch': 0.09} 9%|▉ | 1140/12313 [51:24<8:12:24, 2.64s/it] 9%|▉ | 1141/12313 [51:26<8:16:02, 2.66s/it] {'loss': 0.5308, 'grad_norm': 7.032750557333736, 'learning_rate': 4.9487607792448765e-06, 'epoch': 0.09} 9%|▉ | 1141/12313 [51:26<8:16:02, 2.66s/it] 9%|▉ | 1142/12313 [51:29<8:17:14, 2.67s/it] {'loss': 0.5358, 'grad_norm': 5.947078730213946, 'learning_rate': 4.948628234166398e-06, 'epoch': 0.09} 9%|▉ | 1142/12313 [51:29<8:17:14, 2.67s/it] 9%|▉ | 1143/12313 [51:32<8:17:00, 2.67s/it] {'loss': 0.5523, 'grad_norm': 5.843171180813921, 'learning_rate': 4.948495519655805e-06, 'epoch': 0.09} 9%|▉ | 1143/12313 [51:32<8:17:00, 2.67s/it] 9%|▉ | 1144/12313 [51:35<8:33:32, 2.76s/it] {'loss': 0.6363, 'grad_norm': 4.227961968923302, 'learning_rate': 4.948362635722281e-06, 'epoch': 0.09} 9%|▉ | 1144/12313 [51:35<8:33:32, 2.76s/it] 9%|▉ | 1145/12313 [51:37<8:22:17, 2.70s/it] {'loss': 0.5243, 'grad_norm': 7.599291580837134, 'learning_rate': 4.948229582375021e-06, 'epoch': 0.09} 9%|▉ | 1145/12313 [51:37<8:22:17, 2.70s/it] 9%|▉ | 1146/12313 [51:40<8:23:53, 2.71s/it] {'loss': 0.5535, 'grad_norm': 6.482980832600019, 'learning_rate': 4.948096359623229e-06, 'epoch': 0.09} 9%|▉ | 1146/12313 [51:40<8:23:53, 2.71s/it] 9%|▉ | 1147/12313 [51:43<8:18:57, 2.68s/it] {'loss': 0.603, 'grad_norm': 8.221446398590253, 'learning_rate': 4.9479629674761265e-06, 'epoch': 0.09} 9%|▉ | 1147/12313 [51:43<8:18:57, 2.68s/it] 9%|▉ | 1148/12313 [51:45<8:14:47, 2.66s/it] {'loss': 0.667, 'grad_norm': 3.448295224589768, 'learning_rate': 4.947829405942942e-06, 'epoch': 0.09} 9%|▉ | 1148/12313 [51:45<8:14:47, 2.66s/it] 9%|▉ | 1149/12313 [51:48<8:17:06, 2.67s/it] {'loss': 0.5841, 'grad_norm': 3.719043167127091, 'learning_rate': 4.947695675032919e-06, 'epoch': 0.09} 9%|▉ | 1149/12313 [51:48<8:17:06, 2.67s/it] 9%|▉ | 1150/12313 [51:51<8:45:28, 2.82s/it] {'loss': 0.6107, 'grad_norm': 4.156557050698605, 'learning_rate': 4.947561774755307e-06, 
'epoch': 0.09} 9%|▉ | 1150/12313 [51:51<8:45:28, 2.82s/it] 9%|▉ | 1151/12313 [51:54<8:34:09, 2.76s/it] {'loss': 0.4772, 'grad_norm': 4.192630167978793, 'learning_rate': 4.947427705119375e-06, 'epoch': 0.09} 9%|▉ | 1151/12313 [51:54<8:34:09, 2.76s/it] 9%|▉ | 1152/12313 [51:56<8:14:27, 2.66s/it] {'loss': 0.6472, 'grad_norm': 5.508607665282113, 'learning_rate': 4.947293466134399e-06, 'epoch': 0.09} 9%|▉ | 1152/12313 [51:56<8:14:27, 2.66s/it] 9%|▉ | 1153/12313 [51:59<8:04:43, 2.61s/it] {'loss': 0.5252, 'grad_norm': 6.1116074527140185, 'learning_rate': 4.947159057809668e-06, 'epoch': 0.09} 9%|▉ | 1153/12313 [51:59<8:04:43, 2.61s/it] 9%|▉ | 1154/12313 [52:01<8:19:49, 2.69s/it] {'loss': 0.6287, 'grad_norm': 4.348443564825216, 'learning_rate': 4.9470244801544794e-06, 'epoch': 0.09} 9%|▉ | 1154/12313 [52:01<8:19:49, 2.69s/it] 9%|▉ | 1155/12313 [52:04<8:13:51, 2.66s/it] {'loss': 0.6383, 'grad_norm': 4.398778822812722, 'learning_rate': 4.94688973317815e-06, 'epoch': 0.09} 9%|▉ | 1155/12313 [52:04<8:13:51, 2.66s/it] 9%|▉ | 1156/12313 [52:07<8:57:53, 2.89s/it] {'loss': 0.6607, 'grad_norm': 8.481255729214475, 'learning_rate': 4.946754816889999e-06, 'epoch': 0.09} 9%|▉ | 1156/12313 [52:07<8:57:53, 2.89s/it] 9%|▉ | 1157/12313 [52:10<8:37:15, 2.78s/it] {'loss': 0.5745, 'grad_norm': 8.572467423080703, 'learning_rate': 4.946619731299365e-06, 'epoch': 0.09} 9%|▉ | 1157/12313 [52:10<8:37:15, 2.78s/it] 9%|▉ | 1158/12313 [52:13<8:25:37, 2.72s/it] {'loss': 0.5236, 'grad_norm': 7.588248988044493, 'learning_rate': 4.946484476415593e-06, 'epoch': 0.09} 9%|▉ | 1158/12313 [52:13<8:25:37, 2.72s/it] 9%|▉ | 1159/12313 [52:15<8:19:47, 2.69s/it] {'loss': 0.4852, 'grad_norm': 6.803366010970415, 'learning_rate': 4.946349052248044e-06, 'epoch': 0.09} 9%|▉ | 1159/12313 [52:15<8:19:47, 2.69s/it] 9%|▉ | 1160/12313 [52:18<8:11:47, 2.65s/it] {'loss': 0.4897, 'grad_norm': 3.614463840238927, 'learning_rate': 4.946213458806088e-06, 'epoch': 0.09} 9%|▉ | 1160/12313 [52:18<8:11:47, 2.65s/it] 9%|▉ | 1161/12313 
[52:20<8:18:06, 2.68s/it] {'loss': 0.6462, 'grad_norm': 4.806863459042568, 'learning_rate': 4.946077696099107e-06, 'epoch': 0.09} 9%|▉ | 1161/12313 [52:20<8:18:06, 2.68s/it] 9%|▉ | 1162/12313 [52:23<8:19:20, 2.69s/it] {'loss': 0.5871, 'grad_norm': 5.367629362476872, 'learning_rate': 4.945941764136494e-06, 'epoch': 0.09} 9%|▉ | 1162/12313 [52:23<8:19:20, 2.69s/it] 9%|▉ | 1163/12313 [52:26<8:47:22, 2.84s/it] {'loss': 0.5799, 'grad_norm': 3.4996137215165826, 'learning_rate': 4.945805662927657e-06, 'epoch': 0.09} 9%|▉ | 1163/12313 [52:26<8:47:22, 2.84s/it] 9%|▉ | 1164/12313 [52:29<8:37:16, 2.78s/it] {'loss': 0.672, 'grad_norm': 5.909531771722533, 'learning_rate': 4.9456693924820124e-06, 'epoch': 0.09} 9%|▉ | 1164/12313 [52:29<8:37:16, 2.78s/it] 9%|▉ | 1165/12313 [52:32<8:51:24, 2.86s/it] {'loss': 0.5327, 'grad_norm': 11.436424144343768, 'learning_rate': 4.945532952808989e-06, 'epoch': 0.09} 9%|▉ | 1165/12313 [52:32<8:51:24, 2.86s/it] 9%|▉ | 1166/12313 [52:35<8:40:19, 2.80s/it] {'loss': 0.6624, 'grad_norm': 3.885963408729381, 'learning_rate': 4.945396343918027e-06, 'epoch': 0.09} 9%|▉ | 1166/12313 [52:35<8:40:19, 2.80s/it] 9%|▉ | 1167/12313 [52:37<8:36:55, 2.78s/it] {'loss': 0.5427, 'grad_norm': 6.229973134745262, 'learning_rate': 4.945259565818582e-06, 'epoch': 0.09} 9%|▉ | 1167/12313 [52:37<8:36:55, 2.78s/it] 9%|▉ | 1168/12313 [52:40<8:32:29, 2.76s/it] {'loss': 0.5436, 'grad_norm': 4.17803636708481, 'learning_rate': 4.9451226185201155e-06, 'epoch': 0.09} 9%|▉ | 1168/12313 [52:40<8:32:29, 2.76s/it] 9%|▉ | 1169/12313 [52:43<8:14:38, 2.66s/it] {'loss': 0.6648, 'grad_norm': 3.7870816927723228, 'learning_rate': 4.9449855020321045e-06, 'epoch': 0.09} 9%|▉ | 1169/12313 [52:43<8:14:38, 2.66s/it] 10%|▉ | 1170/12313 [52:45<8:24:08, 2.71s/it] {'loss': 0.6312, 'grad_norm': 5.063697581248885, 'learning_rate': 4.944848216364036e-06, 'epoch': 0.1} 10%|▉ | 1170/12313 [52:45<8:24:08, 2.71s/it] 10%|▉ | 1171/12313 [52:48<8:26:18, 2.73s/it] {'loss': 0.62, 'grad_norm': 7.690605712577253, 
'learning_rate': 4.944710761525411e-06, 'epoch': 0.1} 10%|▉ | 1171/12313 [52:48<8:26:18, 2.73s/it] 10%|▉ | 1172/12313 [52:51<8:20:23, 2.69s/it] {'loss': 0.5703, 'grad_norm': 5.020085939206364, 'learning_rate': 4.94457313752574e-06, 'epoch': 0.1} 10%|▉ | 1172/12313 [52:51<8:20:23, 2.69s/it] 10%|▉ | 1173/12313 [52:53<8:10:36, 2.64s/it] {'loss': 0.7024, 'grad_norm': 4.5490891920736525, 'learning_rate': 4.944435344374544e-06, 'epoch': 0.1} 10%|▉ | 1173/12313 [52:53<8:10:36, 2.64s/it] 10%|▉ | 1174/12313 [52:56<8:08:42, 2.63s/it] {'loss': 0.5042, 'grad_norm': 5.913707485877068, 'learning_rate': 4.944297382081361e-06, 'epoch': 0.1} 10%|▉ | 1174/12313 [52:56<8:08:42, 2.63s/it] 10%|▉ | 1175/12313 [52:59<8:32:33, 2.76s/it] {'loss': 0.7242, 'grad_norm': 3.759445496054164, 'learning_rate': 4.944159250655734e-06, 'epoch': 0.1} 10%|▉ | 1175/12313 [52:59<8:32:33, 2.76s/it] 10%|▉ | 1176/12313 [53:02<8:27:35, 2.73s/it] {'loss': 0.6527, 'grad_norm': 4.721355611086034, 'learning_rate': 4.944020950107224e-06, 'epoch': 0.1} 10%|▉ | 1176/12313 [53:02<8:27:35, 2.73s/it] 10%|▉ | 1177/12313 [53:04<8:15:04, 2.67s/it] {'loss': 0.6651, 'grad_norm': 4.718530932065967, 'learning_rate': 4.943882480445398e-06, 'epoch': 0.1} 10%|▉ | 1177/12313 [53:04<8:15:04, 2.67s/it] 10%|▉ | 1178/12313 [53:07<8:33:01, 2.76s/it] {'loss': 0.4774, 'grad_norm': 3.7323028514817764, 'learning_rate': 4.943743841679839e-06, 'epoch': 0.1} 10%|▉ | 1178/12313 [53:07<8:33:01, 2.76s/it] 10%|▉ | 1179/12313 [53:10<8:22:14, 2.71s/it] {'loss': 0.5687, 'grad_norm': 7.764398624185714, 'learning_rate': 4.943605033820138e-06, 'epoch': 0.1} 10%|▉ | 1179/12313 [53:10<8:22:14, 2.71s/it] 10%|▉ | 1180/12313 [53:13<8:40:46, 2.81s/it] {'loss': 0.6266, 'grad_norm': 5.7460830112551715, 'learning_rate': 4.943466056875903e-06, 'epoch': 0.1} 10%|▉ | 1180/12313 [53:13<8:40:46, 2.81s/it] 10%|▉ | 1181/12313 [53:15<8:25:26, 2.72s/it] {'loss': 0.9285, 'grad_norm': 6.12456544392407, 'learning_rate': 4.943326910856749e-06, 'epoch': 0.1} 10%|▉ | 
1181/12313 [53:15<8:25:26, 2.72s/it] 10%|▉ | 1182/12313 [53:18<8:28:15, 2.74s/it] {'loss': 0.572, 'grad_norm': 5.9339500045832505, 'learning_rate': 4.943187595772302e-06, 'epoch': 0.1} 10%|▉ | 1182/12313 [53:18<8:28:15, 2.74s/it] 10%|▉ | 1183/12313 [53:21<8:29:11, 2.74s/it] {'loss': 0.7426, 'grad_norm': 5.327929545992472, 'learning_rate': 4.943048111632205e-06, 'epoch': 0.1} 10%|▉ | 1183/12313 [53:21<8:29:11, 2.74s/it] 10%|▉ | 1184/12313 [53:23<8:19:28, 2.69s/it] {'loss': 0.5256, 'grad_norm': 4.422671758582798, 'learning_rate': 4.942908458446107e-06, 'epoch': 0.1} 10%|▉ | 1184/12313 [53:23<8:19:28, 2.69s/it] 10%|▉ | 1185/12313 [53:26<8:19:09, 2.69s/it] {'loss': 0.6544, 'grad_norm': 4.210710659594965, 'learning_rate': 4.942768636223674e-06, 'epoch': 0.1} 10%|▉ | 1185/12313 [53:26<8:19:09, 2.69s/it] 10%|▉ | 1186/12313 [53:29<8:38:49, 2.80s/it] {'loss': 0.6285, 'grad_norm': 5.86594483722807, 'learning_rate': 4.94262864497458e-06, 'epoch': 0.1} 10%|▉ | 1186/12313 [53:29<8:38:49, 2.80s/it] 10%|▉ | 1187/12313 [53:32<8:42:08, 2.82s/it] {'loss': 0.5907, 'grad_norm': 11.95231315351679, 'learning_rate': 4.94248848470851e-06, 'epoch': 0.1} 10%|▉ | 1187/12313 [53:32<8:42:08, 2.82s/it] 10%|▉ | 1188/12313 [53:35<8:34:28, 2.77s/it] {'loss': 0.5891, 'grad_norm': 4.610476439616043, 'learning_rate': 4.9423481554351636e-06, 'epoch': 0.1} 10%|▉ | 1188/12313 [53:35<8:34:28, 2.77s/it] 10%|▉ | 1189/12313 [53:37<8:33:52, 2.77s/it] {'loss': 0.6007, 'grad_norm': 15.02936414694867, 'learning_rate': 4.9422076571642516e-06, 'epoch': 0.1} 10%|▉ | 1189/12313 [53:37<8:33:52, 2.77s/it] 10%|▉ | 1190/12313 [53:40<8:13:58, 2.66s/it] {'loss': 0.5263, 'grad_norm': 7.408105632703543, 'learning_rate': 4.942066989905494e-06, 'epoch': 0.1} 10%|▉ | 1190/12313 [53:40<8:13:58, 2.66s/it] 10%|▉ | 1191/12313 [53:42<8:10:45, 2.65s/it] {'loss': 0.6209, 'grad_norm': 6.6983744546854105, 'learning_rate': 4.941926153668626e-06, 'epoch': 0.1} 10%|▉ | 1191/12313 [53:42<8:10:45, 2.65s/it] 10%|▉ | 1192/12313 
[53:45<8:08:56, 2.64s/it] {'loss': 0.5479, 'grad_norm': 12.37964028499769, 'learning_rate': 4.941785148463391e-06, 'epoch': 0.1} 10%|▉ | 1192/12313 [53:45<8:08:56, 2.64s/it] 10%|▉ | 1193/12313 [53:48<8:05:29, 2.62s/it] {'loss': 0.6356, 'grad_norm': 7.949353882372964, 'learning_rate': 4.941643974299547e-06, 'epoch': 0.1} 10%|▉ | 1193/12313 [53:48<8:05:29, 2.62s/it] 10%|▉ | 1194/12313 [53:50<8:01:35, 2.60s/it] {'loss': 0.5876, 'grad_norm': 12.259554197141354, 'learning_rate': 4.941502631186863e-06, 'epoch': 0.1} 10%|▉ | 1194/12313 [53:50<8:01:35, 2.60s/it] 10%|▉ | 1195/12313 [53:53<7:58:15, 2.58s/it] {'loss': 0.6081, 'grad_norm': 4.5151134917162254, 'learning_rate': 4.941361119135118e-06, 'epoch': 0.1} 10%|▉ | 1195/12313 [53:53<7:58:15, 2.58s/it] 10%|▉ | 1196/12313 [53:55<7:56:39, 2.57s/it] {'loss': 0.7848, 'grad_norm': 5.027871122019048, 'learning_rate': 4.941219438154103e-06, 'epoch': 0.1} 10%|▉ | 1196/12313 [53:55<7:56:39, 2.57s/it] 10%|▉ | 1197/12313 [53:58<7:53:13, 2.55s/it] {'loss': 0.6873, 'grad_norm': 6.044100876833535, 'learning_rate': 4.941077588253624e-06, 'epoch': 0.1} 10%|▉ | 1197/12313 [53:58<7:53:13, 2.55s/it] 10%|▉ | 1198/12313 [54:00<7:54:53, 2.56s/it] {'loss': 0.5557, 'grad_norm': 5.889644758237491, 'learning_rate': 4.940935569443496e-06, 'epoch': 0.1} 10%|▉ | 1198/12313 [54:00<7:54:53, 2.56s/it] 10%|▉ | 1199/12313 [54:03<7:50:28, 2.54s/it] {'loss': 0.547, 'grad_norm': 15.297541275971485, 'learning_rate': 4.940793381733544e-06, 'epoch': 0.1} 10%|▉ | 1199/12313 [54:03<7:50:28, 2.54s/it] 10%|▉ | 1200/12313 [54:05<7:51:33, 2.55s/it] {'loss': 0.7011, 'grad_norm': 12.207281968202853, 'learning_rate': 4.940651025133607e-06, 'epoch': 0.1} 10%|▉ | 1200/12313 [54:05<7:51:33, 2.55s/it] 10%|▉ | 1201/12313 [54:08<7:50:28, 2.54s/it] {'loss': 0.7094, 'grad_norm': 5.195830064818431, 'learning_rate': 4.9405084996535376e-06, 'epoch': 0.1} 10%|▉ | 1201/12313 [54:08<7:50:28, 2.54s/it] 10%|▉ | 1202/12313 [54:10<7:43:49, 2.50s/it] {'loss': 0.5373, 'grad_norm': 
4.519169304548104, 'learning_rate': 4.940365805303195e-06, 'epoch': 0.1} 10%|▉ | 1202/12313 [54:10<7:43:49, 2.50s/it] 10%|▉ | 1203/12313 [54:13<7:55:29, 2.57s/it] {'loss': 0.5373, 'grad_norm': 4.18578492619036, 'learning_rate': 4.940222942092455e-06, 'epoch': 0.1} 10%|▉ | 1203/12313 [54:13<7:55:29, 2.57s/it] 10%|▉ | 1204/12313 [54:16<8:07:35, 2.63s/it] {'loss': 0.6525, 'grad_norm': 5.871433014218688, 'learning_rate': 4.940079910031201e-06, 'epoch': 0.1} 10%|▉ | 1204/12313 [54:16<8:07:35, 2.63s/it] 10%|▉ | 1205/12313 [54:18<8:04:25, 2.62s/it] {'loss': 0.6541, 'grad_norm': 7.770798201538124, 'learning_rate': 4.939936709129333e-06, 'epoch': 0.1} 10%|▉ | 1205/12313 [54:18<8:04:25, 2.62s/it] 10%|▉ | 1206/12313 [54:21<8:03:18, 2.61s/it] {'loss': 0.6166, 'grad_norm': 10.00163900865208, 'learning_rate': 4.939793339396756e-06, 'epoch': 0.1} 10%|▉ | 1206/12313 [54:21<8:03:18, 2.61s/it] 10%|▉ | 1207/12313 [54:24<8:12:19, 2.66s/it] {'loss': 0.5834, 'grad_norm': 6.793365007116543, 'learning_rate': 4.939649800843394e-06, 'epoch': 0.1} 10%|▉ | 1207/12313 [54:24<8:12:19, 2.66s/it] 10%|▉ | 1208/12313 [54:27<8:13:43, 2.67s/it] {'loss': 0.654, 'grad_norm': 3.8998039709967927, 'learning_rate': 4.939506093479176e-06, 'epoch': 0.1} 10%|▉ | 1208/12313 [54:27<8:13:43, 2.67s/it] 10%|▉ | 1209/12313 [54:29<8:12:21, 2.66s/it] {'loss': 0.5947, 'grad_norm': 3.5977727046620473, 'learning_rate': 4.939362217314048e-06, 'epoch': 0.1} 10%|▉ | 1209/12313 [54:29<8:12:21, 2.66s/it] 10%|▉ | 1210/12313 [54:32<8:00:40, 2.60s/it] {'loss': 0.4971, 'grad_norm': 3.430672786433987, 'learning_rate': 4.939218172357965e-06, 'epoch': 0.1} 10%|▉ | 1210/12313 [54:32<8:00:40, 2.60s/it] 10%|▉ | 1211/12313 [54:34<8:05:26, 2.62s/it] {'loss': 0.6256, 'grad_norm': 4.703232976940314, 'learning_rate': 4.9390739586208926e-06, 'epoch': 0.1} 10%|▉ | 1211/12313 [54:34<8:05:26, 2.62s/it] 10%|▉ | 1212/12313 [54:37<8:18:49, 2.70s/it] {'loss': 0.6425, 'grad_norm': 8.889711877291688, 'learning_rate': 4.938929576112812e-06, 'epoch': 
0.1} 10%|▉ | 1212/12313 [54:37<8:18:49, 2.70s/it] 10%|▉ | 1213/12313 [54:40<8:16:53, 2.69s/it] {'loss': 0.5402, 'grad_norm': 5.215270092203213, 'learning_rate': 4.938785024843712e-06, 'epoch': 0.1} 10%|▉ | 1213/12313 [54:40<8:16:53, 2.69s/it] 10%|▉ | 1214/12313 [54:42<8:10:31, 2.65s/it] {'loss': 0.4592, 'grad_norm': 4.480490370978996, 'learning_rate': 4.938640304823596e-06, 'epoch': 0.1} 10%|▉ | 1214/12313 [54:42<8:10:31, 2.65s/it] 10%|▉ | 1215/12313 [54:45<8:00:19, 2.60s/it] {'loss': 0.5733, 'grad_norm': 4.5775356603291995, 'learning_rate': 4.938495416062477e-06, 'epoch': 0.1} 10%|▉ | 1215/12313 [54:45<8:00:19, 2.60s/it] 10%|▉ | 1216/12313 [54:48<8:02:23, 2.61s/it] {'loss': 0.4699, 'grad_norm': 5.374607118889353, 'learning_rate': 4.93835035857038e-06, 'epoch': 0.1} 10%|▉ | 1216/12313 [54:48<8:02:23, 2.61s/it] 10%|▉ | 1217/12313 [54:50<8:04:10, 2.62s/it] {'loss': 0.6582, 'grad_norm': 8.062779666258928, 'learning_rate': 4.938205132357344e-06, 'epoch': 0.1} 10%|▉ | 1217/12313 [54:50<8:04:10, 2.62s/it] 10%|▉ | 1218/12313 [54:53<8:11:11, 2.66s/it] {'loss': 0.4957, 'grad_norm': 5.276936843727084, 'learning_rate': 4.938059737433416e-06, 'epoch': 0.1} 10%|▉ | 1218/12313 [54:53<8:11:11, 2.66s/it] 10%|▉ | 1219/12313 [54:55<8:03:40, 2.62s/it] {'loss': 0.5664, 'grad_norm': 5.321609718799565, 'learning_rate': 4.9379141738086575e-06, 'epoch': 0.1} 10%|▉ | 1219/12313 [54:55<8:03:40, 2.62s/it] 10%|▉ | 1220/12313 [54:58<8:04:45, 2.62s/it] {'loss': 0.7467, 'grad_norm': 4.479754054431104, 'learning_rate': 4.9377684414931415e-06, 'epoch': 0.1} 10%|▉ | 1220/12313 [54:58<8:04:45, 2.62s/it] 10%|▉ | 1221/12313 [55:01<8:14:00, 2.67s/it] {'loss': 0.5793, 'grad_norm': 4.74529856971756, 'learning_rate': 4.937622540496951e-06, 'epoch': 0.1} 10%|▉ | 1221/12313 [55:01<8:14:00, 2.67s/it] 10%|▉ | 1222/12313 [55:03<7:58:03, 2.59s/it] {'loss': 0.6115, 'grad_norm': 3.911535284660022, 'learning_rate': 4.937476470830181e-06, 'epoch': 0.1} 10%|▉ | 1222/12313 [55:03<7:58:03, 2.59s/it] 10%|▉ | 1223/12313 
[55:06<8:01:55, 2.61s/it] {'loss': 0.5927, 'grad_norm': 4.166632824497643, 'learning_rate': 4.937330232502939e-06, 'epoch': 0.1} 10%|▉ | 1223/12313 [55:06<8:01:55, 2.61s/it] 10%|▉ | 1224/12313 [55:09<8:09:32, 2.65s/it] {'loss': 0.9838, 'grad_norm': 5.587700034937294, 'learning_rate': 4.937183825525346e-06, 'epoch': 0.1} 10%|▉ | 1224/12313 [55:09<8:09:32, 2.65s/it] 10%|▉ | 1225/12313 [55:11<8:04:39, 2.62s/it] {'loss': 0.6088, 'grad_norm': 4.01441412316951, 'learning_rate': 4.937037249907529e-06, 'epoch': 0.1} 10%|▉ | 1225/12313 [55:11<8:04:39, 2.62s/it] 10%|▉ | 1226/12313 [55:14<8:10:57, 2.66s/it] {'loss': 0.8051, 'grad_norm': 5.422702256253332, 'learning_rate': 4.9368905056596336e-06, 'epoch': 0.1} 10%|▉ | 1226/12313 [55:14<8:10:57, 2.66s/it] 10%|▉ | 1227/12313 [55:17<8:09:37, 2.65s/it] {'loss': 0.8022, 'grad_norm': 4.730421753529755, 'learning_rate': 4.936743592791812e-06, 'epoch': 0.1} 10%|▉ | 1227/12313 [55:17<8:09:37, 2.65s/it] 10%|▉ | 1228/12313 [55:19<8:12:44, 2.67s/it] {'loss': 0.7359, 'grad_norm': 6.417481219598529, 'learning_rate': 4.936596511314229e-06, 'epoch': 0.1} 10%|▉ | 1228/12313 [55:19<8:12:44, 2.67s/it] 10%|▉ | 1229/12313 [55:22<8:13:48, 2.67s/it] {'loss': 0.4835, 'grad_norm': 5.936442260399057, 'learning_rate': 4.936449261237064e-06, 'epoch': 0.1} 10%|▉ | 1229/12313 [55:22<8:13:48, 2.67s/it] 10%|▉ | 1230/12313 [55:25<8:09:55, 2.65s/it] {'loss': 0.6098, 'grad_norm': 5.161367234189236, 'learning_rate': 4.936301842570505e-06, 'epoch': 0.1} 10%|▉ | 1230/12313 [55:25<8:09:55, 2.65s/it] 10%|▉ | 1231/12313 [55:27<8:04:16, 2.62s/it] {'loss': 0.5208, 'grad_norm': 6.171812579141409, 'learning_rate': 4.936154255324751e-06, 'epoch': 0.1} 10%|▉ | 1231/12313 [55:27<8:04:16, 2.62s/it] 10%|█ | 1232/12313 [55:30<8:10:47, 2.66s/it] {'loss': 0.6272, 'grad_norm': 6.898043493723934, 'learning_rate': 4.936006499510016e-06, 'epoch': 0.1} 10%|█ | 1232/12313 [55:30<8:10:47, 2.66s/it] 10%|█ | 1233/12313 [55:32<8:03:00, 2.62s/it] {'loss': 0.6761, 'grad_norm': 
5.279799808361505, 'learning_rate': 4.935858575136525e-06, 'epoch': 0.1} 10%|█ | 1233/12313 [55:32<8:03:00, 2.62s/it] 10%|█ | 1234/12313 [55:35<8:01:31, 2.61s/it] {'loss': 0.5666, 'grad_norm': 6.1780863372983, 'learning_rate': 4.935710482214512e-06, 'epoch': 0.1} 10%|█ | 1234/12313 [55:35<8:01:31, 2.61s/it] 10%|█ | 1235/12313 [55:37<7:48:53, 2.54s/it] {'loss': 0.7762, 'grad_norm': 4.127005405020764, 'learning_rate': 4.935562220754224e-06, 'epoch': 0.1} 10%|█ | 1235/12313 [55:37<7:48:53, 2.54s/it] 10%|█ | 1236/12313 [55:40<7:56:02, 2.58s/it] {'loss': 0.5601, 'grad_norm': 6.939945712344684, 'learning_rate': 4.935413790765919e-06, 'epoch': 0.1} 10%|█ | 1236/12313 [55:40<7:56:02, 2.58s/it] 10%|█ | 1237/12313 [55:43<8:02:23, 2.61s/it] {'loss': 0.5489, 'grad_norm': 5.4373785364212965, 'learning_rate': 4.935265192259871e-06, 'epoch': 0.1} 10%|█ | 1237/12313 [55:43<8:02:23, 2.61s/it] 10%|█ | 1238/12313 [55:46<8:24:51, 2.74s/it] {'loss': 0.6456, 'grad_norm': 3.19887214936069, 'learning_rate': 4.935116425246359e-06, 'epoch': 0.1} 10%|█ | 1238/12313 [55:46<8:24:51, 2.74s/it] 10%|█ | 1239/12313 [55:49<8:29:14, 2.76s/it] {'loss': 0.5061, 'grad_norm': 7.543707007832977, 'learning_rate': 4.934967489735679e-06, 'epoch': 0.1} 10%|█ | 1239/12313 [55:49<8:29:14, 2.76s/it] 10%|█ | 1240/12313 [55:51<8:13:34, 2.67s/it] {'loss': 0.6719, 'grad_norm': 4.008848618860738, 'learning_rate': 4.934818385738135e-06, 'epoch': 0.1} 10%|█ | 1240/12313 [55:51<8:13:34, 2.67s/it] 10%|█ | 1241/12313 [55:54<8:29:13, 2.76s/it] {'loss': 0.6852, 'grad_norm': 4.006099731057698, 'learning_rate': 4.934669113264044e-06, 'epoch': 0.1} 10%|█ | 1241/12313 [55:54<8:29:13, 2.76s/it] 10%|█ | 1242/12313 [55:57<8:26:31, 2.75s/it] {'loss': 0.5916, 'grad_norm': 8.793769739342025, 'learning_rate': 4.934519672323737e-06, 'epoch': 0.1} 10%|█ | 1242/12313 [55:57<8:26:31, 2.75s/it] 10%|█ | 1243/12313 [55:59<8:17:11, 2.69s/it] {'loss': 0.4989, 'grad_norm': 5.1218716406597835, 'learning_rate': 4.9343700629275525e-06, 'epoch': 
0.1} 10%|█ | 1243/12313 [55:59<8:17:11, 2.69s/it] 10%|█ | 1244/12313 [56:02<8:08:47, 2.65s/it] {'loss': 0.5374, 'grad_norm': 4.3513298577507245, 'learning_rate': 4.934220285085843e-06, 'epoch': 0.1} 10%|█ | 1244/12313 [56:02<8:08:47, 2.65s/it] 10%|█ | 1245/12313 [56:04<7:56:50, 2.58s/it] {'loss': 0.4365, 'grad_norm': 5.858277317975228, 'learning_rate': 4.934070338808974e-06, 'epoch': 0.1} 10%|█ | 1245/12313 [56:04<7:56:50, 2.58s/it] 10%|█ | 1246/12313 [56:07<7:55:29, 2.58s/it] {'loss': 0.6175, 'grad_norm': 4.971141537823225, 'learning_rate': 4.933920224107319e-06, 'epoch': 0.1} 10%|█ | 1246/12313 [56:07<7:55:29, 2.58s/it] 10%|█ | 1247/12313 [56:09<7:54:09, 2.57s/it] {'loss': 0.6484, 'grad_norm': 7.572976492490089, 'learning_rate': 4.933769940991266e-06, 'epoch': 0.1} 10%|█ | 1247/12313 [56:09<7:54:09, 2.57s/it] 10%|█ | 1248/12313 [56:12<7:53:49, 2.57s/it] {'loss': 0.6078, 'grad_norm': 7.466443178328625, 'learning_rate': 4.933619489471213e-06, 'epoch': 0.1} 10%|█ | 1248/12313 [56:12<7:53:49, 2.57s/it] 10%|█ | 1249/12313 [56:15<8:06:57, 2.64s/it] {'loss': 0.672, 'grad_norm': 3.837978031503154, 'learning_rate': 4.933468869557572e-06, 'epoch': 0.1} 10%|█ | 1249/12313 [56:15<8:06:57, 2.64s/it] 10%|█ | 1250/12313 [56:17<8:03:14, 2.62s/it] {'loss': 0.6828, 'grad_norm': 4.587774401604302, 'learning_rate': 4.933318081260763e-06, 'epoch': 0.1} 10%|█ | 1250/12313 [56:17<8:03:14, 2.62s/it] 10%|█ | 1251/12313 [56:20<8:06:59, 2.64s/it] {'loss': 0.5304, 'grad_norm': 5.728533677177494, 'learning_rate': 4.933167124591222e-06, 'epoch': 0.1} 10%|█ | 1251/12313 [56:20<8:06:59, 2.64s/it] 10%|█ | 1252/12313 [56:22<7:55:11, 2.58s/it] {'loss': 0.6125, 'grad_norm': 7.693599571430989, 'learning_rate': 4.9330159995593926e-06, 'epoch': 0.1} 10%|█ | 1252/12313 [56:22<7:55:11, 2.58s/it] 10%|█ | 1253/12313 [56:25<7:59:23, 2.60s/it] {'loss': 0.6306, 'grad_norm': 5.284936139752091, 'learning_rate': 4.9328647061757326e-06, 'epoch': 0.1} 10%|█ | 1253/12313 [56:25<7:59:23, 2.60s/it] 10%|█ | 
1254/12313 [56:28<8:08:27, 2.65s/it] {'loss': 0.6503, 'grad_norm': 4.502661220996693, 'learning_rate': 4.932713244450712e-06, 'epoch': 0.1} 10%|█ | 1254/12313 [56:28<8:08:27, 2.65s/it] 10%|█ | 1255/12313 [56:30<8:05:39, 2.64s/it] {'loss': 0.4843, 'grad_norm': 30.435010504189158, 'learning_rate': 4.932561614394809e-06, 'epoch': 0.1} 10%|█ | 1255/12313 [56:30<8:05:39, 2.64s/it] 10%|█ | 1256/12313 [56:33<8:14:07, 2.68s/it] {'loss': 0.5335, 'grad_norm': 3.2680414669887226, 'learning_rate': 4.932409816018516e-06, 'epoch': 0.1} 10%|█ | 1256/12313 [56:33<8:14:07, 2.68s/it] 10%|█ | 1257/12313 [56:36<8:15:07, 2.69s/it] {'loss': 0.6274, 'grad_norm': 5.550536415470609, 'learning_rate': 4.932257849332337e-06, 'epoch': 0.1} 10%|█ | 1257/12313 [56:36<8:15:07, 2.69s/it] 10%|█ | 1258/12313 [56:39<8:14:13, 2.68s/it] {'loss': 0.7055, 'grad_norm': 6.490325484791516, 'learning_rate': 4.932105714346788e-06, 'epoch': 0.1} 10%|█ | 1258/12313 [56:39<8:14:13, 2.68s/it] 10%|█ | 1259/12313 [56:41<8:12:40, 2.67s/it] {'loss': 0.7677, 'grad_norm': 4.635391717749872, 'learning_rate': 4.931953411072395e-06, 'epoch': 0.1} 10%|█ | 1259/12313 [56:41<8:12:40, 2.67s/it] 10%|█ | 1260/12313 [56:44<8:08:51, 2.65s/it] {'loss': 0.6283, 'grad_norm': 5.213698736457178, 'learning_rate': 4.931800939519697e-06, 'epoch': 0.1} 10%|█ | 1260/12313 [56:44<8:08:51, 2.65s/it] 10%|█ | 1261/12313 [56:46<8:06:25, 2.64s/it] {'loss': 0.5411, 'grad_norm': 12.81670585201401, 'learning_rate': 4.931648299699245e-06, 'epoch': 0.1} 10%|█ | 1261/12313 [56:46<8:06:25, 2.64s/it] 10%|█ | 1262/12313 [56:49<8:23:51, 2.74s/it] {'loss': 0.6, 'grad_norm': 4.7979729375829425, 'learning_rate': 4.931495491621598e-06, 'epoch': 0.1} 10%|█ | 1262/12313 [56:49<8:23:51, 2.74s/it] 10%|█ | 1263/12313 [56:52<8:20:25, 2.72s/it] {'loss': 0.6029, 'grad_norm': 5.738885766711591, 'learning_rate': 4.931342515297333e-06, 'epoch': 0.1} 10%|█ | 1263/12313 [56:52<8:20:25, 2.72s/it] 10%|█ | 1264/12313 [56:55<8:10:43, 2.66s/it] {'loss': 0.6877, 'grad_norm': 
4.5708865991196586, 'learning_rate': 4.931189370737033e-06, 'epoch': 0.1} 10%|█ | 1264/12313 [56:55<8:10:43, 2.66s/it] 10%|█ | 1265/12313 [56:57<8:08:08, 2.65s/it] {'loss': 0.6054, 'grad_norm': 4.292203823138975, 'learning_rate': 4.931036057951295e-06, 'epoch': 0.1} 10%|█ | 1265/12313 [56:57<8:08:08, 2.65s/it] 10%|█ | 1266/12313 [57:00<7:54:35, 2.58s/it] {'loss': 1.016, 'grad_norm': 5.094248884490348, 'learning_rate': 4.930882576950728e-06, 'epoch': 0.1} 10%|█ | 1266/12313 [57:00<7:54:35, 2.58s/it] 10%|█ | 1267/12313 [57:02<8:05:47, 2.64s/it] {'loss': 0.6266, 'grad_norm': 7.74744023190108, 'learning_rate': 4.930728927745954e-06, 'epoch': 0.1} 10%|█ | 1267/12313 [57:02<8:05:47, 2.64s/it] 10%|█ | 1268/12313 [57:05<8:10:27, 2.66s/it] {'loss': 0.5436, 'grad_norm': 7.178060412399686, 'learning_rate': 4.930575110347601e-06, 'epoch': 0.1} 10%|█ | 1268/12313 [57:05<8:10:27, 2.66s/it] 10%|█ | 1269/12313 [57:08<8:09:44, 2.66s/it] {'loss': 0.6069, 'grad_norm': 4.8955814375005, 'learning_rate': 4.9304211247663135e-06, 'epoch': 0.1} 10%|█ | 1269/12313 [57:08<8:09:44, 2.66s/it][2024-12-05 13:30:35,989] [WARNING] [stage3.py:1949:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time 10%|█ | 1270/12313 [57:12<9:09:44, 2.99s/it] {'loss': 0.5739, 'grad_norm': 6.4873010558747675, 'learning_rate': 4.930266971012748e-06, 'epoch': 0.1} 10%|█ | 1270/12313 [57:12<9:09:44, 2.99s/it] 10%|█ | 1271/12313 [57:14<8:49:28, 2.88s/it] {'loss': 0.7317, 'grad_norm': 5.62659346934576, 'learning_rate': 4.930112649097569e-06, 'epoch': 0.1} 10%|█ | 1271/12313 [57:14<8:49:28, 2.88s/it] 10%|█ | 1272/12313 [57:17<8:29:16, 2.77s/it] {'loss': 0.4922, 'grad_norm': 5.596039021432254, 'learning_rate': 4.929958159031457e-06, 'epoch': 0.1} 10%|█ | 1272/12313 [57:17<8:29:16, 2.77s/it] 10%|█ | 1273/12313 [57:19<8:18:00, 2.71s/it] {'loss': 0.5276, 'grad_norm': 5.488922386676459, 'learning_rate': 4.9298035008251e-06, 'epoch': 0.1} 10%|█ | 1273/12313 [57:19<8:18:00, 2.71s/it] 10%|█ | 1274/12313 [57:22<8:18:39, 2.71s/it] {'loss': 0.5529, 'grad_norm': 5.79593069742807, 'learning_rate': 4.929648674489201e-06, 'epoch': 0.1} 10%|█ | 1274/12313 [57:22<8:18:39, 2.71s/it] 10%|█ | 1275/12313 [57:24<8:02:20, 2.62s/it] {'loss': 0.6647, 'grad_norm': 9.122388238453604, 'learning_rate': 4.929493680034472e-06, 'epoch': 0.1} 10%|█ | 1275/12313 [57:24<8:02:20, 2.62s/it] 10%|█ | 1276/12313 [57:27<8:05:40, 2.64s/it] {'loss': 0.6721, 'grad_norm': 4.654226716442984, 'learning_rate': 4.929338517471638e-06, 'epoch': 0.1} 10%|█ | 1276/12313 [57:27<8:05:40, 2.64s/it] 10%|█ | 1277/12313 [57:30<8:20:38, 2.72s/it] {'loss': 0.4812, 'grad_norm': 4.921506870025976, 'learning_rate': 4.929183186811436e-06, 'epoch': 0.1} 10%|█ | 1277/12313 [57:30<8:20:38, 2.72s/it] 10%|█ | 1278/12313 [57:33<8:11:56, 2.67s/it] {'loss': 0.6183, 'grad_norm': 8.206987541104693, 'learning_rate': 4.9290276880646144e-06, 'epoch': 0.1} 10%|█ | 1278/12313 [57:33<8:11:56, 2.67s/it] 10%|█ | 1279/12313 [57:35<8:15:17, 2.69s/it] {'loss': 0.5676, 'grad_norm': 
6.664251212109854, 'learning_rate': 4.928872021241932e-06, 'epoch': 0.1} 10%|█ | 1279/12313 [57:35<8:15:17, 2.69s/it] 10%|█ | 1280/12313 [57:38<8:17:10, 2.70s/it] {'loss': 0.5516, 'grad_norm': 5.059968838894893, 'learning_rate': 4.92871618635416e-06, 'epoch': 0.1} 10%|█ | 1280/12313 [57:38<8:17:10, 2.70s/it] 10%|█ | 1281/12313 [57:40<8:01:22, 2.62s/it] {'loss': 0.637, 'grad_norm': 4.462582725976913, 'learning_rate': 4.928560183412081e-06, 'epoch': 0.1} 10%|█ | 1281/12313 [57:40<8:01:22, 2.62s/it] 10%|█ | 1282/12313 [57:43<7:50:42, 2.56s/it] {'loss': 0.585, 'grad_norm': 13.013107165252704, 'learning_rate': 4.928404012426491e-06, 'epoch': 0.1} 10%|█ | 1282/12313 [57:43<7:50:42, 2.56s/it] 10%|█ | 1283/12313 [57:45<7:48:37, 2.55s/it] {'loss': 0.4416, 'grad_norm': 4.877077629277337, 'learning_rate': 4.9282476734081955e-06, 'epoch': 0.1} 10%|█ | 1283/12313 [57:45<7:48:37, 2.55s/it] 10%|█ | 1284/12313 [57:48<7:41:24, 2.51s/it] {'loss': 0.534, 'grad_norm': 5.339858862218509, 'learning_rate': 4.928091166368013e-06, 'epoch': 0.1} 10%|█ | 1284/12313 [57:48<7:41:24, 2.51s/it] 10%|█ | 1285/12313 [57:50<7:45:56, 2.54s/it] {'loss': 0.5402, 'grad_norm': 4.638726269664911, 'learning_rate': 4.927934491316771e-06, 'epoch': 0.1} 10%|█ | 1285/12313 [57:50<7:45:56, 2.54s/it] 10%|█ | 1286/12313 [57:53<7:51:30, 2.57s/it] {'loss': 0.8809, 'grad_norm': 5.279866806349571, 'learning_rate': 4.927777648265313e-06, 'epoch': 0.1} 10%|█ | 1286/12313 [57:53<7:51:30, 2.57s/it] 10%|█ | 1287/12313 [57:56<8:10:23, 2.67s/it] {'loss': 0.6804, 'grad_norm': 10.172012498738642, 'learning_rate': 4.927620637224489e-06, 'epoch': 0.1} 10%|█ | 1287/12313 [57:56<8:10:23, 2.67s/it] 10%|█ | 1288/12313 [57:59<8:07:05, 2.65s/it] {'loss': 0.484, 'grad_norm': 4.0125921901692605, 'learning_rate': 4.927463458205167e-06, 'epoch': 0.1} 10%|█ | 1288/12313 [57:59<8:07:05, 2.65s/it] 10%|█ | 1289/12313 [58:01<8:10:45, 2.67s/it] {'loss': 0.4823, 'grad_norm': 5.038605450511745, 'learning_rate': 4.9273061112182195e-06, 'epoch': 
0.1} 10%|█ | 1289/12313 [58:01<8:10:45, 2.67s/it] 10%|█ | 1290/12313 [58:04<8:05:31, 2.64s/it] {'loss': 0.5736, 'grad_norm': 11.514242344873601, 'learning_rate': 4.9271485962745356e-06, 'epoch': 0.1} 10%|█ | 1290/12313 [58:04<8:05:31, 2.64s/it] 10%|█ | 1291/12313 [58:07<8:17:53, 2.71s/it] {'loss': 0.4464, 'grad_norm': 5.2507131812652705, 'learning_rate': 4.9269909133850146e-06, 'epoch': 0.1} 10%|█ | 1291/12313 [58:07<8:17:53, 2.71s/it] 10%|█ | 1292/12313 [58:10<8:36:46, 2.81s/it] {'loss': 0.5488, 'grad_norm': 3.587842597108573, 'learning_rate': 4.926833062560566e-06, 'epoch': 0.1} 10%|█ | 1292/12313 [58:10<8:36:46, 2.81s/it] 11%|█ | 1293/12313 [58:12<8:29:58, 2.78s/it] {'loss': 0.7937, 'grad_norm': 4.9325898595523405, 'learning_rate': 4.926675043812115e-06, 'epoch': 0.11} 11%|█ | 1293/12313 [58:12<8:29:58, 2.78s/it] 11%|█ | 1294/12313 [58:15<8:26:44, 2.76s/it] {'loss': 0.6927, 'grad_norm': 4.089069618738186, 'learning_rate': 4.926516857150593e-06, 'epoch': 0.11} 11%|█ | 1294/12313 [58:15<8:26:44, 2.76s/it] 11%|█ | 1295/12313 [58:18<8:20:24, 2.73s/it] {'loss': 0.6499, 'grad_norm': 4.157543455041517, 'learning_rate': 4.926358502586948e-06, 'epoch': 0.11} 11%|█ | 1295/12313 [58:18<8:20:24, 2.73s/it] 11%|█ | 1296/12313 [58:21<8:23:46, 2.74s/it] {'loss': 0.6702, 'grad_norm': 4.591475297442821, 'learning_rate': 4.9261999801321345e-06, 'epoch': 0.11} 11%|█ | 1296/12313 [58:21<8:23:46, 2.74s/it] 11%|█ | 1297/12313 [58:23<8:08:59, 2.66s/it] {'loss': 0.5649, 'grad_norm': 5.076099157718857, 'learning_rate': 4.9260412897971225e-06, 'epoch': 0.11} 11%|█ | 1297/12313 [58:23<8:08:59, 2.66s/it] 11%|█ | 1298/12313 [58:26<7:56:54, 2.60s/it] {'loss': 0.6207, 'grad_norm': 5.92574396887693, 'learning_rate': 4.9258824315928935e-06, 'epoch': 0.11} 11%|█ | 1298/12313 [58:26<7:56:54, 2.60s/it] 11%|█ | 1299/12313 [58:28<8:05:22, 2.64s/it] {'loss': 0.6171, 'grad_norm': 4.295889777812483, 'learning_rate': 4.925723405530439e-06, 'epoch': 0.11} 11%|█ | 1299/12313 [58:28<8:05:22, 2.64s/it] 11%|█ 
| 1300/12313 [58:31<7:56:58, 2.60s/it] {'loss': 0.6708, 'grad_norm': 6.0185008543132055, 'learning_rate': 4.925564211620764e-06, 'epoch': 0.11} 11%|█ | 1300/12313 [58:31<7:56:58, 2.60s/it] 11%|█ | 1301/12313 [58:33<7:47:01, 2.54s/it] {'loss': 0.5312, 'grad_norm': 4.288628260503543, 'learning_rate': 4.9254048498748804e-06, 'epoch': 0.11} 11%|█ | 1301/12313 [58:33<7:47:01, 2.54s/it] 11%|█ | 1302/12313 [58:36<7:49:17, 2.56s/it] {'loss': 0.6256, 'grad_norm': 5.912622580277487, 'learning_rate': 4.925245320303819e-06, 'epoch': 0.11} 11%|█ | 1302/12313 [58:36<7:49:17, 2.56s/it] 11%|█ | 1303/12313 [58:38<7:40:07, 2.51s/it] {'loss': 0.6512, 'grad_norm': 4.036381733012363, 'learning_rate': 4.925085622918618e-06, 'epoch': 0.11} 11%|█ | 1303/12313 [58:38<7:40:07, 2.51s/it] 11%|█ | 1304/12313 [58:41<7:45:35, 2.54s/it] {'loss': 0.6243, 'grad_norm': 11.770792105747262, 'learning_rate': 4.924925757730324e-06, 'epoch': 0.11} 11%|█ | 1304/12313 [58:41<7:45:35, 2.54s/it] 11%|█ | 1305/12313 [58:44<7:55:45, 2.59s/it] {'loss': 0.5521, 'grad_norm': 7.6314249401022085, 'learning_rate': 4.924765724750002e-06, 'epoch': 0.11} 11%|█ | 1305/12313 [58:44<7:55:45, 2.59s/it] 11%|█ | 1306/12313 [58:46<7:56:11, 2.60s/it] {'loss': 0.626, 'grad_norm': 4.238420473450789, 'learning_rate': 4.9246055239887255e-06, 'epoch': 0.11} 11%|█ | 1306/12313 [58:46<7:56:11, 2.60s/it] 11%|█ | 1307/12313 [58:49<8:11:10, 2.68s/it] {'loss': 0.7469, 'grad_norm': 5.023366885034705, 'learning_rate': 4.924445155457578e-06, 'epoch': 0.11} 11%|█ | 1307/12313 [58:49<8:11:10, 2.68s/it] 11%|█ | 1308/12313 [58:52<8:11:08, 2.68s/it] {'loss': 0.6673, 'grad_norm': 5.927044022062121, 'learning_rate': 4.924284619167657e-06, 'epoch': 0.11} 11%|█ | 1308/12313 [58:52<8:11:08, 2.68s/it] 11%|█ | 1309/12313 [58:54<8:01:57, 2.63s/it] {'loss': 0.4591, 'grad_norm': 27.667866009821907, 'learning_rate': 4.924123915130072e-06, 'epoch': 0.11} 11%|█ | 1309/12313 [58:54<8:01:57, 2.63s/it] 11%|█ | 1310/12313 [58:57<8:00:13, 2.62s/it] {'loss': 
0.7129, 'grad_norm': 6.067033293424752, 'learning_rate': 4.92396304335594e-06, 'epoch': 0.11} 11%|█ | 1310/12313 [58:57<8:00:13, 2.62s/it] 11%|█ | 1311/12313 [58:59<8:04:56, 2.64s/it] {'loss': 0.7251, 'grad_norm': 4.48662582291358, 'learning_rate': 4.923802003856395e-06, 'epoch': 0.11} 11%|█ | 1311/12313 [59:00<8:04:56, 2.64s/it] 11%|█ | 1312/12313 [59:02<8:06:38, 2.65s/it] {'loss': 0.5621, 'grad_norm': 8.910396684118473, 'learning_rate': 4.923640796642578e-06, 'epoch': 0.11} 11%|█ | 1312/12313 [59:02<8:06:38, 2.65s/it] 11%|█ | 1313/12313 [59:05<8:17:04, 2.71s/it] {'loss': 0.7027, 'grad_norm': 10.709244615053429, 'learning_rate': 4.923479421725646e-06, 'epoch': 0.11} 11%|█ | 1313/12313 [59:05<8:17:04, 2.71s/it] 11%|█ | 1314/12313 [59:08<8:14:04, 2.70s/it] {'loss': 0.6991, 'grad_norm': 6.558186461825374, 'learning_rate': 4.923317879116764e-06, 'epoch': 0.11} 11%|█ | 1314/12313 [59:08<8:14:04, 2.70s/it] 11%|█ | 1315/12313 [59:10<8:13:54, 2.69s/it] {'loss': 0.5193, 'grad_norm': 4.402808201885828, 'learning_rate': 4.923156168827109e-06, 'epoch': 0.11} 11%|█ | 1315/12313 [59:10<8:13:54, 2.69s/it] 11%|█ | 1316/12313 [59:13<8:10:39, 2.68s/it] {'loss': 0.5984, 'grad_norm': 3.9531074248473677, 'learning_rate': 4.922994290867872e-06, 'epoch': 0.11} 11%|█ | 1316/12313 [59:13<8:10:39, 2.68s/it] 11%|█ | 1317/12313 [59:16<8:12:37, 2.69s/it] {'loss': 0.5759, 'grad_norm': 5.634957726444024, 'learning_rate': 4.922832245250254e-06, 'epoch': 0.11} 11%|█ | 1317/12313 [59:16<8:12:37, 2.69s/it] 11%|█ | 1318/12313 [59:19<8:36:58, 2.82s/it] {'loss': 0.5291, 'grad_norm': 4.687012040425469, 'learning_rate': 4.922670031985467e-06, 'epoch': 0.11} 11%|█ | 1318/12313 [59:19<8:36:58, 2.82s/it] 11%|█ | 1319/12313 [59:22<8:40:39, 2.84s/it] {'loss': 0.631, 'grad_norm': 7.331234418642, 'learning_rate': 4.922507651084736e-06, 'epoch': 0.11} 11%|█ | 1319/12313 [59:22<8:40:39, 2.84s/it] 11%|█ | 1320/12313 [59:24<8:20:57, 2.73s/it] {'loss': 0.5047, 'grad_norm': 6.179571341056749, 'learning_rate': 
4.9223451025592965e-06, 'epoch': 0.11} 11%|█ | 1320/12313 [59:24<8:20:57, 2.73s/it] 11%|█ | 1321/12313 [59:27<8:14:55, 2.70s/it] {'loss': 0.5616, 'grad_norm': 4.797755420619416, 'learning_rate': 4.9221823864203955e-06, 'epoch': 0.11} 11%|█ | 1321/12313 [59:27<8:14:55, 2.70s/it] 11%|█ | 1322/12313 [59:29<8:11:10, 2.68s/it] {'loss': 0.7038, 'grad_norm': 4.7796380411767885, 'learning_rate': 4.922019502679292e-06, 'epoch': 0.11} 11%|█ | 1322/12313 [59:29<8:11:10, 2.68s/it] 11%|█ | 1323/12313 [59:32<8:05:12, 2.65s/it] {'loss': 0.8187, 'grad_norm': 4.880297830435732, 'learning_rate': 4.921856451347258e-06, 'epoch': 0.11} 11%|█ | 1323/12313 [59:32<8:05:12, 2.65s/it] 11%|█ | 1324/12313 [59:35<8:23:47, 2.75s/it] {'loss': 0.6288, 'grad_norm': 4.4975518069828295, 'learning_rate': 4.9216932324355755e-06, 'epoch': 0.11} 11%|█ | 1324/12313 [59:35<8:23:47, 2.75s/it] 11%|█ | 1325/12313 [59:38<8:17:02, 2.71s/it] {'loss': 0.5948, 'grad_norm': 5.693686439938185, 'learning_rate': 4.921529845955537e-06, 'epoch': 0.11} 11%|█ | 1325/12313 [59:38<8:17:02, 2.71s/it] 11%|█ | 1326/12313 [59:40<8:14:57, 2.70s/it] {'loss': 0.7244, 'grad_norm': 9.014157893199854, 'learning_rate': 4.9213662919184495e-06, 'epoch': 0.11} 11%|█ | 1326/12313 [59:40<8:14:57, 2.70s/it] 11%|█ | 1327/12313 [59:43<8:21:29, 2.74s/it] {'loss': 0.5731, 'grad_norm': 17.656772040857238, 'learning_rate': 4.921202570335629e-06, 'epoch': 0.11} 11%|█ | 1327/12313 [59:43<8:21:29, 2.74s/it] 11%|█ | 1328/12313 [59:46<8:23:29, 2.75s/it] {'loss': 0.5082, 'grad_norm': 7.338056929014862, 'learning_rate': 4.921038681218405e-06, 'epoch': 0.11} 11%|█ | 1328/12313 [59:46<8:23:29, 2.75s/it] 11%|█ | 1329/12313 [59:49<8:20:45, 2.74s/it] {'loss': 0.7288, 'grad_norm': 4.851712422566327, 'learning_rate': 4.920874624578118e-06, 'epoch': 0.11} 11%|█ | 1329/12313 [59:49<8:20:45, 2.74s/it] 11%|█ | 1330/12313 [59:51<8:17:17, 2.72s/it] {'loss': 0.4265, 'grad_norm': 5.876176027514485, 'learning_rate': 4.920710400426118e-06, 'epoch': 0.11} 11%|█ | 
1330/12313 [59:51<8:17:17, 2.72s/it] 11%|█ | 1331/12313 [59:54<8:08:41, 2.67s/it] {'loss': 0.4902, 'grad_norm': 5.880882787741879, 'learning_rate': 4.920546008773771e-06, 'epoch': 0.11} 11%|█ | 1331/12313 [59:54<8:08:41, 2.67s/it] 11%|█ | 1332/12313 [59:57<8:05:54, 2.66s/it] {'loss': 0.7027, 'grad_norm': 5.58193279698704, 'learning_rate': 4.920381449632451e-06, 'epoch': 0.11} 11%|█ | 1332/12313 [59:57<8:05:54, 2.66s/it] 11%|█ | 1333/12313 [59:59<8:12:28, 2.69s/it] {'loss': 0.646, 'grad_norm': 5.173776451487993, 'learning_rate': 4.920216723013544e-06, 'epoch': 0.11} 11%|█ | 1333/12313 [59:59<8:12:28, 2.69s/it] 11%|█ | 1334/12313 [1:00:02<8:15:50, 2.71s/it] {'loss': 0.4977, 'grad_norm': 4.388587188343882, 'learning_rate': 4.920051828928448e-06, 'epoch': 0.11} 11%|█ | 1334/12313 [1:00:02<8:15:50, 2.71s/it] 11%|█ | 1335/12313 [1:00:05<8:15:29, 2.71s/it] {'loss': 0.5767, 'grad_norm': 4.776839443402457, 'learning_rate': 4.919886767388573e-06, 'epoch': 0.11} 11%|█ | 1335/12313 [1:00:05<8:15:29, 2.71s/it] 11%|█ | 1336/12313 [1:00:07<8:09:49, 2.68s/it] {'loss': 0.533, 'grad_norm': 3.4824258344375725, 'learning_rate': 4.919721538405341e-06, 'epoch': 0.11} 11%|█ | 1336/12313 [1:00:07<8:09:49, 2.68s/it] 11%|█ | 1337/12313 [1:00:10<8:11:28, 2.69s/it] {'loss': 0.6688, 'grad_norm': 4.742639309949176, 'learning_rate': 4.919556141990186e-06, 'epoch': 0.11} 11%|█ | 1337/12313 [1:00:10<8:11:28, 2.69s/it] 11%|█ | 1338/12313 [1:00:13<8:06:03, 2.66s/it] {'loss': 0.631, 'grad_norm': 4.653988617546264, 'learning_rate': 4.919390578154551e-06, 'epoch': 0.11} 11%|█ | 1338/12313 [1:00:13<8:06:03, 2.66s/it] 11%|█ | 1339/12313 [1:00:15<8:14:37, 2.70s/it] {'loss': 0.5437, 'grad_norm': 5.238136207553621, 'learning_rate': 4.919224846909891e-06, 'epoch': 0.11} 11%|█ | 1339/12313 [1:00:15<8:14:37, 2.70s/it] 11%|█ | 1340/12313 [1:00:18<8:14:34, 2.70s/it] {'loss': 0.8166, 'grad_norm': 6.815351329648775, 'learning_rate': 4.919058948267677e-06, 'epoch': 0.11} 11%|█ | 1340/12313 [1:00:18<8:14:34, 
2.70s/it] 11%|█ | 1341/12313 [1:00:21<8:31:17, 2.80s/it] {'loss': 0.6044, 'grad_norm': 5.2640299180243355, 'learning_rate': 4.918892882239384e-06, 'epoch': 0.11} 11%|█ | 1341/12313 [1:00:21<8:31:17, 2.80s/it] 11%|█ | 1342/12313 [1:00:24<8:11:56, 2.69s/it] {'loss': 0.7873, 'grad_norm': 4.358517148619114, 'learning_rate': 4.918726648836507e-06, 'epoch': 0.11} 11%|█ | 1342/12313 [1:00:24<8:11:56, 2.69s/it] 11%|█ | 1343/12313 [1:00:26<8:03:14, 2.64s/it] {'loss': 0.5615, 'grad_norm': 6.052945574950095, 'learning_rate': 4.918560248070547e-06, 'epoch': 0.11} 11%|█ | 1343/12313 [1:00:26<8:03:14, 2.64s/it] 11%|█ | 1344/12313 [1:00:29<8:04:32, 2.65s/it] {'loss': 0.5893, 'grad_norm': 4.566253758686044, 'learning_rate': 4.918393679953018e-06, 'epoch': 0.11} 11%|█ | 1344/12313 [1:00:29<8:04:32, 2.65s/it] 11%|█ | 1345/12313 [1:00:31<7:59:30, 2.62s/it] {'loss': 0.6582, 'grad_norm': 3.497997250474178, 'learning_rate': 4.918226944495445e-06, 'epoch': 0.11} 11%|█ | 1345/12313 [1:00:31<7:59:30, 2.62s/it] 11%|█ | 1346/12313 [1:00:34<8:07:51, 2.67s/it] {'loss': 0.563, 'grad_norm': 6.920939485791569, 'learning_rate': 4.918060041709366e-06, 'epoch': 0.11} 11%|█ | 1346/12313 [1:00:34<8:07:51, 2.67s/it] 11%|█ | 1347/12313 [1:00:37<7:56:29, 2.61s/it] {'loss': 0.5954, 'grad_norm': 13.15024709863077, 'learning_rate': 4.917892971606329e-06, 'epoch': 0.11} 11%|█ | 1347/12313 [1:00:37<7:56:29, 2.61s/it] 11%|█ | 1348/12313 [1:00:39<7:46:15, 2.55s/it] {'loss': 0.548, 'grad_norm': 4.208801375695993, 'learning_rate': 4.917725734197896e-06, 'epoch': 0.11} 11%|█ | 1348/12313 [1:00:39<7:46:15, 2.55s/it] 11%|█ | 1349/12313 [1:00:42<7:49:50, 2.57s/it] {'loss': 0.6602, 'grad_norm': 4.648874348064858, 'learning_rate': 4.917558329495636e-06, 'epoch': 0.11} 11%|█ | 1349/12313 [1:00:42<7:49:50, 2.57s/it] 11%|█ | 1350/12313 [1:00:45<8:08:04, 2.67s/it] {'loss': 0.5401, 'grad_norm': 4.308899285098474, 'learning_rate': 4.917390757511136e-06, 'epoch': 0.11} 11%|█ | 1350/12313 [1:00:45<8:08:04, 2.67s/it] 11%|█ | 
1351/12313 [1:00:47<8:08:53, 2.68s/it] {'loss': 0.5193, 'grad_norm': 5.129516631257819, 'learning_rate': 4.917223018255989e-06, 'epoch': 0.11} 11%|█ | 1351/12313 [1:00:47<8:08:53, 2.68s/it] 11%|█ | 1352/12313 [1:00:50<8:08:56, 2.68s/it] {'loss': 0.5366, 'grad_norm': 6.107816197257616, 'learning_rate': 4.917055111741802e-06, 'epoch': 0.11} 11%|█ | 1352/12313 [1:00:50<8:08:56, 2.68s/it] 11%|█ | 1353/12313 [1:00:53<8:12:42, 2.70s/it] {'loss': 0.5354, 'grad_norm': 4.707618481744731, 'learning_rate': 4.916887037980193e-06, 'epoch': 0.11} 11%|█ | 1353/12313 [1:00:53<8:12:42, 2.70s/it] 11%|█ | 1354/12313 [1:00:55<8:04:25, 2.65s/it] {'loss': 0.7129, 'grad_norm': 4.1837375718082255, 'learning_rate': 4.916718796982793e-06, 'epoch': 0.11} 11%|█ | 1354/12313 [1:00:55<8:04:25, 2.65s/it] 11%|█ | 1355/12313 [1:00:58<7:51:13, 2.58s/it] {'loss': 0.5567, 'grad_norm': 3.4672179946418975, 'learning_rate': 4.916550388761242e-06, 'epoch': 0.11} 11%|█ | 1355/12313 [1:00:58<7:51:13, 2.58s/it] 11%|█ | 1356/12313 [1:01:00<7:56:42, 2.61s/it] {'loss': 0.5612, 'grad_norm': 4.01249160395805, 'learning_rate': 4.916381813327194e-06, 'epoch': 0.11} 11%|█ | 1356/12313 [1:01:00<7:56:42, 2.61s/it] 11%|█ | 1357/12313 [1:01:03<8:03:36, 2.65s/it] {'loss': 0.5274, 'grad_norm': 4.478566912854505, 'learning_rate': 4.916213070692312e-06, 'epoch': 0.11} 11%|█ | 1357/12313 [1:01:03<8:03:36, 2.65s/it] 11%|█ | 1358/12313 [1:01:06<8:17:58, 2.73s/it] {'loss': 0.5645, 'grad_norm': 7.752567209797171, 'learning_rate': 4.916044160868273e-06, 'epoch': 0.11} 11%|█ | 1358/12313 [1:01:06<8:17:58, 2.73s/it] 11%|█ | 1359/12313 [1:01:09<8:16:22, 2.72s/it] {'loss': 0.5816, 'grad_norm': 4.915949482614922, 'learning_rate': 4.915875083866766e-06, 'epoch': 0.11} 11%|█ | 1359/12313 [1:01:09<8:16:22, 2.72s/it] 11%|█ | 1360/12313 [1:01:11<8:01:06, 2.64s/it] {'loss': 0.5693, 'grad_norm': 4.872300208909787, 'learning_rate': 4.915705839699488e-06, 'epoch': 0.11} 11%|█ | 1360/12313 [1:01:11<8:01:06, 2.64s/it] 11%|█ | 1361/12313 
[1:01:14<8:00:01, 2.63s/it] {'loss': 0.6373, 'grad_norm': 6.8669686866426645, 'learning_rate': 4.915536428378152e-06, 'epoch': 0.11} 11%|█ | 1361/12313 [1:01:14<8:00:01, 2.63s/it] 11%|█ | 1362/12313 [1:01:16<8:02:58, 2.65s/it] {'loss': 0.5895, 'grad_norm': 9.217173762118762, 'learning_rate': 4.915366849914479e-06, 'epoch': 0.11} 11%|█ | 1362/12313 [1:01:16<8:02:58, 2.65s/it] 11%|█ | 1363/12313 [1:01:19<7:59:16, 2.63s/it] {'loss': 0.6099, 'grad_norm': 5.851967895961872, 'learning_rate': 4.915197104320203e-06, 'epoch': 0.11} 11%|█ | 1363/12313 [1:01:19<7:59:16, 2.63s/it] 11%|█ | 1364/12313 [1:01:22<8:01:17, 2.64s/it] {'loss': 0.6278, 'grad_norm': 6.051580089476815, 'learning_rate': 4.915027191607069e-06, 'epoch': 0.11} 11%|█ | 1364/12313 [1:01:22<8:01:17, 2.64s/it] 11%|█ | 1365/12313 [1:01:25<8:46:39, 2.89s/it] {'loss': 0.7995, 'grad_norm': 4.182706621669257, 'learning_rate': 4.914857111786835e-06, 'epoch': 0.11} 11%|█ | 1365/12313 [1:01:25<8:46:39, 2.89s/it] 11%|█ | 1366/12313 [1:01:28<8:35:19, 2.82s/it] {'loss': 0.5338, 'grad_norm': 6.4377665961998, 'learning_rate': 4.9146868648712694e-06, 'epoch': 0.11} 11%|█ | 1366/12313 [1:01:28<8:35:19, 2.82s/it] 11%|█ | 1367/12313 [1:01:31<8:35:13, 2.82s/it] {'loss': 0.4989, 'grad_norm': 4.1795950899223255, 'learning_rate': 4.914516450872152e-06, 'epoch': 0.11} 11%|█ | 1367/12313 [1:01:31<8:35:13, 2.82s/it] 11%|█ | 1368/12313 [1:01:33<8:33:35, 2.82s/it] {'loss': 0.5973, 'grad_norm': 5.590546828289337, 'learning_rate': 4.914345869801276e-06, 'epoch': 0.11} 11%|█ | 1368/12313 [1:01:33<8:33:35, 2.82s/it] 11%|█ | 1369/12313 [1:01:36<8:16:38, 2.72s/it] {'loss': 0.5177, 'grad_norm': 6.538514315603108, 'learning_rate': 4.914175121670443e-06, 'epoch': 0.11} 11%|█ | 1369/12313 [1:01:36<8:16:38, 2.72s/it] 11%|█ | 1370/12313 [1:01:39<8:33:06, 2.81s/it] {'loss': 0.5844, 'grad_norm': 3.691433117466898, 'learning_rate': 4.914004206491467e-06, 'epoch': 0.11} 11%|█ | 1370/12313 [1:01:39<8:33:06, 2.81s/it] 11%|█ | 1371/12313 [1:01:42<8:46:53, 
2.89s/it] {'loss': 0.5717, 'grad_norm': 4.522420652396943, 'learning_rate': 4.913833124276177e-06, 'epoch': 0.11} 11%|█ | 1371/12313 [1:01:42<8:46:53, 2.89s/it] 11%|█ | 1372/12313 [1:01:45<8:35:00, 2.82s/it] {'loss': 0.5929, 'grad_norm': 5.582588983452335, 'learning_rate': 4.9136618750364105e-06, 'epoch': 0.11} 11%|█ | 1372/12313 [1:01:45<8:35:00, 2.82s/it] 11%|█ | 1373/12313 [1:01:47<8:24:44, 2.77s/it] {'loss': 0.6584, 'grad_norm': 3.5822825901443145, 'learning_rate': 4.913490458784016e-06, 'epoch': 0.11} 11%|█ | 1373/12313 [1:01:47<8:24:44, 2.77s/it] 11%|█ | 1374/12313 [1:01:50<8:11:09, 2.69s/it] {'loss': 0.7448, 'grad_norm': 6.955323322713761, 'learning_rate': 4.913318875530855e-06, 'epoch': 0.11} 11%|█ | 1374/12313 [1:01:50<8:11:09, 2.69s/it] 11%|█ | 1375/12313 [1:01:52<8:04:25, 2.66s/it] {'loss': 0.5943, 'grad_norm': 7.999732286808689, 'learning_rate': 4.9131471252887995e-06, 'epoch': 0.11} 11%|█ | 1375/12313 [1:01:52<8:04:25, 2.66s/it] 11%|█ | 1376/12313 [1:01:55<8:01:26, 2.64s/it] {'loss': 0.5888, 'grad_norm': 5.699553884482589, 'learning_rate': 4.912975208069735e-06, 'epoch': 0.11} 11%|█ | 1376/12313 [1:01:55<8:01:26, 2.64s/it] 11%|█ | 1377/12313 [1:01:58<8:04:19, 2.66s/it] {'loss': 0.579, 'grad_norm': 9.937720162090995, 'learning_rate': 4.912803123885555e-06, 'epoch': 0.11} 11%|█ | 1377/12313 [1:01:58<8:04:19, 2.66s/it] 11%|█ | 1378/12313 [1:02:01<8:31:45, 2.81s/it] {'loss': 0.4428, 'grad_norm': 6.426480257301145, 'learning_rate': 4.912630872748171e-06, 'epoch': 0.11} 11%|█ | 1378/12313 [1:02:01<8:31:45, 2.81s/it] 11%|█ | 1379/12313 [1:02:04<8:31:17, 2.81s/it] {'loss': 0.5904, 'grad_norm': 4.6331109760310625, 'learning_rate': 4.912458454669498e-06, 'epoch': 0.11} 11%|█ | 1379/12313 [1:02:04<8:31:17, 2.81s/it] 11%|█ | 1380/12313 [1:02:06<8:32:02, 2.81s/it] {'loss': 0.5371, 'grad_norm': 5.28940520266137, 'learning_rate': 4.912285869661467e-06, 'epoch': 0.11} 11%|█ | 1380/12313 [1:02:06<8:32:02, 2.81s/it] 11%|█ | 1381/12313 [1:02:09<8:21:49, 2.75s/it] 
{'loss': 0.7556, 'grad_norm': 3.8233787152045418, 'learning_rate': 4.912113117736022e-06, 'epoch': 0.11} 11%|█ | 1381/12313 [1:02:09<8:21:49, 2.75s/it] 11%|█ | 1382/12313 [1:02:12<8:19:51, 2.74s/it] {'loss': 0.6213, 'grad_norm': 5.065689586582994, 'learning_rate': 4.911940198905114e-06, 'epoch': 0.11} 11%|█ | 1382/12313 [1:02:12<8:19:51, 2.74s/it] 11%|█ | 1383/12313 [1:02:15<8:17:57, 2.73s/it] {'loss': 0.6723, 'grad_norm': 6.1114483413633245, 'learning_rate': 4.91176711318071e-06, 'epoch': 0.11} 11%|█ | 1383/12313 [1:02:15<8:17:57, 2.73s/it] 11%|█ | 1384/12313 [1:02:17<8:16:24, 2.73s/it] {'loss': 0.5786, 'grad_norm': 7.783537136958568, 'learning_rate': 4.911593860574785e-06, 'epoch': 0.11} 11%|█ | 1384/12313 [1:02:17<8:16:24, 2.73s/it] 11%|█ | 1385/12313 [1:02:20<8:10:12, 2.69s/it] {'loss': 0.6476, 'grad_norm': 8.312210011930208, 'learning_rate': 4.911420441099329e-06, 'epoch': 0.11} 11%|█ | 1385/12313 [1:02:20<8:10:12, 2.69s/it] 11%|█▏ | 1386/12313 [1:02:22<7:54:29, 2.61s/it] {'loss': 0.686, 'grad_norm': 4.230328963619588, 'learning_rate': 4.911246854766341e-06, 'epoch': 0.11} 11%|█▏ | 1386/12313 [1:02:22<7:54:29, 2.61s/it] 11%|█▏ | 1387/12313 [1:02:25<7:49:00, 2.58s/it] {'loss': 0.4509, 'grad_norm': 12.499607763263779, 'learning_rate': 4.911073101587831e-06, 'epoch': 0.11} 11%|█▏ | 1387/12313 [1:02:25<7:49:00, 2.58s/it] 11%|█▏ | 1388/12313 [1:02:27<7:49:32, 2.58s/it] {'loss': 0.8017, 'grad_norm': 4.97136722337538, 'learning_rate': 4.9108991815758225e-06, 'epoch': 0.11} 11%|█▏ | 1388/12313 [1:02:27<7:49:32, 2.58s/it] 11%|█▏ | 1389/12313 [1:02:30<7:57:47, 2.62s/it] {'loss': 0.574, 'grad_norm': 4.68367623760023, 'learning_rate': 4.9107250947423516e-06, 'epoch': 0.11} 11%|█▏ | 1389/12313 [1:02:30<7:57:47, 2.62s/it] 11%|█▏ | 1390/12313 [1:02:33<7:56:10, 2.62s/it] {'loss': 0.6234, 'grad_norm': 4.5482183003098, 'learning_rate': 4.910550841099462e-06, 'epoch': 0.11} 11%|█▏ | 1390/12313 [1:02:33<7:56:10, 2.62s/it] 11%|█▏ | 1391/12313 [1:02:35<7:51:23, 2.59s/it] {'loss': 
0.7674, 'grad_norm': 3.734134582087491, 'learning_rate': 4.910376420659211e-06, 'epoch': 0.11} 11%|█▏ | 1391/12313 [1:02:35<7:51:23, 2.59s/it] 11%|█▏ | 1392/12313 [1:02:38<7:45:20, 2.56s/it] {'loss': 0.6176, 'grad_norm': 9.222540748041622, 'learning_rate': 4.91020183343367e-06, 'epoch': 0.11} 11%|█▏ | 1392/12313 [1:02:38<7:45:20, 2.56s/it] 11%|█▏ | 1393/12313 [1:02:40<7:36:04, 2.51s/it] {'loss': 0.7806, 'grad_norm': 7.2087300631114095, 'learning_rate': 4.910027079434917e-06, 'epoch': 0.11} 11%|█▏ | 1393/12313 [1:02:40<7:36:04, 2.51s/it] 11%|█▏ | 1394/12313 [1:02:43<7:35:39, 2.50s/it] {'loss': 0.6478, 'grad_norm': 4.209780015944576, 'learning_rate': 4.909852158675045e-06, 'epoch': 0.11} 11%|█▏ | 1394/12313 [1:02:43<7:35:39, 2.50s/it] 11%|█▏ | 1395/12313 [1:02:45<7:56:08, 2.62s/it] {'loss': 0.7015, 'grad_norm': 4.672108189183994, 'learning_rate': 4.9096770711661575e-06, 'epoch': 0.11} 11%|█▏ | 1395/12313 [1:02:45<7:56:08, 2.62s/it] 11%|█▏ | 1396/12313 [1:02:48<7:47:17, 2.57s/it] {'loss': 0.6014, 'grad_norm': 5.391522206407413, 'learning_rate': 4.90950181692037e-06, 'epoch': 0.11} 11%|█▏ | 1396/12313 [1:02:48<7:47:17, 2.57s/it] 11%|█▏ | 1397/12313 [1:02:51<7:56:08, 2.62s/it] {'loss': 0.6083, 'grad_norm': 6.512018993656574, 'learning_rate': 4.909326395949809e-06, 'epoch': 0.11} 11%|█▏ | 1397/12313 [1:02:51<7:56:08, 2.62s/it] 11%|█▏ | 1398/12313 [1:02:54<8:22:29, 2.76s/it] {'loss': 0.6495, 'grad_norm': 5.621883708458305, 'learning_rate': 4.909150808266613e-06, 'epoch': 0.11} 11%|█▏ | 1398/12313 [1:02:54<8:22:29, 2.76s/it] 11%|█▏ | 1399/12313 [1:02:56<8:15:06, 2.72s/it] {'loss': 0.5368, 'grad_norm': 7.16365741465048, 'learning_rate': 4.908975053882931e-06, 'epoch': 0.11} 11%|█▏ | 1399/12313 [1:02:56<8:15:06, 2.72s/it] 11%|█▏ | 1400/12313 [1:02:59<8:14:35, 2.72s/it] {'loss': 0.7422, 'grad_norm': 5.860792942986273, 'learning_rate': 4.908799132810924e-06, 'epoch': 0.11} 11%|█▏ | 1400/12313 [1:02:59<8:14:35, 2.72s/it] 11%|█▏ | 1401/12313 [1:03:02<8:11:25, 2.70s/it] {'loss': 
0.61, 'grad_norm': 3.673990330967191, 'learning_rate': 4.9086230450627655e-06, 'epoch': 0.11} 11%|█▏ | 1401/12313 [1:03:02<8:11:25, 2.70s/it] 11%|█▏ | 1402/12313 [1:03:04<8:06:20, 2.67s/it] {'loss': 0.7079, 'grad_norm': 3.927055438430478, 'learning_rate': 4.908446790650641e-06, 'epoch': 0.11} 11%|█▏ | 1402/12313 [1:03:04<8:06:20, 2.67s/it] 11%|█▏ | 1403/12313 [1:03:07<8:05:58, 2.67s/it] {'loss': 0.5993, 'grad_norm': 7.144929200227564, 'learning_rate': 4.908270369586744e-06, 'epoch': 0.11} 11%|█▏ | 1403/12313 [1:03:07<8:05:58, 2.67s/it] 11%|█▏ | 1404/12313 [1:03:10<8:13:35, 2.71s/it] {'loss': 0.7028, 'grad_norm': 4.458920511681134, 'learning_rate': 4.908093781883283e-06, 'epoch': 0.11} 11%|█▏ | 1404/12313 [1:03:10<8:13:35, 2.71s/it] 11%|█▏ | 1405/12313 [1:03:12<7:58:31, 2.63s/it] {'loss': 0.5911, 'grad_norm': 4.699619928678632, 'learning_rate': 4.9079170275524765e-06, 'epoch': 0.11} 11%|█▏ | 1405/12313 [1:03:12<7:58:31, 2.63s/it] 11%|█▏ | 1406/12313 [1:03:15<8:17:22, 2.74s/it] {'loss': 0.5615, 'grad_norm': 5.941174948945086, 'learning_rate': 4.907740106606557e-06, 'epoch': 0.11} 11%|█▏ | 1406/12313 [1:03:15<8:17:22, 2.74s/it] 11%|█▏ | 1407/12313 [1:03:18<8:02:15, 2.65s/it] {'loss': 0.5378, 'grad_norm': 6.471908401673319, 'learning_rate': 4.9075630190577634e-06, 'epoch': 0.11} 11%|█▏ | 1407/12313 [1:03:18<8:02:15, 2.65s/it] 11%|█▏ | 1408/12313 [1:03:20<8:06:12, 2.68s/it] {'loss': 0.6547, 'grad_norm': 3.9643546125770825, 'learning_rate': 4.907385764918351e-06, 'epoch': 0.11} 11%|█▏ | 1408/12313 [1:03:20<8:06:12, 2.68s/it] 11%|█▏ | 1409/12313 [1:03:23<7:47:29, 2.57s/it] {'loss': 0.6994, 'grad_norm': 4.878410036784967, 'learning_rate': 4.907208344200585e-06, 'epoch': 0.11} 11%|█▏ | 1409/12313 [1:03:23<7:47:29, 2.57s/it] 11%|█▏ | 1410/12313 [1:03:25<7:47:13, 2.57s/it] {'loss': 0.5712, 'grad_norm': 5.156488131990337, 'learning_rate': 4.907030756916741e-06, 'epoch': 0.11} 11%|█▏ | 1410/12313 [1:03:25<7:47:13, 2.57s/it] 11%|█▏ | 1411/12313 [1:03:28<7:48:03, 2.58s/it] 
{'loss': 0.6316, 'grad_norm': 4.790335390127153, 'learning_rate': 4.906853003079108e-06, 'epoch': 0.11} 11%|█▏ | 1411/12313 [1:03:28<7:48:03, 2.58s/it] 11%|█▏ | 1412/12313 [1:03:31<7:58:22, 2.63s/it] {'loss': 0.6906, 'grad_norm': 4.153017717675372, 'learning_rate': 4.9066750826999855e-06, 'epoch': 0.11} 11%|█▏ | 1412/12313 [1:03:31<7:58:22, 2.63s/it] 11%|█▏ | 1413/12313 [1:03:34<8:10:36, 2.70s/it] {'loss': 0.512, 'grad_norm': 3.87389513168029, 'learning_rate': 4.906496995791684e-06, 'epoch': 0.11} 11%|█▏ | 1413/12313 [1:03:34<8:10:36, 2.70s/it] 11%|█▏ | 1414/12313 [1:03:36<8:13:32, 2.72s/it] {'loss': 0.5193, 'grad_norm': 4.564186801844, 'learning_rate': 4.906318742366527e-06, 'epoch': 0.11} 11%|█▏ | 1414/12313 [1:03:36<8:13:32, 2.72s/it] 11%|█▏ | 1415/12313 [1:03:39<8:08:02, 2.69s/it] {'loss': 0.7675, 'grad_norm': 5.555381562592889, 'learning_rate': 4.906140322436849e-06, 'epoch': 0.11} 11%|█▏ | 1415/12313 [1:03:39<8:08:02, 2.69s/it] 12%|█▏ | 1416/12313 [1:03:42<8:04:06, 2.67s/it] {'loss': 0.4897, 'grad_norm': 8.412815182964685, 'learning_rate': 4.9059617360149936e-06, 'epoch': 0.12} 12%|█▏ | 1416/12313 [1:03:42<8:04:06, 2.67s/it] 12%|█▏ | 1417/12313 [1:03:44<7:54:13, 2.61s/it] {'loss': 0.8408, 'grad_norm': 5.9497894212896325, 'learning_rate': 4.905782983113321e-06, 'epoch': 0.12} 12%|█▏ | 1417/12313 [1:03:44<7:54:13, 2.61s/it] 12%|█▏ | 1418/12313 [1:03:47<8:02:42, 2.66s/it] {'loss': 0.6732, 'grad_norm': 8.78483062159324, 'learning_rate': 4.905604063744197e-06, 'epoch': 0.12} 12%|█▏ | 1418/12313 [1:03:47<8:02:42, 2.66s/it] 12%|█▏ | 1419/12313 [1:03:50<8:18:51, 2.75s/it] {'loss': 0.532, 'grad_norm': 9.970264715659578, 'learning_rate': 4.905424977920004e-06, 'epoch': 0.12} 12%|█▏ | 1419/12313 [1:03:50<8:18:51, 2.75s/it] 12%|█▏ | 1420/12313 [1:03:52<8:17:51, 2.74s/it] {'loss': 0.5852, 'grad_norm': 5.671192067715544, 'learning_rate': 4.9052457256531325e-06, 'epoch': 0.12} 12%|█▏ | 1420/12313 [1:03:52<8:17:51, 2.74s/it] 12%|█▏ | 1421/12313 [1:03:55<8:18:46, 2.75s/it] 
{'loss': 0.6486, 'grad_norm': 7.150271447974999, 'learning_rate': 4.905066306955986e-06, 'epoch': 0.12} 12%|█▏ | 1421/12313 [1:03:55<8:18:46, 2.75s/it] 12%|█▏ | 1422/12313 [1:03:58<8:17:15, 2.74s/it] {'loss': 0.6012, 'grad_norm': 6.958143840228272, 'learning_rate': 4.904886721840981e-06, 'epoch': 0.12} 12%|█▏ | 1422/12313 [1:03:58<8:17:15, 2.74s/it] 12%|█▏ | 1423/12313 [1:04:01<8:08:04, 2.69s/it] {'loss': 0.6706, 'grad_norm': 4.760774004700414, 'learning_rate': 4.904706970320542e-06, 'epoch': 0.12} 12%|█▏ | 1423/12313 [1:04:01<8:08:04, 2.69s/it] 12%|█▏ | 1424/12313 [1:04:03<7:58:52, 2.64s/it] {'loss': 0.633, 'grad_norm': 4.299376204301504, 'learning_rate': 4.904527052407107e-06, 'epoch': 0.12} 12%|█▏ | 1424/12313 [1:04:03<7:58:52, 2.64s/it] 12%|█▏ | 1425/12313 [1:04:05<7:47:46, 2.58s/it] {'loss': 0.8379, 'grad_norm': 4.095629515769702, 'learning_rate': 4.904346968113126e-06, 'epoch': 0.12} 12%|█▏ | 1425/12313 [1:04:05<7:47:46, 2.58s/it] 12%|█▏ | 1426/12313 [1:04:08<7:49:46, 2.59s/it] {'loss': 0.7148, 'grad_norm': 6.181276965379973, 'learning_rate': 4.904166717451059e-06, 'epoch': 0.12} 12%|█▏ | 1426/12313 [1:04:08<7:49:46, 2.59s/it] 12%|█▏ | 1427/12313 [1:04:11<7:58:03, 2.63s/it] {'loss': 0.6601, 'grad_norm': 3.8573979500957676, 'learning_rate': 4.90398630043338e-06, 'epoch': 0.12} 12%|█▏ | 1427/12313 [1:04:11<7:58:03, 2.63s/it] 12%|█▏ | 1428/12313 [1:04:14<8:02:29, 2.66s/it] {'loss': 0.8698, 'grad_norm': 5.572617436631759, 'learning_rate': 4.903805717072572e-06, 'epoch': 0.12} 12%|█▏ | 1428/12313 [1:04:14<8:02:29, 2.66s/it] 12%|█▏ | 1429/12313 [1:04:16<7:49:43, 2.59s/it] {'loss': 0.5288, 'grad_norm': 6.9707268541736775, 'learning_rate': 4.90362496738113e-06, 'epoch': 0.12} 12%|█▏ | 1429/12313 [1:04:16<7:49:43, 2.59s/it] 12%|█▏ | 1430/12313 [1:04:19<7:57:58, 2.64s/it] {'loss': 0.4828, 'grad_norm': 4.5564529770959945, 'learning_rate': 4.9034440513715605e-06, 'epoch': 0.12} 12%|█▏ | 1430/12313 [1:04:19<7:57:58, 2.64s/it] 12%|█▏ | 1431/12313 [1:04:21<7:58:50, 
2.64s/it] {'loss': 0.7356, 'grad_norm': 4.109199531053192, 'learning_rate': 4.9032629690563835e-06, 'epoch': 0.12} 12%|█▏ | 1431/12313 [1:04:21<7:58:50, 2.64s/it] 12%|█▏ | 1432/12313 [1:04:24<7:56:44, 2.63s/it] {'loss': 0.5222, 'grad_norm': 4.437378918259062, 'learning_rate': 4.903081720448128e-06, 'epoch': 0.12} 12%|█▏ | 1432/12313 [1:04:24<7:56:44, 2.63s/it] 12%|█▏ | 1433/12313 [1:04:27<7:51:24, 2.60s/it] {'loss': 0.5954, 'grad_norm': 6.399358574915106, 'learning_rate': 4.902900305559336e-06, 'epoch': 0.12} 12%|█▏ | 1433/12313 [1:04:27<7:51:24, 2.60s/it] 12%|█▏ | 1434/12313 [1:04:29<7:56:26, 2.63s/it] {'loss': 0.5096, 'grad_norm': 4.343118949951745, 'learning_rate': 4.9027187244025594e-06, 'epoch': 0.12} 12%|█▏ | 1434/12313 [1:04:29<7:56:26, 2.63s/it] 12%|█▏ | 1435/12313 [1:04:32<8:06:48, 2.69s/it] {'loss': 0.5396, 'grad_norm': 4.3409440129123515, 'learning_rate': 4.902536976990364e-06, 'epoch': 0.12} 12%|█▏ | 1435/12313 [1:04:32<8:06:48, 2.69s/it] 12%|█▏ | 1436/12313 [1:04:35<7:57:30, 2.63s/it] {'loss': 0.5902, 'grad_norm': 6.602474988405446, 'learning_rate': 4.902355063335324e-06, 'epoch': 0.12} 12%|█▏ | 1436/12313 [1:04:35<7:57:30, 2.63s/it] 12%|█▏ | 1437/12313 [1:04:37<8:01:11, 2.65s/it] {'loss': 0.8389, 'grad_norm': 5.5514520331211425, 'learning_rate': 4.902172983450029e-06, 'epoch': 0.12} 12%|█▏ | 1437/12313 [1:04:37<8:01:11, 2.65s/it] 12%|█▏ | 1438/12313 [1:04:40<8:03:52, 2.67s/it] {'loss': 0.8458, 'grad_norm': 6.703176030649621, 'learning_rate': 4.901990737347076e-06, 'epoch': 0.12} 12%|█▏ | 1438/12313 [1:04:40<8:03:52, 2.67s/it] 12%|█▏ | 1439/12313 [1:04:42<7:56:31, 2.63s/it] {'loss': 0.6968, 'grad_norm': 4.454570008772595, 'learning_rate': 4.901808325039077e-06, 'epoch': 0.12} 12%|█▏ | 1439/12313 [1:04:42<7:56:31, 2.63s/it] 12%|█▏ | 1440/12313 [1:04:45<7:53:11, 2.61s/it] {'loss': 0.5741, 'grad_norm': 4.842565250482216, 'learning_rate': 4.901625746538653e-06, 'epoch': 0.12} 12%|█▏ | 1440/12313 [1:04:45<7:53:11, 2.61s/it] 12%|█▏ | 1441/12313 
[1:04:48<7:52:40, 2.61s/it] {'loss': 0.5164, 'grad_norm': 8.75110385587914, 'learning_rate': 4.901443001858438e-06, 'epoch': 0.12} 12%|█▏ | 1441/12313 [1:04:48<7:52:40, 2.61s/it] 12%|█▏ | 1442/12313 [1:04:51<8:08:41, 2.70s/it] {'loss': 0.6103, 'grad_norm': 4.468121329238175, 'learning_rate': 4.901260091011076e-06, 'epoch': 0.12} 12%|█▏ | 1442/12313 [1:04:51<8:08:41, 2.70s/it] 12%|█▏ | 1443/12313 [1:04:53<8:04:38, 2.68s/it] {'loss': 0.4323, 'grad_norm': 7.4646000814738045, 'learning_rate': 4.901077014009225e-06, 'epoch': 0.12} 12%|█▏ | 1443/12313 [1:04:53<8:04:38, 2.68s/it] 12%|█▏ | 1444/12313 [1:04:56<8:09:32, 2.70s/it] {'loss': 0.5174, 'grad_norm': 3.3258631715598055, 'learning_rate': 4.900893770865552e-06, 'epoch': 0.12} 12%|█▏ | 1444/12313 [1:04:56<8:09:32, 2.70s/it] 12%|█▏ | 1445/12313 [1:04:59<8:08:11, 2.70s/it] {'loss': 0.6525, 'grad_norm': 8.792100552657674, 'learning_rate': 4.900710361592737e-06, 'epoch': 0.12} 12%|█▏ | 1445/12313 [1:04:59<8:08:11, 2.70s/it] 12%|█▏ | 1446/12313 [1:05:01<8:15:19, 2.73s/it] {'loss': 0.6754, 'grad_norm': 4.961731173202635, 'learning_rate': 4.9005267862034695e-06, 'epoch': 0.12} 12%|█▏ | 1446/12313 [1:05:01<8:15:19, 2.73s/it] 12%|█▏ | 1447/12313 [1:05:04<8:08:01, 2.69s/it] {'loss': 0.6968, 'grad_norm': 3.814651002868712, 'learning_rate': 4.900343044710453e-06, 'epoch': 0.12} 12%|█▏ | 1447/12313 [1:05:04<8:08:01, 2.69s/it] 12%|█▏ | 1448/12313 [1:05:07<7:57:05, 2.63s/it] {'loss': 0.5924, 'grad_norm': 4.911489896369547, 'learning_rate': 4.900159137126402e-06, 'epoch': 0.12} 12%|█▏ | 1448/12313 [1:05:07<7:57:05, 2.63s/it] 12%|█▏ | 1449/12313 [1:05:09<7:46:19, 2.58s/it] {'loss': 0.5496, 'grad_norm': 7.2633083972944945, 'learning_rate': 4.899975063464042e-06, 'epoch': 0.12} 12%|█▏ | 1449/12313 [1:05:09<7:46:19, 2.58s/it] 12%|█▏ | 1450/12313 [1:05:12<7:49:54, 2.60s/it] {'loss': 0.6644, 'grad_norm': 5.005595908510799, 'learning_rate': 4.899790823736108e-06, 'epoch': 0.12} 12%|█▏ | 1450/12313 [1:05:12<7:49:54, 2.60s/it] 12%|█▏ | 
1451/12313 [1:05:14<7:43:24, 2.56s/it] {'loss': 0.8283, 'grad_norm': 4.702266439098655, 'learning_rate': 4.89960641795535e-06, 'epoch': 0.12} 12%|█▏ | 1451/12313 [1:05:14<7:43:24, 2.56s/it] 12%|█▏ | 1452/12313 [1:05:17<7:41:19, 2.55s/it] {'loss': 0.5909, 'grad_norm': 6.422288029967876, 'learning_rate': 4.899421846134529e-06, 'epoch': 0.12} 12%|█▏ | 1452/12313 [1:05:17<7:41:19, 2.55s/it] 12%|█▏ | 1453/12313 [1:05:19<7:35:03, 2.51s/it] {'loss': 0.499, 'grad_norm': 6.425496839266026, 'learning_rate': 4.899237108286414e-06, 'epoch': 0.12} 12%|█▏ | 1453/12313 [1:05:19<7:35:03, 2.51s/it] 12%|█▏ | 1454/12313 [1:05:22<7:38:36, 2.53s/it] {'loss': 0.5702, 'grad_norm': 3.4591295878059247, 'learning_rate': 4.8990522044237884e-06, 'epoch': 0.12} 12%|█▏ | 1454/12313 [1:05:22<7:38:36, 2.53s/it] 12%|█▏ | 1455/12313 [1:05:25<7:58:38, 2.64s/it] {'loss': 0.4269, 'grad_norm': 4.321892903241153, 'learning_rate': 4.898867134559448e-06, 'epoch': 0.12} 12%|█▏ | 1455/12313 [1:05:25<7:58:38, 2.64s/it] 12%|█▏ | 1456/12313 [1:05:27<7:52:25, 2.61s/it] {'loss': 0.5002, 'grad_norm': 6.352156069028349, 'learning_rate': 4.898681898706197e-06, 'epoch': 0.12} 12%|█▏ | 1456/12313 [1:05:27<7:52:25, 2.61s/it] 12%|█▏ | 1457/12313 [1:05:30<7:43:50, 2.56s/it] {'loss': 0.7199, 'grad_norm': 4.739299344449501, 'learning_rate': 4.898496496876854e-06, 'epoch': 0.12} 12%|█▏ | 1457/12313 [1:05:30<7:43:50, 2.56s/it] 12%|█▏ | 1458/12313 [1:05:32<7:39:27, 2.54s/it] {'loss': 0.6848, 'grad_norm': 5.475577877089898, 'learning_rate': 4.898310929084247e-06, 'epoch': 0.12} 12%|█▏ | 1458/12313 [1:05:32<7:39:27, 2.54s/it] 12%|█▏ | 1459/12313 [1:05:35<7:40:15, 2.54s/it] {'loss': 0.6698, 'grad_norm': 4.312752315143542, 'learning_rate': 4.898125195341217e-06, 'epoch': 0.12} 12%|█▏ | 1459/12313 [1:05:35<7:40:15, 2.54s/it] 12%|█▏ | 1460/12313 [1:05:37<7:51:51, 2.61s/it] {'loss': 0.5399, 'grad_norm': 6.101152761282113, 'learning_rate': 4.897939295660615e-06, 'epoch': 0.12} 12%|█▏ | 1460/12313 [1:05:37<7:51:51, 2.61s/it] 12%|█▏ | 
1461/12313 [1:05:40<8:18:35, 2.76s/it] {'loss': 0.5391, 'grad_norm': 5.730964159782436, 'learning_rate': 4.897753230055304e-06, 'epoch': 0.12} 12%|█▏ | 1461/12313 [1:05:40<8:18:35, 2.76s/it] 12%|█▏ | 1462/12313 [1:05:43<8:06:09, 2.69s/it] {'loss': 0.6377, 'grad_norm': 4.602835724556902, 'learning_rate': 4.89756699853816e-06, 'epoch': 0.12} 12%|█▏ | 1462/12313 [1:05:43<8:06:09, 2.69s/it] 12%|█▏ | 1463/12313 [1:05:46<8:13:42, 2.73s/it] {'loss': 0.6413, 'grad_norm': 5.577323364531521, 'learning_rate': 4.8973806011220695e-06, 'epoch': 0.12} 12%|█▏ | 1463/12313 [1:05:46<8:13:42, 2.73s/it] 12%|█▏ | 1464/12313 [1:05:49<8:15:33, 2.74s/it] {'loss': 0.5454, 'grad_norm': 6.109634102169282, 'learning_rate': 4.897194037819928e-06, 'epoch': 0.12} 12%|█▏ | 1464/12313 [1:05:49<8:15:33, 2.74s/it] 12%|█▏ | 1465/12313 [1:05:51<8:12:16, 2.72s/it] {'loss': 0.5516, 'grad_norm': 4.473657871957378, 'learning_rate': 4.897007308644647e-06, 'epoch': 0.12} 12%|█▏ | 1465/12313 [1:05:51<8:12:16, 2.72s/it] 12%|█▏ | 1466/12313 [1:05:54<8:05:43, 2.69s/it] {'loss': 0.6565, 'grad_norm': 4.6381132522690995, 'learning_rate': 4.896820413609146e-06, 'epoch': 0.12} 12%|█▏ | 1466/12313 [1:05:54<8:05:43, 2.69s/it] 12%|█▏ | 1467/12313 [1:05:56<7:56:54, 2.64s/it] {'loss': 0.7063, 'grad_norm': 5.386818421013147, 'learning_rate': 4.896633352726357e-06, 'epoch': 0.12} 12%|█▏ | 1467/12313 [1:05:56<7:56:54, 2.64s/it] 12%|█▏ | 1468/12313 [1:05:59<8:10:10, 2.71s/it] {'loss': 0.7565, 'grad_norm': 3.1581048679010646, 'learning_rate': 4.896446126009224e-06, 'epoch': 0.12} 12%|█▏ | 1468/12313 [1:05:59<8:10:10, 2.71s/it] 12%|█▏ | 1469/12313 [1:06:02<8:05:19, 2.69s/it] {'loss': 0.5026, 'grad_norm': 6.401004722074878, 'learning_rate': 4.896258733470702e-06, 'epoch': 0.12} 12%|█▏ | 1469/12313 [1:06:02<8:05:19, 2.69s/it] 12%|█▏ | 1470/12313 [1:06:04<7:51:08, 2.61s/it] {'loss': 0.6255, 'grad_norm': 2.7519458408541797, 'learning_rate': 4.896071175123758e-06, 'epoch': 0.12} 12%|█▏ | 1470/12313 [1:06:04<7:51:08, 2.61s/it] 
12%|█▏ | 1471/12313 [1:06:07<7:54:43, 2.63s/it] {'loss': 0.5504, 'grad_norm': 4.404610542747091, 'learning_rate': 4.8958834509813706e-06, 'epoch': 0.12} 12%|█▏ | 1471/12313 [1:06:07<7:54:43, 2.63s/it] 12%|█▏ | 1472/12313 [1:06:10<7:59:09, 2.65s/it] {'loss': 0.6899, 'grad_norm': 6.077687424149937, 'learning_rate': 4.8956955610565275e-06, 'epoch': 0.12} 12%|█▏ | 1472/12313 [1:06:10<7:59:09, 2.65s/it] 12%|█▏ | 1473/12313 [1:06:12<7:54:24, 2.63s/it] {'loss': 0.5743, 'grad_norm': 5.523332768156323, 'learning_rate': 4.895507505362231e-06, 'epoch': 0.12} 12%|█▏ | 1473/12313 [1:06:12<7:54:24, 2.63s/it] 12%|█▏ | 1474/12313 [1:06:15<7:52:22, 2.61s/it] {'loss': 0.4483, 'grad_norm': 5.840235758929547, 'learning_rate': 4.895319283911492e-06, 'epoch': 0.12} 12%|█▏ | 1474/12313 [1:06:15<7:52:22, 2.61s/it] 12%|█▏ | 1475/12313 [1:06:17<7:50:15, 2.60s/it] {'loss': 0.6511, 'grad_norm': 3.9853018352679177, 'learning_rate': 4.895130896717336e-06, 'epoch': 0.12} 12%|█▏ | 1475/12313 [1:06:17<7:50:15, 2.60s/it] 12%|█▏ | 1476/12313 [1:06:20<7:57:12, 2.64s/it] {'loss': 0.6048, 'grad_norm': 4.087084936629081, 'learning_rate': 4.894942343792799e-06, 'epoch': 0.12} 12%|█▏ | 1476/12313 [1:06:20<7:57:12, 2.64s/it] 12%|█▏ | 1477/12313 [1:06:23<7:44:48, 2.57s/it] {'loss': 0.5193, 'grad_norm': 4.9948060996575325, 'learning_rate': 4.894753625150927e-06, 'epoch': 0.12} 12%|█▏ | 1477/12313 [1:06:23<7:44:48, 2.57s/it] 12%|█▏ | 1478/12313 [1:06:25<7:42:37, 2.56s/it] {'loss': 0.4947, 'grad_norm': 5.148303769586792, 'learning_rate': 4.894564740804777e-06, 'epoch': 0.12} 12%|█▏ | 1478/12313 [1:06:25<7:42:37, 2.56s/it] 12%|█▏ | 1479/12313 [1:06:28<7:46:50, 2.59s/it] {'loss': 0.641, 'grad_norm': 4.926289780739354, 'learning_rate': 4.89437569076742e-06, 'epoch': 0.12} 12%|█▏ | 1479/12313 [1:06:28<7:46:50, 2.59s/it] 12%|█▏ | 1480/12313 [1:06:30<7:44:15, 2.57s/it] {'loss': 0.6427, 'grad_norm': 4.420640360809511, 'learning_rate': 4.894186475051938e-06, 'epoch': 0.12} 12%|█▏ | 1480/12313 [1:06:30<7:44:15, 
2.57s/it] 12%|█▏ | 1481/12313 [1:06:33<8:07:05, 2.70s/it] {'loss': 0.6684, 'grad_norm': 4.552192514027042, 'learning_rate': 4.893997093671422e-06, 'epoch': 0.12} 12%|█▏ | 1481/12313 [1:06:33<8:07:05, 2.70s/it] 12%|█▏ | 1482/12313 [1:06:37<8:46:59, 2.92s/it] {'loss': 0.6778, 'grad_norm': 3.6702893426104306, 'learning_rate': 4.893807546638979e-06, 'epoch': 0.12} 12%|█▏ | 1482/12313 [1:06:37<8:46:59, 2.92s/it] 12%|█▏ | 1483/12313 [1:06:40<8:42:36, 2.90s/it] {'loss': 0.6479, 'grad_norm': 7.860150046179448, 'learning_rate': 4.893617833967721e-06, 'epoch': 0.12} 12%|█▏ | 1483/12313 [1:06:40<8:42:36, 2.90s/it] 12%|█▏ | 1484/12313 [1:06:42<8:31:45, 2.84s/it] {'loss': 0.5258, 'grad_norm': 3.6225218661319083, 'learning_rate': 4.893427955670778e-06, 'epoch': 0.12} 12%|█▏ | 1484/12313 [1:06:42<8:31:45, 2.84s/it] 12%|█▏ | 1485/12313 [1:06:45<8:11:26, 2.72s/it] {'loss': 0.5499, 'grad_norm': 7.286179190372548, 'learning_rate': 4.893237911761287e-06, 'epoch': 0.12} 12%|█▏ | 1485/12313 [1:06:45<8:11:26, 2.72s/it] 12%|█▏ | 1486/12313 [1:06:47<8:03:23, 2.68s/it] {'loss': 0.4595, 'grad_norm': 5.426351678323337, 'learning_rate': 4.893047702252399e-06, 'epoch': 0.12} 12%|█▏ | 1486/12313 [1:06:47<8:03:23, 2.68s/it] 12%|█▏ | 1487/12313 [1:06:50<7:49:30, 2.60s/it] {'loss': 0.7437, 'grad_norm': 4.641027529381159, 'learning_rate': 4.892857327157275e-06, 'epoch': 0.12} 12%|█▏ | 1487/12313 [1:06:50<7:49:30, 2.60s/it] 12%|█▏ | 1488/12313 [1:06:52<7:54:21, 2.63s/it] {'loss': 0.5187, 'grad_norm': 5.910005928653577, 'learning_rate': 4.892666786489087e-06, 'epoch': 0.12} 12%|█▏ | 1488/12313 [1:06:52<7:54:21, 2.63s/it] 12%|█▏ | 1489/12313 [1:06:55<7:55:51, 2.64s/it] {'loss': 0.6245, 'grad_norm': 4.433359476670774, 'learning_rate': 4.8924760802610215e-06, 'epoch': 0.12} 12%|█▏ | 1489/12313 [1:06:55<7:55:51, 2.64s/it] 12%|█▏ | 1490/12313 [1:06:58<7:54:02, 2.63s/it] {'loss': 0.6885, 'grad_norm': 16.649957190107745, 'learning_rate': 4.8922852084862734e-06, 'epoch': 0.12} 12%|█▏ | 1490/12313 
[1:06:58<7:54:02, 2.63s/it] 12%|█▏ | 1491/12313 [1:07:00<7:54:21, 2.63s/it] {'loss': 0.4662, 'grad_norm': 5.101531362441216, 'learning_rate': 4.892094171178049e-06, 'epoch': 0.12} 12%|█▏ | 1491/12313 [1:07:00<7:54:21, 2.63s/it] 12%|█▏ | 1492/12313 [1:07:03<7:57:41, 2.65s/it] {'loss': 0.5948, 'grad_norm': 5.1657125457609725, 'learning_rate': 4.891902968349568e-06, 'epoch': 0.12} 12%|█▏ | 1492/12313 [1:07:03<7:57:41, 2.65s/it] 12%|█▏ | 1493/12313 [1:07:06<7:57:00, 2.65s/it] {'loss': 0.5409, 'grad_norm': 4.342894827579331, 'learning_rate': 4.8917116000140614e-06, 'epoch': 0.12} 12%|█▏ | 1493/12313 [1:07:06<7:57:00, 2.65s/it] 12%|█▏ | 1494/12313 [1:07:08<7:59:26, 2.66s/it] {'loss': 0.52, 'grad_norm': 3.8330951289725124, 'learning_rate': 4.8915200661847695e-06, 'epoch': 0.12} 12%|█▏ | 1494/12313 [1:07:08<7:59:26, 2.66s/it] 12%|█▏ | 1495/12313 [1:07:11<8:14:25, 2.74s/it] {'loss': 0.5504, 'grad_norm': 5.179925909022087, 'learning_rate': 4.891328366874946e-06, 'epoch': 0.12} 12%|█▏ | 1495/12313 [1:07:11<8:14:25, 2.74s/it] 12%|█▏ | 1496/12313 [1:07:14<8:03:34, 2.68s/it] {'loss': 0.6273, 'grad_norm': 5.330823987896905, 'learning_rate': 4.891136502097855e-06, 'epoch': 0.12} 12%|█▏ | 1496/12313 [1:07:14<8:03:34, 2.68s/it] 12%|█▏ | 1497/12313 [1:07:16<8:03:11, 2.68s/it] {'loss': 0.6236, 'grad_norm': 5.585945845508732, 'learning_rate': 4.890944471866774e-06, 'epoch': 0.12} 12%|█▏ | 1497/12313 [1:07:16<8:03:11, 2.68s/it] 12%|█▏ | 1498/12313 [1:07:19<7:48:25, 2.60s/it] {'loss': 0.7202, 'grad_norm': 3.83199043102917, 'learning_rate': 4.890752276194989e-06, 'epoch': 0.12} 12%|█▏ | 1498/12313 [1:07:19<7:48:25, 2.60s/it] 12%|█▏ | 1499/12313 [1:07:22<7:54:32, 2.63s/it] {'loss': 0.5213, 'grad_norm': 3.6411956112594677, 'learning_rate': 4.890559915095798e-06, 'epoch': 0.12} 12%|█▏ | 1499/12313 [1:07:22<7:54:32, 2.63s/it] 12%|█▏ | 1500/12313 [1:07:24<7:51:46, 2.62s/it] {'loss': 0.9004, 'grad_norm': 4.786252322922742, 'learning_rate': 4.890367388582514e-06, 'epoch': 0.12} 12%|█▏ | 
1500/12313 [1:07:24<7:51:46, 2.62s/it] 12%|█▏ | 1501/12313 [1:07:27<8:00:19, 2.67s/it] {'loss': 0.6452, 'grad_norm': 4.979342968709159, 'learning_rate': 4.890174696668458e-06, 'epoch': 0.12} 12%|█▏ | 1501/12313 [1:07:27<8:00:19, 2.67s/it] 12%|█▏ | 1502/12313 [1:07:29<7:45:45, 2.58s/it] {'loss': 0.4785, 'grad_norm': 4.11868570254399, 'learning_rate': 4.889981839366962e-06, 'epoch': 0.12} 12%|█▏ | 1502/12313 [1:07:29<7:45:45, 2.58s/it] 12%|█▏ | 1503/12313 [1:07:32<7:37:08, 2.54s/it] {'loss': 0.6467, 'grad_norm': 4.908815740621781, 'learning_rate': 4.889788816691372e-06, 'epoch': 0.12} 12%|█▏ | 1503/12313 [1:07:32<7:37:08, 2.54s/it] 12%|█▏ | 1504/12313 [1:07:34<7:40:23, 2.56s/it] {'loss': 0.61, 'grad_norm': 6.3622155832910625, 'learning_rate': 4.889595628655044e-06, 'epoch': 0.12} 12%|█▏ | 1504/12313 [1:07:34<7:40:23, 2.56s/it] 12%|█▏ | 1505/12313 [1:07:37<8:05:30, 2.70s/it] {'loss': 0.7134, 'grad_norm': 3.6609655168066064, 'learning_rate': 4.8894022752713445e-06, 'epoch': 0.12} 12%|█▏ | 1505/12313 [1:07:37<8:05:30, 2.70s/it] 12%|█▏ | 1506/12313 [1:07:40<8:07:29, 2.71s/it] {'loss': 0.6646, 'grad_norm': 4.846573857319596, 'learning_rate': 4.8892087565536535e-06, 'epoch': 0.12} 12%|█▏ | 1506/12313 [1:07:40<8:07:29, 2.71s/it] 12%|█▏ | 1507/12313 [1:07:43<8:06:00, 2.70s/it] {'loss': 0.7547, 'grad_norm': 5.788899246359694, 'learning_rate': 4.889015072515361e-06, 'epoch': 0.12} 12%|█▏ | 1507/12313 [1:07:43<8:06:00, 2.70s/it] 12%|█▏ | 1508/12313 [1:07:45<8:04:24, 2.69s/it] {'loss': 0.5206, 'grad_norm': 4.583902093938661, 'learning_rate': 4.888821223169869e-06, 'epoch': 0.12} 12%|█▏ | 1508/12313 [1:07:45<8:04:24, 2.69s/it] 12%|█▏ | 1509/12313 [1:07:49<8:48:41, 2.94s/it] {'loss': 0.5928, 'grad_norm': 3.3647325384813915, 'learning_rate': 4.888627208530592e-06, 'epoch': 0.12} 12%|█▏ | 1509/12313 [1:07:49<8:48:41, 2.94s/it] 12%|█▏ | 1510/12313 [1:07:52<8:35:39, 2.86s/it] {'loss': 0.5397, 'grad_norm': 3.8539584897978654, 'learning_rate': 4.8884330286109535e-06, 'epoch': 0.12} 
12%|█▏ | 1510/12313 [1:07:52<8:35:39, 2.86s/it] 12%|█▏ | 1511/12313 [1:07:54<8:14:16, 2.75s/it] {'loss': 0.6754, 'grad_norm': 15.266199610075871, 'learning_rate': 4.88823868342439e-06, 'epoch': 0.12} 12%|█▏ | 1511/12313 [1:07:54<8:14:16, 2.75s/it] 12%|█▏ | 1512/12313 [1:07:57<8:10:05, 2.72s/it] {'loss': 0.7183, 'grad_norm': 5.095675930247071, 'learning_rate': 4.888044172984349e-06, 'epoch': 0.12} 12%|█▏ | 1512/12313 [1:07:57<8:10:05, 2.72s/it] 12%|█▏ | 1513/12313 [1:07:59<8:00:56, 2.67s/it] {'loss': 0.7005, 'grad_norm': 5.589937550753531, 'learning_rate': 4.887849497304289e-06, 'epoch': 0.12} 12%|█▏ | 1513/12313 [1:07:59<8:00:56, 2.67s/it] 12%|█▏ | 1514/12313 [1:08:02<7:55:38, 2.64s/it] {'loss': 0.6316, 'grad_norm': 6.394170257036341, 'learning_rate': 4.8876546563976825e-06, 'epoch': 0.12} 12%|█▏ | 1514/12313 [1:08:02<7:55:38, 2.64s/it] 12%|█▏ | 1515/12313 [1:08:05<7:54:33, 2.64s/it] {'loss': 0.6913, 'grad_norm': 5.521671875850051, 'learning_rate': 4.88745965027801e-06, 'epoch': 0.12} 12%|█▏ | 1515/12313 [1:08:05<7:54:33, 2.64s/it] 12%|█▏ | 1516/12313 [1:08:07<7:57:19, 2.65s/it] {'loss': 0.5295, 'grad_norm': 4.969659762897969, 'learning_rate': 4.887264478958765e-06, 'epoch': 0.12} 12%|█▏ | 1516/12313 [1:08:07<7:57:19, 2.65s/it] 12%|█▏ | 1517/12313 [1:08:10<7:54:13, 2.64s/it] {'loss': 0.6538, 'grad_norm': 6.2253045417503845, 'learning_rate': 4.887069142453453e-06, 'epoch': 0.12} 12%|█▏ | 1517/12313 [1:08:10<7:54:13, 2.64s/it] 12%|█▏ | 1518/12313 [1:08:13<7:55:36, 2.64s/it] {'loss': 0.5829, 'grad_norm': 9.570082889106901, 'learning_rate': 4.886873640775588e-06, 'epoch': 0.12} 12%|█▏ | 1518/12313 [1:08:13<7:55:36, 2.64s/it] 12%|█▏ | 1519/12313 [1:08:15<8:00:23, 2.67s/it] {'loss': 0.5647, 'grad_norm': 5.609163036466005, 'learning_rate': 4.886677973938701e-06, 'epoch': 0.12} 12%|█▏ | 1519/12313 [1:08:15<8:00:23, 2.67s/it] 12%|█▏ | 1520/12313 [1:08:18<7:57:20, 2.65s/it] {'loss': 0.6357, 'grad_norm': 7.818406834616618, 'learning_rate': 4.886482141956329e-06, 'epoch': 
0.12} 12%|█▏ | 1520/12313 [1:08:18<7:57:20, 2.65s/it] 12%|█▏ | 1521/12313 [1:08:21<8:24:02, 2.80s/it] {'loss': 0.5878, 'grad_norm': 3.1695857831672067, 'learning_rate': 4.8862861448420234e-06, 'epoch': 0.12} 12%|█▏ | 1521/12313 [1:08:21<8:24:02, 2.80s/it] 12%|█▏ | 1522/12313 [1:08:24<8:19:53, 2.78s/it] {'loss': 0.5771, 'grad_norm': 4.384307163664372, 'learning_rate': 4.886089982609345e-06, 'epoch': 0.12} 12%|█▏ | 1522/12313 [1:08:24<8:19:53, 2.78s/it] 12%|█▏ | 1523/12313 [1:08:26<8:12:09, 2.74s/it] {'loss': 0.5124, 'grad_norm': 4.469922202011447, 'learning_rate': 4.885893655271869e-06, 'epoch': 0.12} 12%|█▏ | 1523/12313 [1:08:26<8:12:09, 2.74s/it] 12%|█▏ | 1524/12313 [1:08:29<8:05:04, 2.70s/it] {'loss': 0.6882, 'grad_norm': 4.594952235845705, 'learning_rate': 4.885697162843179e-06, 'epoch': 0.12} 12%|█▏ | 1524/12313 [1:08:29<8:05:04, 2.70s/it] 12%|█▏ | 1525/12313 [1:08:31<7:55:01, 2.64s/it] {'loss': 0.5141, 'grad_norm': 11.952868311490398, 'learning_rate': 4.8855005053368715e-06, 'epoch': 0.12} 12%|█▏ | 1525/12313 [1:08:31<7:55:01, 2.64s/it] 12%|█▏ | 1526/12313 [1:08:34<7:51:06, 2.62s/it] {'loss': 0.6077, 'grad_norm': 5.107326813624944, 'learning_rate': 4.885303682766554e-06, 'epoch': 0.12} 12%|█▏ | 1526/12313 [1:08:34<7:51:06, 2.62s/it] 12%|█▏ | 1527/12313 [1:08:37<8:08:28, 2.72s/it] {'loss': 0.739, 'grad_norm': 3.90801909322924, 'learning_rate': 4.885106695145846e-06, 'epoch': 0.12} 12%|█▏ | 1527/12313 [1:08:37<8:08:28, 2.72s/it] 12%|█▏ | 1528/12313 [1:08:40<8:05:23, 2.70s/it] {'loss': 0.4853, 'grad_norm': 4.458856274877279, 'learning_rate': 4.884909542488377e-06, 'epoch': 0.12} 12%|█▏ | 1528/12313 [1:08:40<8:05:23, 2.70s/it] 12%|█▏ | 1529/12313 [1:08:42<7:57:30, 2.66s/it] {'loss': 0.5267, 'grad_norm': 7.488057909444421, 'learning_rate': 4.88471222480779e-06, 'epoch': 0.12} 12%|█▏ | 1529/12313 [1:08:42<7:57:30, 2.66s/it] 12%|█▏ | 1530/12313 [1:08:45<7:51:38, 2.62s/it] {'loss': 0.5767, 'grad_norm': 10.709236752099061, 'learning_rate': 4.8845147421177375e-06, 
'epoch': 0.12} 12%|█▏ | 1530/12313 [1:08:45<7:51:38, 2.62s/it] 12%|█▏ | 1531/12313 [1:08:47<7:49:05, 2.61s/it] {'loss': 0.6582, 'grad_norm': 4.227280902938617, 'learning_rate': 4.8843170944318855e-06, 'epoch': 0.12} 12%|█▏ | 1531/12313 [1:08:47<7:49:05, 2.61s/it] 12%|█▏ | 1532/12313 [1:08:50<7:44:26, 2.58s/it] {'loss': 0.4994, 'grad_norm': 5.304472983158379, 'learning_rate': 4.88411928176391e-06, 'epoch': 0.12} 12%|█▏ | 1532/12313 [1:08:50<7:44:26, 2.58s/it] 12%|█▏ | 1533/12313 [1:08:53<7:50:45, 2.62s/it] {'loss': 0.454, 'grad_norm': 6.083933660892841, 'learning_rate': 4.8839213041274955e-06, 'epoch': 0.12} 12%|█▏ | 1533/12313 [1:08:53<7:50:45, 2.62s/it] 12%|█▏ | 1534/12313 [1:08:55<7:49:43, 2.61s/it] {'loss': 0.7561, 'grad_norm': 7.150062978858339, 'learning_rate': 4.8837231615363455e-06, 'epoch': 0.12} 12%|█▏ | 1534/12313 [1:08:55<7:49:43, 2.61s/it] 12%|█▏ | 1535/12313 [1:08:58<7:51:01, 2.62s/it] {'loss': 0.6597, 'grad_norm': 6.602096765289932, 'learning_rate': 4.883524854004168e-06, 'epoch': 0.12} 12%|█▏ | 1535/12313 [1:08:58<7:51:01, 2.62s/it] 12%|█▏ | 1536/12313 [1:09:00<7:41:17, 2.57s/it] {'loss': 0.5989, 'grad_norm': 4.853497195307292, 'learning_rate': 4.883326381544686e-06, 'epoch': 0.12} 12%|█▏ | 1536/12313 [1:09:00<7:41:17, 2.57s/it] 12%|█▏ | 1537/12313 [1:09:03<7:55:08, 2.65s/it] {'loss': 0.555, 'grad_norm': 5.57662629799743, 'learning_rate': 4.88312774417163e-06, 'epoch': 0.12} 12%|█▏ | 1537/12313 [1:09:03<7:55:08, 2.65s/it] 12%|█▏ | 1538/12313 [1:09:06<7:58:29, 2.66s/it] {'loss': 0.5814, 'grad_norm': 4.64783655989287, 'learning_rate': 4.882928941898748e-06, 'epoch': 0.12} 12%|█▏ | 1538/12313 [1:09:06<7:58:29, 2.66s/it] 12%|█▏ | 1539/12313 [1:09:08<7:45:11, 2.59s/it] {'loss': 0.5567, 'grad_norm': 8.657184018704008, 'learning_rate': 4.882729974739794e-06, 'epoch': 0.12} 12%|█▏ | 1539/12313 [1:09:08<7:45:11, 2.59s/it] 13%|█▎ | 1540/12313 [1:09:11<7:52:03, 2.63s/it] {'loss': 0.5428, 'grad_norm': 9.124561847833686, 'learning_rate': 4.882530842708537e-06, 
'epoch': 0.13} 13%|█▎ | 1540/12313 [1:09:11<7:52:03, 2.63s/it] 13%|█▎ | 1541/12313 [1:09:14<7:57:51, 2.66s/it] {'loss': 0.5641, 'grad_norm': 5.93162727377288, 'learning_rate': 4.882331545818755e-06, 'epoch': 0.13} 13%|█▎ | 1541/12313 [1:09:14<7:57:51, 2.66s/it] 13%|█▎ | 1542/12313 [1:09:17<8:11:29, 2.74s/it] {'loss': 0.6554, 'grad_norm': 4.417596995333459, 'learning_rate': 4.882132084084238e-06, 'epoch': 0.13} 13%|█▎ | 1542/12313 [1:09:17<8:11:29, 2.74s/it] 13%|█▎ | 1543/12313 [1:09:19<8:09:54, 2.73s/it] {'loss': 0.8369, 'grad_norm': 5.238384789515402, 'learning_rate': 4.8819324575187875e-06, 'epoch': 0.13} 13%|█▎ | 1543/12313 [1:09:19<8:09:54, 2.73s/it] 13%|█▎ | 1544/12313 [1:09:22<8:03:18, 2.69s/it] {'loss': 0.7737, 'grad_norm': 5.490158000040072, 'learning_rate': 4.881732666136217e-06, 'epoch': 0.13} 13%|█▎ | 1544/12313 [1:09:22<8:03:18, 2.69s/it] 13%|█▎ | 1545/12313 [1:09:25<8:09:42, 2.73s/it] {'loss': 0.5733, 'grad_norm': 18.610112814184124, 'learning_rate': 4.881532709950352e-06, 'epoch': 0.13} 13%|█▎ | 1545/12313 [1:09:25<8:09:42, 2.73s/it] 13%|█▎ | 1546/12313 [1:09:27<8:12:53, 2.75s/it] {'loss': 0.518, 'grad_norm': 4.3346186230095585, 'learning_rate': 4.8813325889750275e-06, 'epoch': 0.13} 13%|█▎ | 1546/12313 [1:09:27<8:12:53, 2.75s/it] 13%|█▎ | 1547/12313 [1:09:31<8:44:56, 2.93s/it] {'loss': 0.4973, 'grad_norm': 5.394377171648292, 'learning_rate': 4.881132303224091e-06, 'epoch': 0.13} 13%|█▎ | 1547/12313 [1:09:31<8:44:56, 2.93s/it] 13%|█▎ | 1548/12313 [1:09:33<8:26:09, 2.82s/it] {'loss': 0.53, 'grad_norm': 6.281282652469185, 'learning_rate': 4.880931852711401e-06, 'epoch': 0.13} 13%|█▎ | 1548/12313 [1:09:33<8:26:09, 2.82s/it] 13%|█▎ | 1549/12313 [1:09:36<8:29:13, 2.84s/it] {'loss': 0.543, 'grad_norm': 5.063604963463662, 'learning_rate': 4.880731237450828e-06, 'epoch': 0.13} 13%|█▎ | 1549/12313 [1:09:36<8:29:13, 2.84s/it] 13%|█▎ | 1550/12313 [1:09:39<8:31:48, 2.85s/it] {'loss': 0.5307, 'grad_norm': 5.589066320656067, 'learning_rate': 4.880530457456252e-06, 
'epoch': 0.13} 13%|█▎ | 1550/12313 [1:09:39<8:31:48, 2.85s/it] 13%|█▎ | 1551/12313 [1:09:42<8:42:12, 2.91s/it] {'loss': 0.6096, 'grad_norm': 5.313990606526498, 'learning_rate': 4.880329512741568e-06, 'epoch': 0.13} 13%|█▎ | 1551/12313 [1:09:42<8:42:12, 2.91s/it] 13%|█▎ | 1552/12313 [1:09:45<8:23:06, 2.81s/it] {'loss': 0.6667, 'grad_norm': 5.093580296509499, 'learning_rate': 4.88012840332068e-06, 'epoch': 0.13} 13%|█▎ | 1552/12313 [1:09:45<8:23:06, 2.81s/it] 13%|█▎ | 1553/12313 [1:09:47<8:01:32, 2.69s/it] {'loss': 0.669, 'grad_norm': 3.074951611991778, 'learning_rate': 4.879927129207502e-06, 'epoch': 0.13} 13%|█▎ | 1553/12313 [1:09:47<8:01:32, 2.69s/it] 13%|█▎ | 1554/12313 [1:09:50<7:50:54, 2.63s/it] {'loss': 0.7399, 'grad_norm': 9.209748336487731, 'learning_rate': 4.8797256904159625e-06, 'epoch': 0.13} 13%|█▎ | 1554/12313 [1:09:50<7:50:54, 2.63s/it] 13%|█▎ | 1555/12313 [1:09:52<7:50:59, 2.63s/it] {'loss': 0.6407, 'grad_norm': 5.6697335990100965, 'learning_rate': 4.87952408696e-06, 'epoch': 0.13} 13%|█▎ | 1555/12313 [1:09:52<7:50:59, 2.63s/it] 13%|█▎ | 1556/12313 [1:09:55<7:57:56, 2.67s/it] {'loss': 0.5582, 'grad_norm': 8.545213357048508, 'learning_rate': 4.879322318853564e-06, 'epoch': 0.13} 13%|█▎ | 1556/12313 [1:09:55<7:57:56, 2.67s/it] 13%|█▎ | 1557/12313 [1:09:58<8:00:02, 2.68s/it] {'loss': 0.6649, 'grad_norm': 7.569125385036642, 'learning_rate': 4.879120386110616e-06, 'epoch': 0.13} 13%|█▎ | 1557/12313 [1:09:58<8:00:02, 2.68s/it] 13%|█▎ | 1558/12313 [1:10:01<8:03:50, 2.70s/it] {'loss': 0.6249, 'grad_norm': 16.126868678635603, 'learning_rate': 4.878918288745128e-06, 'epoch': 0.13} 13%|█▎ | 1558/12313 [1:10:01<8:03:50, 2.70s/it] 13%|█▎ | 1559/12313 [1:10:03<8:05:17, 2.71s/it] {'loss': 0.5061, 'grad_norm': 5.161708427963795, 'learning_rate': 4.878716026771086e-06, 'epoch': 0.13} 13%|█▎ | 1559/12313 [1:10:03<8:05:17, 2.71s/it] 13%|█▎ | 1560/12313 [1:10:06<8:03:08, 2.70s/it] {'loss': 0.6149, 'grad_norm': 4.94640934937331, 'learning_rate': 4.878513600202483e-06, 
'epoch': 0.13} 13%|█▎ | 1560/12313 [1:10:06<8:03:08, 2.70s/it] 13%|█▎ | 1561/12313 [1:10:10<8:55:30, 2.99s/it] {'loss': 0.5789, 'grad_norm': 5.628454265223888, 'learning_rate': 4.878311009053328e-06, 'epoch': 0.13} 13%|█▎ | 1561/12313 [1:10:10<8:55:30, 2.99s/it] 13%|█▎ | 1562/12313 [1:10:12<8:33:12, 2.86s/it] {'loss': 0.6344, 'grad_norm': 5.099342067120917, 'learning_rate': 4.878108253337638e-06, 'epoch': 0.13} 13%|█▎ | 1562/12313 [1:10:12<8:33:12, 2.86s/it] 13%|█▎ | 1563/12313 [1:10:15<8:17:21, 2.78s/it] {'loss': 0.5775, 'grad_norm': 5.258000831985147, 'learning_rate': 4.877905333069442e-06, 'epoch': 0.13} 13%|█▎ | 1563/12313 [1:10:15<8:17:21, 2.78s/it] 13%|█▎ | 1564/12313 [1:10:17<8:15:25, 2.77s/it] {'loss': 0.5334, 'grad_norm': 6.236383483926228, 'learning_rate': 4.877702248262782e-06, 'epoch': 0.13} 13%|█▎ | 1564/12313 [1:10:17<8:15:25, 2.77s/it] 13%|█▎ | 1565/12313 [1:10:20<8:12:43, 2.75s/it] {'loss': 0.7184, 'grad_norm': 4.048534369160581, 'learning_rate': 4.87749899893171e-06, 'epoch': 0.13} 13%|█▎ | 1565/12313 [1:10:20<8:12:43, 2.75s/it] 13%|█▎ | 1566/12313 [1:10:23<7:57:43, 2.67s/it] {'loss': 0.6671, 'grad_norm': 4.398427223579315, 'learning_rate': 4.8772955850902914e-06, 'epoch': 0.13} 13%|█▎ | 1566/12313 [1:10:23<7:57:43, 2.67s/it] 13%|█▎ | 1567/12313 [1:10:25<7:50:52, 2.63s/it] {'loss': 0.621, 'grad_norm': 5.637044630053773, 'learning_rate': 4.877092006752599e-06, 'epoch': 0.13} 13%|█▎ | 1567/12313 [1:10:25<7:50:52, 2.63s/it] 13%|█▎ | 1568/12313 [1:10:28<8:02:33, 2.69s/it] {'loss': 0.5828, 'grad_norm': 6.7178341766236125, 'learning_rate': 4.876888263932721e-06, 'epoch': 0.13} 13%|█▎ | 1568/12313 [1:10:28<8:02:33, 2.69s/it] 13%|█▎ | 1569/12313 [1:10:31<8:02:45, 2.70s/it] {'loss': 0.6779, 'grad_norm': 5.624169626907889, 'learning_rate': 4.876684356644754e-06, 'epoch': 0.13} 13%|█▎ | 1569/12313 [1:10:31<8:02:45, 2.70s/it] 13%|█▎ | 1570/12313 [1:10:33<7:52:38, 2.64s/it] {'loss': 0.4934, 'grad_norm': 4.476842415269882, 'learning_rate': 4.876480284902807e-06, 
'epoch': 0.13} 13%|█▎ | 1570/12313 [1:10:33<7:52:38, 2.64s/it] 13%|█▎ | 1571/12313 [1:10:36<7:53:58, 2.65s/it] {'loss': 0.5301, 'grad_norm': 3.7336585247616885, 'learning_rate': 4.8762760487210035e-06, 'epoch': 0.13} 13%|█▎ | 1571/12313 [1:10:36<7:53:58, 2.65s/it] 13%|█▎ | 1572/12313 [1:10:38<7:42:23, 2.58s/it] {'loss': 0.6757, 'grad_norm': 8.61154812995492, 'learning_rate': 4.876071648113473e-06, 'epoch': 0.13} 13%|█▎ | 1572/12313 [1:10:38<7:42:23, 2.58s/it] 13%|█▎ | 1573/12313 [1:10:41<7:54:52, 2.65s/it] {'loss': 0.7971, 'grad_norm': 6.969888194514512, 'learning_rate': 4.875867083094359e-06, 'epoch': 0.13} 13%|█▎ | 1573/12313 [1:10:41<7:54:52, 2.65s/it] 13%|█▎ | 1574/12313 [1:10:44<7:47:04, 2.61s/it] {'loss': 0.4918, 'grad_norm': 7.7890484350762765, 'learning_rate': 4.875662353677818e-06, 'epoch': 0.13} 13%|█▎ | 1574/12313 [1:10:44<7:47:04, 2.61s/it] 13%|█▎ | 1575/12313 [1:10:46<7:47:42, 2.61s/it] {'loss': 0.6069, 'grad_norm': 4.545675942015323, 'learning_rate': 4.875457459878014e-06, 'epoch': 0.13} 13%|█▎ | 1575/12313 [1:10:46<7:47:42, 2.61s/it] 13%|█▎ | 1576/12313 [1:10:49<7:53:50, 2.65s/it] {'loss': 0.5996, 'grad_norm': 7.615402665241847, 'learning_rate': 4.875252401709126e-06, 'epoch': 0.13} 13%|█▎ | 1576/12313 [1:10:49<7:53:50, 2.65s/it] 13%|█▎ | 1577/12313 [1:10:52<7:57:56, 2.67s/it] {'loss': 0.6422, 'grad_norm': 5.198690570704765, 'learning_rate': 4.8750471791853436e-06, 'epoch': 0.13} 13%|█▎ | 1577/12313 [1:10:52<7:57:56, 2.67s/it] 13%|█▎ | 1578/12313 [1:10:55<8:03:17, 2.70s/it] {'loss': 0.4701, 'grad_norm': 5.875388513113994, 'learning_rate': 4.874841792320865e-06, 'epoch': 0.13} 13%|█▎ | 1578/12313 [1:10:55<8:03:17, 2.70s/it] 13%|█▎ | 1579/12313 [1:10:57<8:10:51, 2.74s/it] {'loss': 0.5815, 'grad_norm': 6.491950200516407, 'learning_rate': 4.874636241129904e-06, 'epoch': 0.13} 13%|█▎ | 1579/12313 [1:10:57<8:10:51, 2.74s/it] 13%|█▎ | 1580/12313 [1:11:00<8:08:01, 2.73s/it] {'loss': 0.6365, 'grad_norm': 4.293195229926921, 'learning_rate': 
4.874430525626682e-06, 'epoch': 0.13} 13%|█▎ | 1580/12313 [1:11:00<8:08:01, 2.73s/it] 13%|█▎ | 1581/12313 [1:11:03<8:06:12, 2.72s/it] {'loss': 0.4979, 'grad_norm': 8.402150286140357, 'learning_rate': 4.874224645825435e-06, 'epoch': 0.13} 13%|█▎ | 1581/12313 [1:11:03<8:06:12, 2.72s/it] 13%|█▎ | 1582/12313 [1:11:05<7:57:40, 2.67s/it] {'loss': 0.504, 'grad_norm': 6.3159806141726, 'learning_rate': 4.874018601740407e-06, 'epoch': 0.13} 13%|█▎ | 1582/12313 [1:11:05<7:57:40, 2.67s/it] 13%|█▎ | 1583/12313 [1:11:08<8:03:28, 2.70s/it] {'loss': 0.5678, 'grad_norm': 5.865695583703645, 'learning_rate': 4.873812393385856e-06, 'epoch': 0.13} 13%|█▎ | 1583/12313 [1:11:08<8:03:28, 2.70s/it] 13%|█▎ | 1584/12313 [1:11:11<8:32:33, 2.87s/it] {'loss': 0.637, 'grad_norm': 4.259448849973863, 'learning_rate': 4.873606020776051e-06, 'epoch': 0.13} 13%|█▎ | 1584/12313 [1:11:11<8:32:33, 2.87s/it] 13%|█▎ | 1585/12313 [1:11:14<8:08:26, 2.73s/it] {'loss': 0.6518, 'grad_norm': 4.795031800681299, 'learning_rate': 4.873399483925272e-06, 'epoch': 0.13} 13%|█▎ | 1585/12313 [1:11:14<8:08:26, 2.73s/it] 13%|█▎ | 1586/12313 [1:11:16<8:07:43, 2.73s/it] {'loss': 0.5015, 'grad_norm': 6.982468505996596, 'learning_rate': 4.8731927828478085e-06, 'epoch': 0.13} 13%|█▎ | 1586/12313 [1:11:16<8:07:43, 2.73s/it] 13%|█▎ | 1587/12313 [1:11:19<8:02:28, 2.70s/it] {'loss': 0.6631, 'grad_norm': 4.194815922907224, 'learning_rate': 4.872985917557965e-06, 'epoch': 0.13} 13%|█▎ | 1587/12313 [1:11:19<8:02:28, 2.70s/it] 13%|█▎ | 1588/12313 [1:11:22<7:57:05, 2.67s/it] {'loss': 0.477, 'grad_norm': 6.979913270961702, 'learning_rate': 4.872778888070055e-06, 'epoch': 0.13} 13%|█▎ | 1588/12313 [1:11:22<7:57:05, 2.67s/it] 13%|█▎ | 1589/12313 [1:11:24<7:59:39, 2.68s/it] {'loss': 0.5583, 'grad_norm': 6.539545690632459, 'learning_rate': 4.872571694398403e-06, 'epoch': 0.13} 13%|█▎ | 1589/12313 [1:11:24<7:59:39, 2.68s/it] 13%|█▎ | 1590/12313 [1:11:27<7:53:48, 2.65s/it] {'loss': 0.6374, 'grad_norm': 7.207244340353426, 'learning_rate': 
4.872364336557348e-06, 'epoch': 0.13} 13%|█▎ | 1590/12313 [1:11:27<7:53:48, 2.65s/it] 13%|█▎ | 1591/12313 [1:11:30<7:56:22, 2.67s/it] {'loss': 0.5826, 'grad_norm': 5.271822346774599, 'learning_rate': 4.8721568145612355e-06, 'epoch': 0.13} 13%|█▎ | 1591/12313 [1:11:30<7:56:22, 2.67s/it] 13%|█▎ | 1592/12313 [1:11:32<8:01:41, 2.70s/it] {'loss': 0.5468, 'grad_norm': 6.699431722524916, 'learning_rate': 4.8719491284244256e-06, 'epoch': 0.13} 13%|█▎ | 1592/12313 [1:11:32<8:01:41, 2.70s/it] 13%|█▎ | 1593/12313 [1:11:35<8:08:19, 2.73s/it] {'loss': 0.5465, 'grad_norm': 4.303099582925413, 'learning_rate': 4.871741278161291e-06, 'epoch': 0.13} 13%|█▎ | 1593/12313 [1:11:35<8:08:19, 2.73s/it] 13%|█▎ | 1594/12313 [1:11:38<8:17:51, 2.79s/it] {'loss': 0.5562, 'grad_norm': 3.318090123286297, 'learning_rate': 4.87153326378621e-06, 'epoch': 0.13} 13%|█▎ | 1594/12313 [1:11:38<8:17:51, 2.79s/it] 13%|█▎ | 1595/12313 [1:11:41<8:10:57, 2.75s/it] {'loss': 0.6022, 'grad_norm': 3.8336332872072507, 'learning_rate': 4.87132508531358e-06, 'epoch': 0.13} 13%|█▎ | 1595/12313 [1:11:41<8:10:57, 2.75s/it] 13%|█▎ | 1596/12313 [1:11:44<8:10:40, 2.75s/it] {'loss': 0.4905, 'grad_norm': 6.539483107346492, 'learning_rate': 4.871116742757803e-06, 'epoch': 0.13} 13%|█▎ | 1596/12313 [1:11:44<8:10:40, 2.75s/it] 13%|█▎ | 1597/12313 [1:11:46<8:04:47, 2.71s/it] {'loss': 0.6639, 'grad_norm': 3.2101449234700072, 'learning_rate': 4.870908236133297e-06, 'epoch': 0.13} 13%|█▎ | 1597/12313 [1:11:46<8:04:47, 2.71s/it] 13%|█▎ | 1598/12313 [1:11:49<8:04:45, 2.71s/it] {'loss': 0.5098, 'grad_norm': 11.166157350084525, 'learning_rate': 4.870699565454489e-06, 'epoch': 0.13} 13%|█▎ | 1598/12313 [1:11:49<8:04:45, 2.71s/it] 13%|█▎ | 1599/12313 [1:11:52<8:30:31, 2.86s/it] {'loss': 0.4853, 'grad_norm': 4.137968227782281, 'learning_rate': 4.870490730735818e-06, 'epoch': 0.13} 13%|█▎ | 1599/12313 [1:11:52<8:30:31, 2.86s/it] 13%|█▎ | 1600/12313 [1:11:55<8:34:15, 2.88s/it] {'loss': 0.5433, 'grad_norm': 3.555952898094085, 
'learning_rate': 4.870281731991733e-06, 'epoch': 0.13} 13%|█▎ | 1600/12313 [1:11:55<8:34:15, 2.88s/it] 13%|█▎ | 1601/12313 [1:11:58<8:18:24, 2.79s/it] {'loss': 0.5833, 'grad_norm': 6.292559801998863, 'learning_rate': 4.870072569236697e-06, 'epoch': 0.13} 13%|█▎ | 1601/12313 [1:11:58<8:18:24, 2.79s/it] 13%|█▎ | 1602/12313 [1:12:01<8:23:07, 2.82s/it] {'loss': 0.5839, 'grad_norm': 4.245679851338393, 'learning_rate': 4.869863242485183e-06, 'epoch': 0.13} 13%|█▎ | 1602/12313 [1:12:01<8:23:07, 2.82s/it] 13%|█▎ | 1603/12313 [1:12:03<8:27:47, 2.84s/it] {'loss': 0.5284, 'grad_norm': 10.695591016769804, 'learning_rate': 4.8696537517516754e-06, 'epoch': 0.13} 13%|█▎ | 1603/12313 [1:12:03<8:27:47, 2.84s/it] 13%|█▎ | 1604/12313 [1:12:06<8:26:23, 2.84s/it] {'loss': 0.5927, 'grad_norm': 5.501068173317865, 'learning_rate': 4.869444097050668e-06, 'epoch': 0.13} 13%|█▎ | 1604/12313 [1:12:06<8:26:23, 2.84s/it] 13%|█▎ | 1605/12313 [1:12:09<8:24:14, 2.83s/it] {'loss': 0.5258, 'grad_norm': 5.717664746661781, 'learning_rate': 4.8692342783966706e-06, 'epoch': 0.13} 13%|█▎ | 1605/12313 [1:12:09<8:24:14, 2.83s/it] 13%|█▎ | 1606/12313 [1:12:12<8:15:56, 2.78s/it] {'loss': 0.6064, 'grad_norm': 5.029968109519692, 'learning_rate': 4.869024295804199e-06, 'epoch': 0.13} 13%|█▎ | 1606/12313 [1:12:12<8:15:56, 2.78s/it] 13%|█▎ | 1607/12313 [1:12:14<8:13:09, 2.76s/it] {'loss': 0.5764, 'grad_norm': 3.7931205356074575, 'learning_rate': 4.868814149287785e-06, 'epoch': 0.13} 13%|█▎ | 1607/12313 [1:12:14<8:13:09, 2.76s/it] 13%|█▎ | 1608/12313 [1:12:17<8:09:33, 2.74s/it] {'loss': 0.5991, 'grad_norm': 4.944616527342133, 'learning_rate': 4.868603838861969e-06, 'epoch': 0.13} 13%|█▎ | 1608/12313 [1:12:17<8:09:33, 2.74s/it] 13%|█▎ | 1609/12313 [1:12:20<7:59:00, 2.68s/it] {'loss': 0.6465, 'grad_norm': 5.178859852231613, 'learning_rate': 4.868393364541301e-06, 'epoch': 0.13} 13%|█▎ | 1609/12313 [1:12:20<7:59:00, 2.68s/it] 13%|█▎ | 1610/12313 [1:12:22<8:01:35, 2.70s/it] {'loss': 0.6649, 'grad_norm': 
4.567090745940825, 'learning_rate': 4.868182726340349e-06, 'epoch': 0.13} 13%|█▎ | 1610/12313 [1:12:22<8:01:35, 2.70s/it] 13%|█▎ | 1611/12313 [1:12:25<8:07:50, 2.74s/it] {'loss': 0.5043, 'grad_norm': 5.802293496504625, 'learning_rate': 4.867971924273685e-06, 'epoch': 0.13} 13%|█▎ | 1611/12313 [1:12:25<8:07:50, 2.74s/it] 13%|█▎ | 1612/12313 [1:12:28<8:10:49, 2.75s/it] {'loss': 0.5295, 'grad_norm': 4.914755653217928, 'learning_rate': 4.8677609583558956e-06, 'epoch': 0.13} 13%|█▎ | 1612/12313 [1:12:28<8:10:49, 2.75s/it] 13%|█▎ | 1613/12313 [1:12:31<8:02:36, 2.71s/it] {'loss': 0.6164, 'grad_norm': 7.208070575519608, 'learning_rate': 4.867549828601579e-06, 'epoch': 0.13} 13%|█▎ | 1613/12313 [1:12:31<8:02:36, 2.71s/it] 13%|█▎ | 1614/12313 [1:12:33<8:08:10, 2.74s/it] {'loss': 0.5208, 'grad_norm': 7.77772857949289, 'learning_rate': 4.8673385350253454e-06, 'epoch': 0.13} 13%|█▎ | 1614/12313 [1:12:33<8:08:10, 2.74s/it] 13%|█▎ | 1615/12313 [1:12:36<8:18:07, 2.79s/it] {'loss': 0.6235, 'grad_norm': 8.264639243565528, 'learning_rate': 4.867127077641813e-06, 'epoch': 0.13} 13%|█▎ | 1615/12313 [1:12:36<8:18:07, 2.79s/it] 13%|█▎ | 1616/12313 [1:12:39<8:22:03, 2.82s/it] {'loss': 0.57, 'grad_norm': 5.062219960669989, 'learning_rate': 4.866915456465615e-06, 'epoch': 0.13} 13%|█▎ | 1616/12313 [1:12:39<8:22:03, 2.82s/it] 13%|█▎ | 1617/12313 [1:12:42<8:17:57, 2.79s/it] {'loss': 0.5729, 'grad_norm': 16.478111147319943, 'learning_rate': 4.866703671511395e-06, 'epoch': 0.13} 13%|█▎ | 1617/12313 [1:12:42<8:17:57, 2.79s/it] 13%|█▎ | 1618/12313 [1:12:45<8:15:42, 2.78s/it] {'loss': 0.5722, 'grad_norm': 4.4225512989681475, 'learning_rate': 4.8664917227938056e-06, 'epoch': 0.13} 13%|█▎ | 1618/12313 [1:12:45<8:15:42, 2.78s/it] 13%|█▎ | 1619/12313 [1:12:47<8:01:14, 2.70s/it] {'loss': 0.7651, 'grad_norm': 5.041669688436434, 'learning_rate': 4.866279610327514e-06, 'epoch': 0.13} 13%|█▎ | 1619/12313 [1:12:47<8:01:14, 2.70s/it] 13%|█▎ | 1620/12313 [1:12:50<7:54:06, 2.66s/it] {'loss': 0.6381, 
'grad_norm': 5.8825838835020505, 'learning_rate': 4.8660673341271966e-06, 'epoch': 0.13} 13%|█▎ | 1620/12313 [1:12:50<7:54:06, 2.66s/it] 13%|█▎ | 1621/12313 [1:12:52<7:49:07, 2.63s/it] {'loss': 0.6949, 'grad_norm': 4.034188407645285, 'learning_rate': 4.865854894207541e-06, 'epoch': 0.13} 13%|█▎ | 1621/12313 [1:12:52<7:49:07, 2.63s/it] 13%|█▎ | 1622/12313 [1:12:55<7:55:33, 2.67s/it] {'loss': 0.5172, 'grad_norm': 3.6403419129784864, 'learning_rate': 4.865642290583249e-06, 'epoch': 0.13} 13%|█▎ | 1622/12313 [1:12:55<7:55:33, 2.67s/it] 13%|█▎ | 1623/12313 [1:12:58<7:47:25, 2.62s/it] {'loss': 0.6615, 'grad_norm': 4.836824286973765, 'learning_rate': 4.86542952326903e-06, 'epoch': 0.13} 13%|█▎ | 1623/12313 [1:12:58<7:47:25, 2.62s/it] 13%|█▎ | 1624/12313 [1:13:01<8:00:55, 2.70s/it] {'loss': 0.6131, 'grad_norm': 4.388993168301412, 'learning_rate': 4.865216592279607e-06, 'epoch': 0.13} 13%|█▎ | 1624/12313 [1:13:01<8:00:55, 2.70s/it] 13%|█▎ | 1625/12313 [1:13:04<8:18:29, 2.80s/it] {'loss': 0.6583, 'grad_norm': 6.591216319051455, 'learning_rate': 4.865003497629713e-06, 'epoch': 0.13} 13%|█▎ | 1625/12313 [1:13:04<8:18:29, 2.80s/it] 13%|█▎ | 1626/12313 [1:13:06<8:08:43, 2.74s/it] {'loss': 0.6026, 'grad_norm': 4.389020157372549, 'learning_rate': 4.8647902393340955e-06, 'epoch': 0.13} 13%|█▎ | 1626/12313 [1:13:06<8:08:43, 2.74s/it] 13%|█▎ | 1627/12313 [1:13:09<7:57:57, 2.68s/it] {'loss': 0.6323, 'grad_norm': 4.250808042920454, 'learning_rate': 4.864576817407507e-06, 'epoch': 0.13} 13%|█▎ | 1627/12313 [1:13:09<7:57:57, 2.68s/it] 13%|█▎ | 1628/12313 [1:13:11<7:57:12, 2.68s/it] {'loss': 0.6207, 'grad_norm': 6.881832060317491, 'learning_rate': 4.864363231864717e-06, 'epoch': 0.13} 13%|█▎ | 1628/12313 [1:13:11<7:57:12, 2.68s/it] 13%|█▎ | 1629/12313 [1:13:14<7:50:42, 2.64s/it] {'loss': 0.5453, 'grad_norm': 10.412011238660916, 'learning_rate': 4.864149482720505e-06, 'epoch': 0.13} 13%|█▎ | 1629/12313 [1:13:14<7:50:42, 2.64s/it] 13%|█▎ | 1630/12313 [1:13:17<7:47:27, 2.63s/it] {'loss': 
0.5582, 'grad_norm': 5.774868373588398, 'learning_rate': 4.863935569989662e-06, 'epoch': 0.13} 13%|█▎ | 1630/12313 [1:13:17<7:47:27, 2.63s/it] 13%|█▎ | 1631/12313 [1:13:19<7:54:22, 2.66s/it] {'loss': 0.6431, 'grad_norm': 5.245799631843284, 'learning_rate': 4.863721493686987e-06, 'epoch': 0.13} 13%|█▎ | 1631/12313 [1:13:19<7:54:22, 2.66s/it] 13%|█▎ | 1632/12313 [1:13:22<8:07:40, 2.74s/it] {'loss': 0.5681, 'grad_norm': 4.177337851592638, 'learning_rate': 4.8635072538272954e-06, 'epoch': 0.13} 13%|█▎ | 1632/12313 [1:13:22<8:07:40, 2.74s/it] 13%|█▎ | 1633/12313 [1:13:25<7:53:21, 2.66s/it] {'loss': 0.6479, 'grad_norm': 3.517896942856102, 'learning_rate': 4.863292850425409e-06, 'epoch': 0.13} 13%|█▎ | 1633/12313 [1:13:25<7:53:21, 2.66s/it] 13%|█▎ | 1634/12313 [1:13:28<8:08:09, 2.74s/it] {'loss': 0.6847, 'grad_norm': 5.350168873979354, 'learning_rate': 4.863078283496167e-06, 'epoch': 0.13} 13%|█▎ | 1634/12313 [1:13:28<8:08:09, 2.74s/it] 13%|█▎ | 1635/12313 [1:13:30<8:14:13, 2.78s/it] {'loss': 0.6294, 'grad_norm': 6.986629354106734, 'learning_rate': 4.862863553054413e-06, 'epoch': 0.13} 13%|█▎ | 1635/12313 [1:13:30<8:14:13, 2.78s/it] 13%|█▎ | 1636/12313 [1:13:33<8:11:20, 2.76s/it] {'loss': 0.7014, 'grad_norm': 4.997326232828624, 'learning_rate': 4.862648659115007e-06, 'epoch': 0.13} 13%|█▎ | 1636/12313 [1:13:33<8:11:20, 2.76s/it] 13%|█▎ | 1637/12313 [1:13:36<8:20:03, 2.81s/it] {'loss': 0.5626, 'grad_norm': 4.2346207052457485, 'learning_rate': 4.8624336016928175e-06, 'epoch': 0.13} 13%|█▎ | 1637/12313 [1:13:36<8:20:03, 2.81s/it] 13%|█▎ | 1638/12313 [1:13:39<8:12:30, 2.77s/it] {'loss': 0.6618, 'grad_norm': 4.331783623394118, 'learning_rate': 4.8622183808027255e-06, 'epoch': 0.13} 13%|█▎ | 1638/12313 [1:13:39<8:12:30, 2.77s/it] 13%|█▎ | 1639/12313 [1:13:41<7:56:14, 2.68s/it] {'loss': 0.6353, 'grad_norm': 5.008484418976238, 'learning_rate': 4.8620029964596234e-06, 'epoch': 0.13} 13%|█▎ | 1639/12313 [1:13:41<7:56:14, 2.68s/it] 13%|█▎ | 1640/12313 [1:13:44<8:00:48, 2.70s/it] 
{'loss': 0.486, 'grad_norm': 6.50139784970913, 'learning_rate': 4.861787448678416e-06, 'epoch': 0.13} 13%|█▎ | 1640/12313 [1:13:44<8:00:48, 2.70s/it] 13%|█▎ | 1641/12313 [1:13:47<7:54:43, 2.67s/it] {'loss': 0.5904, 'grad_norm': 6.369833370069241, 'learning_rate': 4.861571737474015e-06, 'epoch': 0.13} 13%|█▎ | 1641/12313 [1:13:47<7:54:43, 2.67s/it] 13%|█▎ | 1642/12313 [1:13:49<7:53:22, 2.66s/it] {'loss': 0.8424, 'grad_norm': 3.8537585558162317, 'learning_rate': 4.8613558628613494e-06, 'epoch': 0.13} 13%|█▎ | 1642/12313 [1:13:49<7:53:22, 2.66s/it] 13%|█▎ | 1643/12313 [1:13:52<7:47:44, 2.63s/it] {'loss': 0.5671, 'grad_norm': 6.904987154180493, 'learning_rate': 4.8611398248553554e-06, 'epoch': 0.13} 13%|█▎ | 1643/12313 [1:13:52<7:47:44, 2.63s/it] 13%|█▎ | 1644/12313 [1:13:55<8:18:24, 2.80s/it] {'loss': 0.4704, 'grad_norm': 3.1433510533347415, 'learning_rate': 4.860923623470981e-06, 'epoch': 0.13} 13%|█▎ | 1644/12313 [1:13:55<8:18:24, 2.80s/it] 13%|█▎ | 1645/12313 [1:13:58<8:12:46, 2.77s/it] {'loss': 0.5313, 'grad_norm': 5.418253266261615, 'learning_rate': 4.860707258723187e-06, 'epoch': 0.13} 13%|█▎ | 1645/12313 [1:13:58<8:12:46, 2.77s/it] 13%|█▎ | 1646/12313 [1:14:00<8:11:10, 2.76s/it] {'loss': 0.5742, 'grad_norm': 5.2359535500265615, 'learning_rate': 4.860490730626945e-06, 'epoch': 0.13} 13%|█▎ | 1646/12313 [1:14:00<8:11:10, 2.76s/it] 13%|█▎ | 1647/12313 [1:14:03<8:07:09, 2.74s/it] {'loss': 0.5654, 'grad_norm': 11.436418505055565, 'learning_rate': 4.860274039197237e-06, 'epoch': 0.13} 13%|█▎ | 1647/12313 [1:14:03<8:07:09, 2.74s/it] 13%|█▎ | 1648/12313 [1:14:06<7:57:11, 2.68s/it] {'loss': 0.4724, 'grad_norm': 6.262589762855426, 'learning_rate': 4.860057184449057e-06, 'epoch': 0.13} 13%|█▎ | 1648/12313 [1:14:06<7:57:11, 2.68s/it] 13%|█▎ | 1649/12313 [1:14:08<7:51:55, 2.66s/it] {'loss': 0.5537, 'grad_norm': 5.518995358402323, 'learning_rate': 4.85984016639741e-06, 'epoch': 0.13} 13%|█▎ | 1649/12313 [1:14:08<7:51:55, 2.66s/it] 13%|█▎ | 1650/12313 [1:14:11<7:55:30, 
2.68s/it] {'loss': 0.5638, 'grad_norm': 5.088469039450566, 'learning_rate': 4.859622985057313e-06, 'epoch': 0.13} 13%|█▎ | 1650/12313 [1:14:11<7:55:30, 2.68s/it] 13%|█▎ | 1651/12313 [1:14:14<8:04:50, 2.73s/it] {'loss': 0.6239, 'grad_norm': 4.614369100716233, 'learning_rate': 4.859405640443793e-06, 'epoch': 0.13} 13%|█▎ | 1651/12313 [1:14:14<8:04:50, 2.73s/it] 13%|█▎ | 1652/12313 [1:14:16<7:57:07, 2.69s/it] {'loss': 0.5406, 'grad_norm': 6.877855199561925, 'learning_rate': 4.85918813257189e-06, 'epoch': 0.13} 13%|█▎ | 1652/12313 [1:14:16<7:57:07, 2.69s/it] 13%|█▎ | 1653/12313 [1:14:19<7:50:32, 2.65s/it] {'loss': 0.5815, 'grad_norm': 6.336440018776646, 'learning_rate': 4.858970461456655e-06, 'epoch': 0.13} 13%|█▎ | 1653/12313 [1:14:19<7:50:32, 2.65s/it] 13%|█▎ | 1654/12313 [1:14:22<7:45:55, 2.62s/it] {'loss': 0.6946, 'grad_norm': 6.858031875975209, 'learning_rate': 4.858752627113148e-06, 'epoch': 0.13} 13%|█▎ | 1654/12313 [1:14:22<7:45:55, 2.62s/it] 13%|█▎ | 1655/12313 [1:14:24<7:59:31, 2.70s/it] {'loss': 0.7919, 'grad_norm': 3.403378239632881, 'learning_rate': 4.8585346295564425e-06, 'epoch': 0.13} 13%|█▎ | 1655/12313 [1:14:24<7:59:31, 2.70s/it] 13%|█▎ | 1656/12313 [1:14:27<7:52:51, 2.66s/it] {'loss': 0.4289, 'grad_norm': 6.487088297683663, 'learning_rate': 4.858316468801624e-06, 'epoch': 0.13} 13%|█▎ | 1656/12313 [1:14:27<7:52:51, 2.66s/it] 13%|█▎ | 1657/12313 [1:14:30<7:44:39, 2.62s/it] {'loss': 0.5673, 'grad_norm': 9.756045801760866, 'learning_rate': 4.858098144863786e-06, 'epoch': 0.13} 13%|█▎ | 1657/12313 [1:14:30<7:44:39, 2.62s/it] 13%|█▎ | 1658/12313 [1:14:32<7:51:18, 2.65s/it] {'loss': 0.6929, 'grad_norm': 4.769994844405226, 'learning_rate': 4.857879657758037e-06, 'epoch': 0.13} 13%|█▎ | 1658/12313 [1:14:32<7:51:18, 2.65s/it] 13%|█▎ | 1659/12313 [1:14:35<8:12:43, 2.77s/it] {'loss': 0.6402, 'grad_norm': 4.567963041913676, 'learning_rate': 4.857661007499493e-06, 'epoch': 0.13} 13%|█▎ | 1659/12313 [1:14:35<8:12:43, 2.77s/it] 13%|█▎ | 1660/12313 [1:14:38<7:56:20, 
2.68s/it] {'loss': 0.6195, 'grad_norm': 7.539959586550248, 'learning_rate': 4.857442194103287e-06, 'epoch': 0.13} 13%|█▎ | 1660/12313 [1:14:38<7:56:20, 2.68s/it] 13%|█▎ | 1661/12313 [1:14:40<7:54:44, 2.67s/it] {'loss': 0.5644, 'grad_norm': 3.320972169580362, 'learning_rate': 4.8572232175845574e-06, 'epoch': 0.13} 13%|█▎ | 1661/12313 [1:14:40<7:54:44, 2.67s/it] 13%|█▎ | 1662/12313 [1:14:43<7:57:51, 2.69s/it] {'loss': 0.6244, 'grad_norm': 6.466693615210596, 'learning_rate': 4.857004077958456e-06, 'epoch': 0.13} 13%|█▎ | 1662/12313 [1:14:43<7:57:51, 2.69s/it] 14%|█▎ | 1663/12313 [1:14:46<7:56:07, 2.68s/it] {'loss': 0.5756, 'grad_norm': 5.503111817650875, 'learning_rate': 4.8567847752401476e-06, 'epoch': 0.14} 14%|█▎ | 1663/12313 [1:14:46<7:56:07, 2.68s/it] 14%|█▎ | 1664/12313 [1:14:49<7:57:25, 2.69s/it] {'loss': 0.6822, 'grad_norm': 5.858104142956886, 'learning_rate': 4.8565653094448054e-06, 'epoch': 0.14} 14%|█▎ | 1664/12313 [1:14:49<7:57:25, 2.69s/it] 14%|█▎ | 1665/12313 [1:14:51<7:58:31, 2.70s/it] {'loss': 0.7815, 'grad_norm': 4.782981817213062, 'learning_rate': 4.856345680587616e-06, 'epoch': 0.14} 14%|█▎ | 1665/12313 [1:14:51<7:58:31, 2.70s/it] 14%|█▎ | 1666/12313 [1:14:54<8:10:03, 2.76s/it] {'loss': 0.4739, 'grad_norm': 15.4570240471019, 'learning_rate': 4.856125888683775e-06, 'epoch': 0.14} 14%|█▎ | 1666/12313 [1:14:54<8:10:03, 2.76s/it] 14%|█▎ | 1667/12313 [1:14:57<8:06:26, 2.74s/it] {'loss': 0.5324, 'grad_norm': 4.282142809688187, 'learning_rate': 4.855905933748492e-06, 'epoch': 0.14} 14%|█▎ | 1667/12313 [1:14:57<8:06:26, 2.74s/it] 14%|█▎ | 1668/12313 [1:14:59<7:56:52, 2.69s/it] {'loss': 0.5745, 'grad_norm': 5.095680068440331, 'learning_rate': 4.855685815796989e-06, 'epoch': 0.14} 14%|█▎ | 1668/12313 [1:14:59<7:56:52, 2.69s/it] 14%|█▎ | 1669/12313 [1:15:02<7:48:26, 2.64s/it] {'loss': 0.7417, 'grad_norm': 4.141977804675802, 'learning_rate': 4.855465534844494e-06, 'epoch': 0.14} 14%|█▎ | 1669/12313 [1:15:02<7:48:26, 2.64s/it] 14%|█▎ | 1670/12313 
[1:15:05<7:45:12, 2.62s/it] {'loss': 0.5206, 'grad_norm': 7.6969307870140895, 'learning_rate': 4.8552450909062494e-06, 'epoch': 0.14} 14%|█▎ | 1670/12313 [1:15:05<7:45:12, 2.62s/it] 14%|█▎ | 1671/12313 [1:15:07<7:50:20, 2.65s/it] {'loss': 0.655, 'grad_norm': 6.450650416903025, 'learning_rate': 4.855024483997509e-06, 'epoch': 0.14} 14%|█▎ | 1671/12313 [1:15:07<7:50:20, 2.65s/it] 14%|█▎ | 1672/12313 [1:15:10<7:48:38, 2.64s/it] {'loss': 0.5323, 'grad_norm': 4.810253355009561, 'learning_rate': 4.85480371413354e-06, 'epoch': 0.14} 14%|█▎ | 1672/12313 [1:15:10<7:48:38, 2.64s/it] 14%|█▎ | 1673/12313 [1:15:12<7:35:53, 2.57s/it] {'loss': 0.5753, 'grad_norm': 3.3253261110937116, 'learning_rate': 4.8545827813296154e-06, 'epoch': 0.14} 14%|█▎ | 1673/12313 [1:15:12<7:35:53, 2.57s/it] 14%|█▎ | 1674/12313 [1:15:15<7:45:11, 2.62s/it] {'loss': 0.7926, 'grad_norm': 3.8466882509432496, 'learning_rate': 4.8543616856010235e-06, 'epoch': 0.14} 14%|█▎ | 1674/12313 [1:15:15<7:45:11, 2.62s/it] 14%|█▎ | 1675/12313 [1:15:18<7:45:59, 2.63s/it] {'loss': 0.5541, 'grad_norm': 4.8978637912258, 'learning_rate': 4.854140426963064e-06, 'epoch': 0.14} 14%|█▎ | 1675/12313 [1:15:18<7:45:59, 2.63s/it] 14%|█▎ | 1676/12313 [1:15:20<7:45:53, 2.63s/it] {'loss': 0.4727, 'grad_norm': 5.229352786538515, 'learning_rate': 4.853919005431046e-06, 'epoch': 0.14} 14%|█▎ | 1676/12313 [1:15:20<7:45:53, 2.63s/it] 14%|█▎ | 1677/12313 [1:15:23<7:38:10, 2.58s/it] {'loss': 0.7667, 'grad_norm': 2.9378106286602095, 'learning_rate': 4.85369742102029e-06, 'epoch': 0.14} 14%|█▎ | 1677/12313 [1:15:23<7:38:10, 2.58s/it] 14%|█▎ | 1678/12313 [1:15:26<7:47:37, 2.64s/it] {'loss': 0.7053, 'grad_norm': 9.698335327128627, 'learning_rate': 4.8534756737461305e-06, 'epoch': 0.14} 14%|█▎ | 1678/12313 [1:15:26<7:47:37, 2.64s/it] 14%|█▎ | 1679/12313 [1:15:28<7:45:15, 2.63s/it] {'loss': 0.7551, 'grad_norm': 4.258566186707566, 'learning_rate': 4.853253763623909e-06, 'epoch': 0.14} 14%|█▎ | 1679/12313 [1:15:28<7:45:15, 2.63s/it] 14%|█▎ | 
1680/12313 [1:15:31<8:11:19, 2.77s/it] {'loss': 0.6012, 'grad_norm': 7.742141939352821, 'learning_rate': 4.853031690668982e-06, 'epoch': 0.14} 14%|█▎ | 1680/12313 [1:15:31<8:11:19, 2.77s/it] 14%|█▎ | 1681/12313 [1:15:34<8:06:59, 2.75s/it] {'loss': 0.8128, 'grad_norm': 7.324297389610232, 'learning_rate': 4.852809454896715e-06, 'epoch': 0.14} 14%|█▎ | 1681/12313 [1:15:34<8:06:59, 2.75s/it] 14%|█▎ | 1682/12313 [1:15:36<7:55:04, 2.68s/it] {'loss': 0.4649, 'grad_norm': 6.79191875565859, 'learning_rate': 4.852587056322485e-06, 'epoch': 0.14} 14%|█▎ | 1682/12313 [1:15:36<7:55:04, 2.68s/it] 14%|█▎ | 1683/12313 [1:15:39<8:05:00, 2.74s/it] {'loss': 0.5741, 'grad_norm': 4.346976001633204, 'learning_rate': 4.852364494961684e-06, 'epoch': 0.14} 14%|█▎ | 1683/12313 [1:15:39<8:05:00, 2.74s/it] 14%|█▎ | 1684/12313 [1:15:42<8:02:21, 2.72s/it] {'loss': 0.5591, 'grad_norm': 5.599252491473373, 'learning_rate': 4.852141770829707e-06, 'epoch': 0.14} 14%|█▎ | 1684/12313 [1:15:42<8:02:21, 2.72s/it] 14%|█▎ | 1685/12313 [1:15:45<8:16:44, 2.80s/it] {'loss': 0.5636, 'grad_norm': 7.657958990969802, 'learning_rate': 4.851918883941969e-06, 'epoch': 0.14} 14%|█▎ | 1685/12313 [1:15:45<8:16:44, 2.80s/it] 14%|█▎ | 1686/12313 [1:15:48<8:11:01, 2.77s/it] {'loss': 0.7366, 'grad_norm': 3.4450921267982126, 'learning_rate': 4.851695834313892e-06, 'epoch': 0.14} 14%|█▎ | 1686/12313 [1:15:48<8:11:01, 2.77s/it] 14%|█▎ | 1687/12313 [1:15:50<7:48:13, 2.64s/it] {'loss': 0.5808, 'grad_norm': 5.273690756756951, 'learning_rate': 4.851472621960909e-06, 'epoch': 0.14} 14%|█▎ | 1687/12313 [1:15:50<7:48:13, 2.64s/it] 14%|█▎ | 1688/12313 [1:15:53<7:48:33, 2.65s/it] {'loss': 0.6648, 'grad_norm': 4.952280541512243, 'learning_rate': 4.851249246898465e-06, 'epoch': 0.14} 14%|█▎ | 1688/12313 [1:15:53<7:48:33, 2.65s/it] 14%|█▎ | 1689/12313 [1:15:55<7:37:09, 2.58s/it] {'loss': 0.6268, 'grad_norm': 7.707004969920144, 'learning_rate': 4.851025709142018e-06, 'epoch': 0.14} 14%|█▎ | 1689/12313 [1:15:55<7:37:09, 2.58s/it] 14%|█▎ | 
1690/12313 [1:15:58<8:12:23, 2.78s/it] {'loss': 0.4935, 'grad_norm': 4.430949277747921, 'learning_rate': 4.850802008707034e-06, 'epoch': 0.14} 14%|█▎ | 1690/12313 [1:15:58<8:12:23, 2.78s/it] 14%|█▎ | 1691/12313 [1:16:01<8:09:54, 2.77s/it] {'loss': 0.7303, 'grad_norm': 4.0862917668071175, 'learning_rate': 4.8505781456089926e-06, 'epoch': 0.14} 14%|█▎ | 1691/12313 [1:16:01<8:09:54, 2.77s/it] 14%|█▎ | 1692/12313 [1:16:04<8:05:37, 2.74s/it] {'loss': 0.573, 'grad_norm': 8.15399495830509, 'learning_rate': 4.850354119863384e-06, 'epoch': 0.14} 14%|█▎ | 1692/12313 [1:16:04<8:05:37, 2.74s/it] 14%|█▎ | 1693/12313 [1:16:07<8:33:57, 2.90s/it] {'loss': 0.6696, 'grad_norm': 4.4854550799183, 'learning_rate': 4.850129931485709e-06, 'epoch': 0.14} 14%|█▎ | 1693/12313 [1:16:07<8:33:57, 2.90s/it] 14%|█▍ | 1694/12313 [1:16:10<8:23:28, 2.84s/it] {'loss': 0.506, 'grad_norm': 6.062165358428213, 'learning_rate': 4.849905580491481e-06, 'epoch': 0.14} 14%|█▍ | 1694/12313 [1:16:10<8:23:28, 2.84s/it] 14%|█▍ | 1695/12313 [1:16:12<8:01:09, 2.72s/it] {'loss': 0.5298, 'grad_norm': 4.30707595006544, 'learning_rate': 4.849681066896224e-06, 'epoch': 0.14} 14%|█▍ | 1695/12313 [1:16:12<8:01:09, 2.72s/it] 14%|█▍ | 1696/12313 [1:16:15<7:54:12, 2.68s/it] {'loss': 0.5178, 'grad_norm': 4.645280994473025, 'learning_rate': 4.849456390715471e-06, 'epoch': 0.14} 14%|█▍ | 1696/12313 [1:16:15<7:54:12, 2.68s/it] 14%|█▍ | 1697/12313 [1:16:18<8:09:32, 2.77s/it] {'loss': 0.5725, 'grad_norm': 4.408267231508574, 'learning_rate': 4.849231551964771e-06, 'epoch': 0.14} 14%|█▍ | 1697/12313 [1:16:18<8:09:32, 2.77s/it] 14%|█▍ | 1698/12313 [1:16:20<8:02:49, 2.73s/it] {'loss': 0.6328, 'grad_norm': 9.45812967426135, 'learning_rate': 4.849006550659681e-06, 'epoch': 0.14} 14%|█▍ | 1698/12313 [1:16:20<8:02:49, 2.73s/it] 14%|█▍ | 1699/12313 [1:16:23<8:00:02, 2.71s/it] {'loss': 0.5886, 'grad_norm': 4.491014201207106, 'learning_rate': 4.84878138681577e-06, 'epoch': 0.14} 14%|█▍ | 1699/12313 [1:16:23<8:00:02, 2.71s/it] 14%|█▍ | 
1700/12313 [1:16:26<7:50:35, 2.66s/it] {'loss': 0.5413, 'grad_norm': 6.727338627008777, 'learning_rate': 4.848556060448617e-06, 'epoch': 0.14} 14%|█▍ | 1700/12313 [1:16:26<7:50:35, 2.66s/it] 14%|█▍ | 1701/12313 [1:16:28<7:36:41, 2.58s/it] {'loss': 0.582, 'grad_norm': 6.965832541226805, 'learning_rate': 4.848330571573815e-06, 'epoch': 0.14} 14%|█▍ | 1701/12313 [1:16:28<7:36:41, 2.58s/it] 14%|█▍ | 1702/12313 [1:16:31<7:39:58, 2.60s/it] {'loss': 0.5413, 'grad_norm': 5.263090461409852, 'learning_rate': 4.848104920206964e-06, 'epoch': 0.14} 14%|█▍ | 1702/12313 [1:16:31<7:39:58, 2.60s/it] 14%|█▍ | 1703/12313 [1:16:34<7:53:05, 2.68s/it] {'loss': 0.5164, 'grad_norm': 4.916463781199789, 'learning_rate': 4.847879106363681e-06, 'epoch': 0.14} 14%|█▍ | 1703/12313 [1:16:34<7:53:05, 2.68s/it] 14%|█▍ | 1704/12313 [1:16:36<7:37:33, 2.59s/it] {'loss': 0.5987, 'grad_norm': 4.020484866288476, 'learning_rate': 4.847653130059591e-06, 'epoch': 0.14} 14%|█▍ | 1704/12313 [1:16:36<7:37:33, 2.59s/it] 14%|█▍ | 1705/12313 [1:16:39<7:39:52, 2.60s/it] {'loss': 0.5019, 'grad_norm': 7.925299595269813, 'learning_rate': 4.847426991310327e-06, 'epoch': 0.14} 14%|█▍ | 1705/12313 [1:16:39<7:39:52, 2.60s/it] 14%|█▍ | 1706/12313 [1:16:41<7:49:18, 2.65s/it] {'loss': 0.6024, 'grad_norm': 4.765085041505615, 'learning_rate': 4.84720069013154e-06, 'epoch': 0.14} 14%|█▍ | 1706/12313 [1:16:41<7:49:18, 2.65s/it] 14%|█▍ | 1707/12313 [1:16:44<7:44:36, 2.63s/it] {'loss': 0.5936, 'grad_norm': 3.5389423477868704, 'learning_rate': 4.846974226538887e-06, 'epoch': 0.14} 14%|█▍ | 1707/12313 [1:16:44<7:44:36, 2.63s/it] 14%|█▍ | 1708/12313 [1:16:46<7:32:00, 2.56s/it] {'loss': 0.6592, 'grad_norm': 4.437272587336693, 'learning_rate': 4.846747600548039e-06, 'epoch': 0.14} 14%|█▍ | 1708/12313 [1:16:46<7:32:00, 2.56s/it] 14%|█▍ | 1709/12313 [1:16:49<7:41:55, 2.61s/it] {'loss': 0.7827, 'grad_norm': 6.923436463835136, 'learning_rate': 4.8465208121746775e-06, 'epoch': 0.14} 14%|█▍ | 1709/12313 [1:16:49<7:41:55, 2.61s/it] 14%|█▍ | 
1710/12313 [1:16:52<7:42:41, 2.62s/it] {'loss': 0.6561, 'grad_norm': 4.075078776605313, 'learning_rate': 4.846293861434494e-06, 'epoch': 0.14} 14%|█▍ | 1710/12313 [1:16:52<7:42:41, 2.62s/it] 14%|█▍ | 1711/12313 [1:16:55<8:00:12, 2.72s/it] {'loss': 0.6394, 'grad_norm': 4.965543277677793, 'learning_rate': 4.846066748343193e-06, 'epoch': 0.14} 14%|█▍ | 1711/12313 [1:16:55<8:00:12, 2.72s/it] 14%|█▍ | 1712/12313 [1:16:57<8:00:09, 2.72s/it] {'loss': 0.5429, 'grad_norm': 4.952847589325617, 'learning_rate': 4.84583947291649e-06, 'epoch': 0.14} 14%|█▍ | 1712/12313 [1:16:57<8:00:09, 2.72s/it] 14%|█▍ | 1713/12313 [1:17:00<7:58:59, 2.71s/it] {'loss': 0.6042, 'grad_norm': 6.231762044145288, 'learning_rate': 4.84561203517011e-06, 'epoch': 0.14} 14%|█▍ | 1713/12313 [1:17:00<7:58:59, 2.71s/it] 14%|█▍ | 1714/12313 [1:17:03<7:54:34, 2.69s/it] {'loss': 0.6262, 'grad_norm': 7.90195822251666, 'learning_rate': 4.8453844351197906e-06, 'epoch': 0.14} 14%|█▍ | 1714/12313 [1:17:03<7:54:34, 2.69s/it] 14%|█▍ | 1715/12313 [1:17:05<7:53:38, 2.68s/it] {'loss': 0.4433, 'grad_norm': 5.163740585708867, 'learning_rate': 4.845156672781283e-06, 'epoch': 0.14} 14%|█▍ | 1715/12313 [1:17:05<7:53:38, 2.68s/it] 14%|█▍ | 1716/12313 [1:17:08<7:50:16, 2.66s/it] {'loss': 0.5625, 'grad_norm': 6.772106592052472, 'learning_rate': 4.844928748170343e-06, 'epoch': 0.14} 14%|█▍ | 1716/12313 [1:17:08<7:50:16, 2.66s/it] 14%|█▍ | 1717/12313 [1:17:11<8:08:28, 2.77s/it] {'loss': 0.5634, 'grad_norm': 7.967404236571118, 'learning_rate': 4.844700661302745e-06, 'epoch': 0.14} 14%|█▍ | 1717/12313 [1:17:11<8:08:28, 2.77s/it] 14%|█▍ | 1718/12313 [1:17:13<7:53:53, 2.68s/it] {'loss': 0.4938, 'grad_norm': 4.952354575898061, 'learning_rate': 4.844472412194271e-06, 'epoch': 0.14} 14%|█▍ | 1718/12313 [1:17:13<7:53:53, 2.68s/it] 14%|█▍ | 1719/12313 [1:17:16<8:06:23, 2.75s/it] {'loss': 0.514, 'grad_norm': 4.882182783614095, 'learning_rate': 4.844244000860713e-06, 'epoch': 0.14} 14%|█▍ | 1719/12313 [1:17:16<8:06:23, 2.75s/it] 14%|█▍ | 
1720/12313 [1:17:19<7:49:57, 2.66s/it] {'loss': 0.5567, 'grad_norm': 4.517018593698755, 'learning_rate': 4.844015427317878e-06, 'epoch': 0.14} 14%|█▍ | 1720/12313 [1:17:19<7:49:57, 2.66s/it] 14%|█▍ | 1721/12313 [1:17:21<7:50:10, 2.66s/it] {'loss': 0.6774, 'grad_norm': 5.353241935249897, 'learning_rate': 4.84378669158158e-06, 'epoch': 0.14} 14%|█▍ | 1721/12313 [1:17:22<7:50:10, 2.66s/it] 14%|█▍ | 1722/12313 [1:17:24<7:47:51, 2.65s/it] {'loss': 0.5591, 'grad_norm': 4.71686990816346, 'learning_rate': 4.843557793667647e-06, 'epoch': 0.14} 14%|█▍ | 1722/12313 [1:17:24<7:47:51, 2.65s/it] 14%|█▍ | 1723/12313 [1:17:27<7:46:37, 2.64s/it] {'loss': 0.711, 'grad_norm': 4.136924741202888, 'learning_rate': 4.843328733591918e-06, 'epoch': 0.14} 14%|█▍ | 1723/12313 [1:17:27<7:46:37, 2.64s/it] 14%|█▍ | 1724/12313 [1:17:29<7:43:46, 2.63s/it] {'loss': 0.6455, 'grad_norm': 5.450901778292648, 'learning_rate': 4.843099511370243e-06, 'epoch': 0.14} 14%|█▍ | 1724/12313 [1:17:29<7:43:46, 2.63s/it] 14%|█▍ | 1725/12313 [1:17:32<7:41:22, 2.61s/it] {'loss': 0.5929, 'grad_norm': 5.05521301047581, 'learning_rate': 4.842870127018482e-06, 'epoch': 0.14} 14%|█▍ | 1725/12313 [1:17:32<7:41:22, 2.61s/it] 14%|█▍ | 1726/12313 [1:17:35<7:43:55, 2.63s/it] {'loss': 0.5624, 'grad_norm': 5.278566981624996, 'learning_rate': 4.842640580552508e-06, 'epoch': 0.14} 14%|█▍ | 1726/12313 [1:17:35<7:43:55, 2.63s/it] 14%|█▍ | 1727/12313 [1:17:37<7:46:34, 2.64s/it] {'loss': 0.4295, 'grad_norm': 3.187653564030086, 'learning_rate': 4.842410871988204e-06, 'epoch': 0.14} 14%|█▍ | 1727/12313 [1:17:37<7:46:34, 2.64s/it] 14%|█▍ | 1728/12313 [1:17:40<7:35:56, 2.58s/it] {'loss': 0.6622, 'grad_norm': 4.96925386948935, 'learning_rate': 4.842181001341465e-06, 'epoch': 0.14} 14%|█▍ | 1728/12313 [1:17:40<7:35:56, 2.58s/it] 14%|█▍ | 1729/12313 [1:17:43<7:54:28, 2.69s/it] {'loss': 0.6541, 'grad_norm': 4.620002867331911, 'learning_rate': 4.8419509686281965e-06, 'epoch': 0.14} 14%|█▍ | 1729/12313 [1:17:43<7:54:28, 2.69s/it] 14%|█▍ | 
1730/12313 [1:17:45<7:56:43, 2.70s/it] {'loss': 0.5794, 'grad_norm': 4.351427980680414, 'learning_rate': 4.841720773864315e-06, 'epoch': 0.14} 14%|█▍ | 1730/12313 [1:17:45<7:56:43, 2.70s/it] 14%|█▍ | 1731/12313 [1:17:49<8:23:26, 2.85s/it] {'loss': 0.6507, 'grad_norm': 6.055475530739616, 'learning_rate': 4.84149041706575e-06, 'epoch': 0.14} 14%|█▍ | 1731/12313 [1:17:49<8:23:26, 2.85s/it] 14%|█▍ | 1732/12313 [1:17:51<8:15:07, 2.81s/it] {'loss': 0.5077, 'grad_norm': 4.39137607633036, 'learning_rate': 4.8412598982484396e-06, 'epoch': 0.14} 14%|█▍ | 1732/12313 [1:17:51<8:15:07, 2.81s/it] 14%|█▍ | 1733/12313 [1:17:54<8:19:48, 2.83s/it] {'loss': 0.5427, 'grad_norm': 5.347025072258548, 'learning_rate': 4.8410292174283356e-06, 'epoch': 0.14} 14%|█▍ | 1733/12313 [1:17:54<8:19:48, 2.83s/it] 14%|█▍ | 1734/12313 [1:17:57<8:12:54, 2.80s/it] {'loss': 0.6222, 'grad_norm': 5.982794959612033, 'learning_rate': 4.840798374621399e-06, 'epoch': 0.14} 14%|█▍ | 1734/12313 [1:17:57<8:12:54, 2.80s/it] 14%|█▍ | 1735/12313 [1:18:00<8:05:38, 2.75s/it] {'loss': 0.5887, 'grad_norm': 4.786923948878069, 'learning_rate': 4.8405673698436046e-06, 'epoch': 0.14} 14%|█▍ | 1735/12313 [1:18:00<8:05:38, 2.75s/it] 14%|█▍ | 1736/12313 [1:18:02<7:57:50, 2.71s/it] {'loss': 0.8337, 'grad_norm': 4.34136443925787, 'learning_rate': 4.840336203110934e-06, 'epoch': 0.14} 14%|█▍ | 1736/12313 [1:18:02<7:57:50, 2.71s/it] 14%|█▍ | 1737/12313 [1:18:05<8:08:11, 2.77s/it] {'loss': 0.4646, 'grad_norm': 6.40416263003894, 'learning_rate': 4.840104874439385e-06, 'epoch': 0.14} 14%|█▍ | 1737/12313 [1:18:05<8:08:11, 2.77s/it] 14%|█▍ | 1738/12313 [1:18:08<8:05:45, 2.76s/it] {'loss': 0.6091, 'grad_norm': 4.993914992295412, 'learning_rate': 4.839873383844964e-06, 'epoch': 0.14} 14%|█▍ | 1738/12313 [1:18:08<8:05:45, 2.76s/it] 14%|█▍ | 1739/12313 [1:18:10<7:58:38, 2.72s/it] {'loss': 0.5803, 'grad_norm': 4.072501067349767, 'learning_rate': 4.839641731343688e-06, 'epoch': 0.14} 14%|█▍ | 1739/12313 [1:18:10<7:58:38, 2.72s/it] 14%|█▍ | 
1740/12313 [1:18:13<8:05:21, 2.75s/it] {'loss': 0.6042, 'grad_norm': 4.916864007591204, 'learning_rate': 4.839409916951586e-06, 'epoch': 0.14} 14%|█▍ | 1740/12313 [1:18:13<8:05:21, 2.75s/it] 14%|█▍ | 1741/12313 [1:18:16<8:01:07, 2.73s/it] {'loss': 0.5263, 'grad_norm': 7.127573823763706, 'learning_rate': 4.839177940684699e-06, 'epoch': 0.14} 14%|█▍ | 1741/12313 [1:18:16<8:01:07, 2.73s/it] 14%|█▍ | 1742/12313 [1:18:18<7:41:33, 2.62s/it] {'loss': 0.6171, 'grad_norm': 4.634413084773449, 'learning_rate': 4.838945802559079e-06, 'epoch': 0.14} 14%|█▍ | 1742/12313 [1:18:18<7:41:33, 2.62s/it] 14%|█▍ | 1743/12313 [1:18:21<7:49:20, 2.66s/it] {'loss': 0.5441, 'grad_norm': 4.578421538253424, 'learning_rate': 4.8387135025907885e-06, 'epoch': 0.14} 14%|█▍ | 1743/12313 [1:18:21<7:49:20, 2.66s/it] 14%|█▍ | 1744/12313 [1:18:24<8:02:52, 2.74s/it] {'loss': 0.5399, 'grad_norm': 4.586524758850179, 'learning_rate': 4.8384810407959e-06, 'epoch': 0.14} 14%|█▍ | 1744/12313 [1:18:24<8:02:52, 2.74s/it] 14%|█▍ | 1745/12313 [1:18:27<8:10:01, 2.78s/it] {'loss': 0.4686, 'grad_norm': 5.918670545030279, 'learning_rate': 4.8382484171905006e-06, 'epoch': 0.14} 14%|█▍ | 1745/12313 [1:18:27<8:10:01, 2.78s/it] 14%|█▍ | 1746/12313 [1:18:30<8:02:58, 2.74s/it] {'loss': 0.6141, 'grad_norm': 5.704207192778562, 'learning_rate': 4.8380156317906855e-06, 'epoch': 0.14} 14%|█▍ | 1746/12313 [1:18:30<8:02:58, 2.74s/it] 14%|█▍ | 1747/12313 [1:18:32<7:54:23, 2.69s/it] {'loss': 0.5936, 'grad_norm': 4.921937978250558, 'learning_rate': 4.837782684612562e-06, 'epoch': 0.14} 14%|█▍ | 1747/12313 [1:18:32<7:54:23, 2.69s/it] 14%|█▍ | 1748/12313 [1:18:35<7:48:23, 2.66s/it] {'loss': 0.5511, 'grad_norm': 8.366604146884352, 'learning_rate': 4.83754957567225e-06, 'epoch': 0.14} 14%|█▍ | 1748/12313 [1:18:35<7:48:23, 2.66s/it] 14%|█▍ | 1749/12313 [1:18:38<8:17:35, 2.83s/it] {'loss': 0.6865, 'grad_norm': 3.91804902399355, 'learning_rate': 4.837316304985879e-06, 'epoch': 0.14} 14%|█▍ | 1749/12313 [1:18:38<8:17:35, 2.83s/it] 14%|█▍ | 
1750/12313 [1:18:41<8:15:34, 2.81s/it] {'loss': 0.5439, 'grad_norm': 7.888393422464691, 'learning_rate': 4.8370828725695885e-06, 'epoch': 0.14} 14%|█▍ | 1750/12313 [1:18:41<8:15:34, 2.81s/it] 14%|█▍ | 1751/12313 [1:18:43<8:01:44, 2.74s/it] {'loss': 0.6774, 'grad_norm': 4.847280666549644, 'learning_rate': 4.836849278439532e-06, 'epoch': 0.14} 14%|█▍ | 1751/12313 [1:18:43<8:01:44, 2.74s/it] 14%|█▍ | 1752/12313 [1:18:46<8:29:01, 2.89s/it] {'loss': 0.5419, 'grad_norm': 5.563719824565103, 'learning_rate': 4.836615522611874e-06, 'epoch': 0.14} 14%|█▍ | 1752/12313 [1:18:46<8:29:01, 2.89s/it] 14%|█▍ | 1753/12313 [1:18:50<8:38:10, 2.94s/it] {'loss': 0.5761, 'grad_norm': 9.244694159995396, 'learning_rate': 4.8363816051027875e-06, 'epoch': 0.14} 14%|█▍ | 1753/12313 [1:18:50<8:38:10, 2.94s/it] 14%|█▍ | 1754/12313 [1:18:52<8:37:36, 2.94s/it] {'loss': 0.6103, 'grad_norm': 4.37956947048255, 'learning_rate': 4.8361475259284604e-06, 'epoch': 0.14} 14%|█▍ | 1754/12313 [1:18:52<8:37:36, 2.94s/it] 14%|█▍ | 1755/12313 [1:18:55<8:16:37, 2.82s/it] {'loss': 0.4938, 'grad_norm': 11.159158566456547, 'learning_rate': 4.8359132851050875e-06, 'epoch': 0.14} 14%|█▍ | 1755/12313 [1:18:55<8:16:37, 2.82s/it] 14%|█▍ | 1756/12313 [1:18:58<8:13:48, 2.81s/it] {'loss': 0.7047, 'grad_norm': 5.121940842643738, 'learning_rate': 4.835678882648878e-06, 'epoch': 0.14} 14%|█▍ | 1756/12313 [1:18:58<8:13:48, 2.81s/it] 14%|█▍ | 1757/12313 [1:19:00<8:08:22, 2.78s/it] {'loss': 0.5441, 'grad_norm': 4.634689837382904, 'learning_rate': 4.8354443185760505e-06, 'epoch': 0.14} 14%|█▍ | 1757/12313 [1:19:00<8:08:22, 2.78s/it] 14%|█▍ | 1758/12313 [1:19:04<8:27:42, 2.89s/it] {'loss': 0.6011, 'grad_norm': 4.696643113982221, 'learning_rate': 4.835209592902837e-06, 'epoch': 0.14} 14%|█▍ | 1758/12313 [1:19:04<8:27:42, 2.89s/it] 14%|█▍ | 1759/12313 [1:19:06<8:12:35, 2.80s/it] {'loss': 0.5516, 'grad_norm': 33.651614779388595, 'learning_rate': 4.834974705645478e-06, 'epoch': 0.14} 14%|█▍ | 1759/12313 [1:19:06<8:12:35, 2.80s/it] 
14%|█▍ | 1760/12313 [1:19:09<7:58:26, 2.72s/it] {'loss': 0.6453, 'grad_norm': 7.739691900140675, 'learning_rate': 4.834739656820228e-06, 'epoch': 0.14} 14%|█▍ | 1760/12313 [1:19:09<7:58:26, 2.72s/it] 14%|█▍ | 1761/12313 [1:19:12<8:09:06, 2.78s/it] {'loss': 0.7097, 'grad_norm': 6.791535657044752, 'learning_rate': 4.83450444644335e-06, 'epoch': 0.14} 14%|█▍ | 1761/12313 [1:19:12<8:09:06, 2.78s/it] 14%|█▍ | 1762/12313 [1:19:14<7:50:27, 2.68s/it] {'loss': 0.5335, 'grad_norm': 5.503204544875674, 'learning_rate': 4.834269074531119e-06, 'epoch': 0.14} 14%|█▍ | 1762/12313 [1:19:14<7:50:27, 2.68s/it] 14%|█▍ | 1763/12313 [1:19:17<7:45:25, 2.65s/it] {'loss': 0.5697, 'grad_norm': 4.4158123615058935, 'learning_rate': 4.834033541099822e-06, 'epoch': 0.14} 14%|█▍ | 1763/12313 [1:19:17<7:45:25, 2.65s/it] 14%|█▍ | 1764/12313 [1:19:20<8:04:38, 2.76s/it] {'loss': 0.575, 'grad_norm': 8.45132186133372, 'learning_rate': 4.833797846165758e-06, 'epoch': 0.14} 14%|█▍ | 1764/12313 [1:19:20<8:04:38, 2.76s/it] 14%|█▍ | 1765/12313 [1:19:22<8:02:27, 2.74s/it] {'loss': 0.7003, 'grad_norm': 4.117636505890885, 'learning_rate': 4.833561989745232e-06, 'epoch': 0.14} 14%|█▍ | 1765/12313 [1:19:22<8:02:27, 2.74s/it] 14%|█▍ | 1766/12313 [1:19:25<7:58:36, 2.72s/it] {'loss': 0.5898, 'grad_norm': 5.425663068958977, 'learning_rate': 4.833325971854568e-06, 'epoch': 0.14} 14%|█▍ | 1766/12313 [1:19:25<7:58:36, 2.72s/it] 14%|█▍ | 1767/12313 [1:19:28<8:26:33, 2.88s/it] {'loss': 0.6641, 'grad_norm': 4.870653806817092, 'learning_rate': 4.8330897925100966e-06, 'epoch': 0.14} 14%|█▍ | 1767/12313 [1:19:28<8:26:33, 2.88s/it] 14%|█▍ | 1768/12313 [1:19:31<8:17:54, 2.83s/it] {'loss': 0.6453, 'grad_norm': 4.047961871711326, 'learning_rate': 4.8328534517281575e-06, 'epoch': 0.14} 14%|█▍ | 1768/12313 [1:19:31<8:17:54, 2.83s/it] 14%|█▍ | 1769/12313 [1:19:34<7:59:27, 2.73s/it] {'loss': 0.4695, 'grad_norm': 4.458252548214351, 'learning_rate': 4.832616949525107e-06, 'epoch': 0.14} 14%|█▍ | 1769/12313 [1:19:34<7:59:27, 2.73s/it] 
14%|█▍ | 1770/12313 [1:19:36<7:58:47, 2.72s/it] {'loss': 0.5208, 'grad_norm': 7.708237827828306, 'learning_rate': 4.832380285917309e-06, 'epoch': 0.14} 14%|█▍ | 1770/12313 [1:19:36<7:58:47, 2.72s/it] 14%|█▍ | 1771/12313 [1:19:39<7:48:04, 2.66s/it] {'loss': 0.6735, 'grad_norm': 6.499621307199643, 'learning_rate': 4.8321434609211386e-06, 'epoch': 0.14} 14%|█▍ | 1771/12313 [1:19:39<7:48:04, 2.66s/it] 14%|█▍ | 1772/12313 [1:19:42<8:05:04, 2.76s/it] {'loss': 0.467, 'grad_norm': 5.872483962693705, 'learning_rate': 4.831906474552983e-06, 'epoch': 0.14} 14%|█▍ | 1772/12313 [1:19:42<8:05:04, 2.76s/it] 14%|█▍ | 1773/12313 [1:19:45<8:08:06, 2.78s/it] {'loss': 0.6378, 'grad_norm': 3.669849200249159, 'learning_rate': 4.831669326829242e-06, 'epoch': 0.14} 14%|█▍ | 1773/12313 [1:19:45<8:08:06, 2.78s/it] 14%|█▍ | 1774/12313 [1:19:47<8:02:22, 2.75s/it] {'loss': 0.652, 'grad_norm': 4.3559205214849674, 'learning_rate': 4.831432017766323e-06, 'epoch': 0.14} 14%|█▍ | 1774/12313 [1:19:47<8:02:22, 2.75s/it] 14%|█▍ | 1775/12313 [1:19:50<7:54:26, 2.70s/it] {'loss': 0.6826, 'grad_norm': 4.444082580638693, 'learning_rate': 4.831194547380647e-06, 'epoch': 0.14} 14%|█▍ | 1775/12313 [1:19:50<7:54:26, 2.70s/it] 14%|█▍ | 1776/12313 [1:19:52<7:43:37, 2.64s/it] {'loss': 0.5328, 'grad_norm': 7.791669775176109, 'learning_rate': 4.830956915688647e-06, 'epoch': 0.14} 14%|█▍ | 1776/12313 [1:19:52<7:43:37, 2.64s/it] 14%|█▍ | 1777/12313 [1:19:55<7:45:29, 2.65s/it] {'loss': 0.7301, 'grad_norm': 3.828853400773109, 'learning_rate': 4.830719122706764e-06, 'epoch': 0.14} 14%|█▍ | 1777/12313 [1:19:55<7:45:29, 2.65s/it] 14%|█▍ | 1778/12313 [1:19:58<7:55:12, 2.71s/it] {'loss': 0.6871, 'grad_norm': 4.69526829505865, 'learning_rate': 4.830481168451453e-06, 'epoch': 0.14} 14%|█▍ | 1778/12313 [1:19:58<7:55:12, 2.71s/it] 14%|█▍ | 1779/12313 [1:20:01<7:50:24, 2.68s/it] {'loss': 0.6172, 'grad_norm': 3.7376839261484927, 'learning_rate': 4.830243052939179e-06, 'epoch': 0.14} 14%|█▍ | 1779/12313 [1:20:01<7:50:24, 2.68s/it] 
14%|█▍ | 1780/12313 [1:20:03<8:03:51, 2.76s/it] {'loss': 0.6784, 'grad_norm': 5.5007109017935365, 'learning_rate': 4.830004776186419e-06, 'epoch': 0.14} 14%|█▍ | 1780/12313 [1:20:03<8:03:51, 2.76s/it] 14%|█▍ | 1781/12313 [1:20:06<7:58:43, 2.73s/it] {'loss': 0.5217, 'grad_norm': 9.156018680302578, 'learning_rate': 4.82976633820966e-06, 'epoch': 0.14} 14%|█▍ | 1781/12313 [1:20:06<7:58:43, 2.73s/it] 14%|█▍ | 1782/12313 [1:20:09<8:06:59, 2.77s/it] {'loss': 0.7003, 'grad_norm': 3.661157732313108, 'learning_rate': 4.829527739025399e-06, 'epoch': 0.14} 14%|█▍ | 1782/12313 [1:20:09<8:06:59, 2.77s/it] 14%|█▍ | 1783/12313 [1:20:12<7:59:34, 2.73s/it] {'loss': 0.5237, 'grad_norm': 6.275804371905447, 'learning_rate': 4.829288978650149e-06, 'epoch': 0.14} 14%|█▍ | 1783/12313 [1:20:12<7:59:34, 2.73s/it] 14%|█▍ | 1784/12313 [1:20:14<8:00:32, 2.74s/it] {'loss': 0.5854, 'grad_norm': 6.549033525463337, 'learning_rate': 4.829050057100428e-06, 'epoch': 0.14} 14%|█▍ | 1784/12313 [1:20:14<8:00:32, 2.74s/it] 14%|█▍ | 1785/12313 [1:20:17<7:53:34, 2.70s/it] {'loss': 0.6303, 'grad_norm': 3.236921347753896, 'learning_rate': 4.82881097439277e-06, 'epoch': 0.14} 14%|█▍ | 1785/12313 [1:20:17<7:53:34, 2.70s/it] 15%|█▍ | 1786/12313 [1:20:20<7:51:41, 2.69s/it] {'loss': 0.5043, 'grad_norm': 5.188648638663512, 'learning_rate': 4.828571730543718e-06, 'epoch': 0.15} 15%|█▍ | 1786/12313 [1:20:20<7:51:41, 2.69s/it] 15%|█▍ | 1787/12313 [1:20:23<8:03:00, 2.75s/it] {'loss': 0.6505, 'grad_norm': 4.430201943407788, 'learning_rate': 4.828332325569825e-06, 'epoch': 0.15} 15%|█▍ | 1787/12313 [1:20:23<8:03:00, 2.75s/it] 15%|█▍ | 1788/12313 [1:20:25<7:54:12, 2.70s/it] {'loss': 0.6374, 'grad_norm': 4.8572855854705645, 'learning_rate': 4.828092759487658e-06, 'epoch': 0.15} 15%|█▍ | 1788/12313 [1:20:25<7:54:12, 2.70s/it] 15%|█▍ | 1789/12313 [1:20:28<7:52:17, 2.69s/it] {'loss': 0.4907, 'grad_norm': 18.826306527694214, 'learning_rate': 4.827853032313793e-06, 'epoch': 0.15} 15%|█▍ | 1789/12313 [1:20:28<7:52:17, 
2.69s/it] 15%|█▍ | 1790/12313 [1:20:30<7:46:33, 2.66s/it] {'loss': 0.5695, 'grad_norm': 7.143933819169128, 'learning_rate': 4.827613144064819e-06, 'epoch': 0.15} 15%|█▍ | 1790/12313 [1:20:30<7:46:33, 2.66s/it] 15%|█▍ | 1791/12313 [1:20:33<7:35:35, 2.60s/it] {'loss': 0.6081, 'grad_norm': 5.6480765506427755, 'learning_rate': 4.827373094757334e-06, 'epoch': 0.15} 15%|█▍ | 1791/12313 [1:20:33<7:35:35, 2.60s/it] 15%|█▍ | 1792/12313 [1:20:36<7:43:57, 2.65s/it] {'loss': 0.6509, 'grad_norm': 3.6245079382385987, 'learning_rate': 4.827132884407948e-06, 'epoch': 0.15} 15%|█▍ | 1792/12313 [1:20:36<7:43:57, 2.65s/it] 15%|█▍ | 1793/12313 [1:20:38<7:42:05, 2.64s/it] {'loss': 0.6714, 'grad_norm': 4.493541124449762, 'learning_rate': 4.826892513033283e-06, 'epoch': 0.15} 15%|█▍ | 1793/12313 [1:20:38<7:42:05, 2.64s/it] 15%|█▍ | 1794/12313 [1:20:41<7:41:53, 2.63s/it] {'loss': 0.6869, 'grad_norm': 4.222014821722098, 'learning_rate': 4.8266519806499705e-06, 'epoch': 0.15} 15%|█▍ | 1794/12313 [1:20:41<7:41:53, 2.63s/it] 15%|█▍ | 1795/12313 [1:20:43<7:40:39, 2.63s/it] {'loss': 0.5754, 'grad_norm': 5.40579336748145, 'learning_rate': 4.826411287274655e-06, 'epoch': 0.15} 15%|█▍ | 1795/12313 [1:20:43<7:40:39, 2.63s/it] 15%|█▍ | 1796/12313 [1:20:46<7:45:09, 2.65s/it] {'loss': 0.5751, 'grad_norm': 7.803642190908184, 'learning_rate': 4.82617043292399e-06, 'epoch': 0.15} 15%|█▍ | 1796/12313 [1:20:46<7:45:09, 2.65s/it] 15%|█▍ | 1797/12313 [1:20:50<8:21:16, 2.86s/it] {'loss': 0.5584, 'grad_norm': 5.10891502640099, 'learning_rate': 4.825929417614643e-06, 'epoch': 0.15} 15%|█▍ | 1797/12313 [1:20:50<8:21:16, 2.86s/it] 15%|█▍ | 1798/12313 [1:20:52<7:59:23, 2.74s/it] {'loss': 0.6378, 'grad_norm': 5.342228163843412, 'learning_rate': 4.825688241363289e-06, 'epoch': 0.15} 15%|█▍ | 1798/12313 [1:20:52<7:59:23, 2.74s/it] 15%|█▍ | 1799/12313 [1:20:55<8:21:45, 2.86s/it] {'loss': 0.7725, 'grad_norm': 3.4939671214065506, 'learning_rate': 4.825446904186619e-06, 'epoch': 0.15} 15%|█▍ | 1799/12313 
[1:20:55<8:21:45, 2.86s/it] 15%|█▍ | 1800/12313 [1:20:58<8:06:14, 2.78s/it] {'loss': 0.5815, 'grad_norm': 7.740693551879523, 'learning_rate': 4.825205406101328e-06, 'epoch': 0.15} 15%|█▍ | 1800/12313 [1:20:58<8:06:14, 2.78s/it] 15%|█▍ | 1801/12313 [1:21:00<7:58:09, 2.73s/it] {'loss': 0.6857, 'grad_norm': 3.8127593882048676, 'learning_rate': 4.824963747124132e-06, 'epoch': 0.15} 15%|█▍ | 1801/12313 [1:21:00<7:58:09, 2.73s/it] 15%|█▍ | 1802/12313 [1:21:03<7:51:36, 2.69s/it] {'loss': 0.5974, 'grad_norm': 7.341843917850864, 'learning_rate': 4.824721927271747e-06, 'epoch': 0.15} 15%|█▍ | 1802/12313 [1:21:03<7:51:36, 2.69s/it] 15%|█▍ | 1803/12313 [1:21:05<7:41:02, 2.63s/it] {'loss': 0.6106, 'grad_norm': 5.649042550435367, 'learning_rate': 4.8244799465609095e-06, 'epoch': 0.15} 15%|█▍ | 1803/12313 [1:21:05<7:41:02, 2.63s/it] 15%|█▍ | 1804/12313 [1:21:08<8:01:16, 2.75s/it] {'loss': 0.5848, 'grad_norm': 6.644722885021702, 'learning_rate': 4.82423780500836e-06, 'epoch': 0.15} 15%|█▍ | 1804/12313 [1:21:08<8:01:16, 2.75s/it] 15%|█▍ | 1805/12313 [1:21:11<7:56:53, 2.72s/it] {'loss': 0.5819, 'grad_norm': 3.4917280474695853, 'learning_rate': 4.823995502630857e-06, 'epoch': 0.15} 15%|█▍ | 1805/12313 [1:21:11<7:56:53, 2.72s/it] 15%|█▍ | 1806/12313 [1:21:14<7:45:58, 2.66s/it] {'loss': 0.6797, 'grad_norm': 4.938957664863549, 'learning_rate': 4.823753039445164e-06, 'epoch': 0.15} 15%|█▍ | 1806/12313 [1:21:14<7:45:58, 2.66s/it] 15%|█▍ | 1807/12313 [1:21:16<7:46:35, 2.66s/it] {'loss': 0.6352, 'grad_norm': 3.8798922539217955, 'learning_rate': 4.823510415468059e-06, 'epoch': 0.15} 15%|█▍ | 1807/12313 [1:21:16<7:46:35, 2.66s/it] 15%|█▍ | 1808/12313 [1:21:19<7:49:57, 2.68s/it] {'loss': 0.6874, 'grad_norm': 4.629237568082465, 'learning_rate': 4.82326763071633e-06, 'epoch': 0.15} 15%|█▍ | 1808/12313 [1:21:19<7:49:57, 2.68s/it] 15%|█▍ | 1809/12313 [1:21:22<7:40:11, 2.63s/it] {'loss': 0.587, 'grad_norm': 4.720441802830114, 'learning_rate': 4.8230246852067784e-06, 'epoch': 0.15} 15%|█▍ | 
1809/12313 [1:21:22<7:40:11, 2.63s/it] 15%|█▍ | 1810/12313 [1:21:24<7:45:50, 2.66s/it] {'loss': 0.4856, 'grad_norm': 5.910430878822207, 'learning_rate': 4.822781578956212e-06, 'epoch': 0.15} 15%|█▍ | 1810/12313 [1:21:24<7:45:50, 2.66s/it] 15%|█▍ | 1811/12313 [1:21:27<7:48:17, 2.68s/it] {'loss': 0.6528, 'grad_norm': 4.138152132805585, 'learning_rate': 4.8225383119814526e-06, 'epoch': 0.15} 15%|█▍ | 1811/12313 [1:21:27<7:48:17, 2.68s/it] 15%|█▍ | 1812/12313 [1:21:30<8:01:25, 2.75s/it] {'loss': 0.552, 'grad_norm': 4.151700339228477, 'learning_rate': 4.822294884299335e-06, 'epoch': 0.15} 15%|█▍ | 1812/12313 [1:21:30<8:01:25, 2.75s/it] 15%|█▍ | 1813/12313 [1:21:33<8:00:43, 2.75s/it] {'loss': 0.7129, 'grad_norm': 4.1614483287452915, 'learning_rate': 4.822051295926701e-06, 'epoch': 0.15} 15%|█▍ | 1813/12313 [1:21:33<8:00:43, 2.75s/it] 15%|█▍ | 1814/12313 [1:21:35<7:58:18, 2.73s/it] {'loss': 0.6328, 'grad_norm': 6.707224091798943, 'learning_rate': 4.821807546880407e-06, 'epoch': 0.15} 15%|█▍ | 1814/12313 [1:21:35<7:58:18, 2.73s/it] 15%|█▍ | 1815/12313 [1:21:39<8:34:37, 2.94s/it] {'loss': 0.7908, 'grad_norm': 6.368579491491943, 'learning_rate': 4.8215636371773186e-06, 'epoch': 0.15} 15%|█▍ | 1815/12313 [1:21:39<8:34:37, 2.94s/it] 15%|█▍ | 1816/12313 [1:21:41<8:10:04, 2.80s/it] {'loss': 0.5206, 'grad_norm': 4.199767598115914, 'learning_rate': 4.821319566834314e-06, 'epoch': 0.15} 15%|█▍ | 1816/12313 [1:21:41<8:10:04, 2.80s/it] 15%|█▍ | 1817/12313 [1:21:44<8:04:57, 2.77s/it] {'loss': 0.6195, 'grad_norm': 4.709792455873804, 'learning_rate': 4.82107533586828e-06, 'epoch': 0.15} 15%|█▍ | 1817/12313 [1:21:44<8:04:57, 2.77s/it] 15%|█▍ | 1818/12313 [1:21:47<7:57:42, 2.73s/it] {'loss': 0.6602, 'grad_norm': 4.319113558733222, 'learning_rate': 4.820830944296117e-06, 'epoch': 0.15} 15%|█▍ | 1818/12313 [1:21:47<7:57:42, 2.73s/it] 15%|█▍ | 1819/12313 [1:21:49<7:43:05, 2.65s/it] {'loss': 0.5952, 'grad_norm': 5.544772853522585, 'learning_rate': 4.820586392134735e-06, 'epoch': 0.15} 15%|█▍ 
| 1819/12313 [1:21:49<7:43:05, 2.65s/it] 15%|█▍ | 1820/12313 [1:21:52<8:00:20, 2.75s/it] {'loss': 0.5967, 'grad_norm': 4.015521528170778, 'learning_rate': 4.820341679401057e-06, 'epoch': 0.15} 15%|█▍ | 1820/12313 [1:21:52<8:00:20, 2.75s/it] 15%|█▍ | 1821/12313 [1:21:55<8:16:07, 2.84s/it] {'loss': 0.6292, 'grad_norm': 3.9474000516118215, 'learning_rate': 4.820096806112015e-06, 'epoch': 0.15} 15%|█▍ | 1821/12313 [1:21:55<8:16:07, 2.84s/it] 15%|█▍ | 1822/12313 [1:21:58<8:22:45, 2.88s/it] {'loss': 0.4472, 'grad_norm': 4.8859410343320615, 'learning_rate': 4.8198517722845524e-06, 'epoch': 0.15} 15%|█▍ | 1822/12313 [1:21:58<8:22:45, 2.88s/it] 15%|█▍ | 1823/12313 [1:22:01<8:08:47, 2.80s/it] {'loss': 0.6099, 'grad_norm': 4.277013788450521, 'learning_rate': 4.819606577935626e-06, 'epoch': 0.15} 15%|█▍ | 1823/12313 [1:22:01<8:08:47, 2.80s/it] 15%|█▍ | 1824/12313 [1:22:03<7:51:28, 2.70s/it] {'loss': 0.598, 'grad_norm': 4.8797029682772095, 'learning_rate': 4.8193612230822e-06, 'epoch': 0.15} 15%|█▍ | 1824/12313 [1:22:03<7:51:28, 2.70s/it] 15%|█▍ | 1825/12313 [1:22:06<8:03:29, 2.77s/it] {'loss': 0.6474, 'grad_norm': 7.742169376350744, 'learning_rate': 4.819115707741252e-06, 'epoch': 0.15} 15%|█▍ | 1825/12313 [1:22:06<8:03:29, 2.77s/it] 15%|█▍ | 1826/12313 [1:22:09<8:00:39, 2.75s/it] {'loss': 0.4876, 'grad_norm': 5.12268577416012, 'learning_rate': 4.818870031929771e-06, 'epoch': 0.15} 15%|█▍ | 1826/12313 [1:22:09<8:00:39, 2.75s/it] 15%|█▍ | 1827/12313 [1:22:11<7:52:28, 2.70s/it] {'loss': 0.727, 'grad_norm': 3.7709678599136605, 'learning_rate': 4.818624195664756e-06, 'epoch': 0.15} 15%|█▍ | 1827/12313 [1:22:11<7:52:28, 2.70s/it] 15%|█▍ | 1828/12313 [1:22:14<7:57:30, 2.73s/it] {'loss': 0.7224, 'grad_norm': 2.5702762627716833, 'learning_rate': 4.818378198963218e-06, 'epoch': 0.15} 15%|█▍ | 1828/12313 [1:22:14<7:57:30, 2.73s/it] 15%|█▍ | 1829/12313 [1:22:17<7:55:55, 2.72s/it] {'loss': 0.6025, 'grad_norm': 3.097076580981341, 'learning_rate': 4.81813204184218e-06, 'epoch': 0.15} 15%|█▍ 
| 1829/12313 [1:22:17<7:55:55, 2.72s/it] 15%|█▍ | 1830/12313 [1:22:20<7:57:04, 2.73s/it] {'loss': 0.6685, 'grad_norm': 7.135529520333667, 'learning_rate': 4.817885724318671e-06, 'epoch': 0.15} 15%|█▍ | 1830/12313 [1:22:20<7:57:04, 2.73s/it] 15%|█▍ | 1831/12313 [1:22:22<7:46:07, 2.67s/it] {'loss': 0.6588, 'grad_norm': 3.4538569552830003, 'learning_rate': 4.817639246409738e-06, 'epoch': 0.15} 15%|█▍ | 1831/12313 [1:22:22<7:46:07, 2.67s/it] 15%|█▍ | 1832/12313 [1:22:25<7:46:09, 2.67s/it] {'loss': 0.4665, 'grad_norm': 4.3881359378371965, 'learning_rate': 4.817392608132435e-06, 'epoch': 0.15} 15%|█▍ | 1832/12313 [1:22:25<7:46:09, 2.67s/it] 15%|█▍ | 1833/12313 [1:22:27<7:38:40, 2.63s/it] {'loss': 0.5136, 'grad_norm': 3.7661484270198304, 'learning_rate': 4.817145809503828e-06, 'epoch': 0.15} 15%|█▍ | 1833/12313 [1:22:27<7:38:40, 2.63s/it] 15%|█▍ | 1834/12313 [1:22:30<7:34:22, 2.60s/it] {'loss': 0.5539, 'grad_norm': 5.494631448466639, 'learning_rate': 4.816898850540995e-06, 'epoch': 0.15} 15%|█▍ | 1834/12313 [1:22:30<7:34:22, 2.60s/it] 15%|█▍ | 1835/12313 [1:22:32<7:30:16, 2.58s/it] {'loss': 0.6492, 'grad_norm': 10.552148803818461, 'learning_rate': 4.816651731261023e-06, 'epoch': 0.15} 15%|█▍ | 1835/12313 [1:22:32<7:30:16, 2.58s/it] 15%|█▍ | 1836/12313 [1:22:35<7:52:16, 2.70s/it] {'loss': 0.6898, 'grad_norm': 4.963897986383459, 'learning_rate': 4.816404451681012e-06, 'epoch': 0.15} 15%|█▍ | 1836/12313 [1:22:35<7:52:16, 2.70s/it] 15%|█▍ | 1837/12313 [1:22:38<7:46:09, 2.67s/it] {'loss': 0.6373, 'grad_norm': 4.656896578348469, 'learning_rate': 4.816157011818073e-06, 'epoch': 0.15} 15%|█▍ | 1837/12313 [1:22:38<7:46:09, 2.67s/it] 15%|█▍ | 1838/12313 [1:22:41<7:42:11, 2.65s/it] {'loss': 0.61, 'grad_norm': 6.1635228629484935, 'learning_rate': 4.815909411689326e-06, 'epoch': 0.15} 15%|█▍ | 1838/12313 [1:22:41<7:42:11, 2.65s/it] 15%|█▍ | 1839/12313 [1:22:43<7:40:55, 2.64s/it] {'loss': 0.5255, 'grad_norm': 5.077711445021842, 'learning_rate': 4.815661651311905e-06, 'epoch': 0.15} 
15%|█▍ | 1839/12313 [1:22:43<7:40:55, 2.64s/it] 15%|█▍ | 1840/12313 [1:22:46<7:43:46, 2.66s/it] {'loss': 0.5737, 'grad_norm': 3.5697847643358926, 'learning_rate': 4.815413730702953e-06, 'epoch': 0.15} 15%|█▍ | 1840/12313 [1:22:46<7:43:46, 2.66s/it] 15%|█▍ | 1841/12313 [1:22:48<7:35:01, 2.61s/it] {'loss': 0.5621, 'grad_norm': 4.3625630887558025, 'learning_rate': 4.8151656498796245e-06, 'epoch': 0.15} 15%|█▍ | 1841/12313 [1:22:48<7:35:01, 2.61s/it] 15%|█▍ | 1842/12313 [1:22:51<7:39:49, 2.63s/it] {'loss': 0.6579, 'grad_norm': 3.632848050826748, 'learning_rate': 4.814917408859087e-06, 'epoch': 0.15} 15%|█▍ | 1842/12313 [1:22:51<7:39:49, 2.63s/it] 15%|█▍ | 1843/12313 [1:22:54<7:40:08, 2.64s/it] {'loss': 0.6326, 'grad_norm': 5.537325255170911, 'learning_rate': 4.8146690076585145e-06, 'epoch': 0.15} 15%|█▍ | 1843/12313 [1:22:54<7:40:08, 2.64s/it] 15%|█▍ | 1844/12313 [1:22:56<7:24:08, 2.55s/it] {'loss': 0.5227, 'grad_norm': 4.502954580803916, 'learning_rate': 4.8144204462950985e-06, 'epoch': 0.15} 15%|█▍ | 1844/12313 [1:22:56<7:24:08, 2.55s/it] 15%|█▍ | 1845/12313 [1:22:59<7:31:34, 2.59s/it] {'loss': 0.5365, 'grad_norm': 9.989837223985806, 'learning_rate': 4.8141717247860355e-06, 'epoch': 0.15} 15%|█▍ | 1845/12313 [1:22:59<7:31:34, 2.59s/it] 15%|█▍ | 1846/12313 [1:23:01<7:38:59, 2.63s/it] {'loss': 0.7256, 'grad_norm': 9.268156392807445, 'learning_rate': 4.813922843148537e-06, 'epoch': 0.15} 15%|█▍ | 1846/12313 [1:23:01<7:38:59, 2.63s/it] 15%|█▌ | 1847/12313 [1:23:04<7:35:44, 2.61s/it] {'loss': 0.6814, 'grad_norm': 4.723473199759222, 'learning_rate': 4.813673801399825e-06, 'epoch': 0.15} 15%|█▌ | 1847/12313 [1:23:04<7:35:44, 2.61s/it] 15%|█▌ | 1848/12313 [1:23:06<7:28:25, 2.57s/it] {'loss': 0.6083, 'grad_norm': 5.682840300388035, 'learning_rate': 4.81342459955713e-06, 'epoch': 0.15} 15%|█▌ | 1848/12313 [1:23:06<7:28:25, 2.57s/it] 15%|█▌ | 1849/12313 [1:23:10<7:59:20, 2.75s/it] {'loss': 0.5635, 'grad_norm': 3.7968600757264417, 'learning_rate': 4.813175237637697e-06, 'epoch': 
0.15} 15%|█▌ | 1849/12313 [1:23:10<7:59:20, 2.75s/it] 15%|█▌ | 1850/12313 [1:23:12<7:53:35, 2.72s/it] {'loss': 0.5237, 'grad_norm': 4.69369222204755, 'learning_rate': 4.812925715658779e-06, 'epoch': 0.15} 15%|█▌ | 1850/12313 [1:23:12<7:53:35, 2.72s/it] 15%|█▌ | 1851/12313 [1:23:15<7:51:36, 2.70s/it] {'loss': 0.6771, 'grad_norm': 4.060309446961183, 'learning_rate': 4.812676033637643e-06, 'epoch': 0.15} 15%|█▌ | 1851/12313 [1:23:15<7:51:36, 2.70s/it] 15%|█▌ | 1852/12313 [1:23:18<7:51:26, 2.70s/it] {'loss': 0.4976, 'grad_norm': 4.068231340082502, 'learning_rate': 4.812426191591565e-06, 'epoch': 0.15} 15%|█▌ | 1852/12313 [1:23:18<7:51:26, 2.70s/it] 15%|█▌ | 1853/12313 [1:23:20<7:56:05, 2.73s/it] {'loss': 0.6372, 'grad_norm': 2.940489927194899, 'learning_rate': 4.812176189537833e-06, 'epoch': 0.15} 15%|█▌ | 1853/12313 [1:23:20<7:56:05, 2.73s/it] 15%|█▌ | 1854/12313 [1:23:23<7:51:27, 2.70s/it] {'loss': 0.5688, 'grad_norm': 18.577063761435014, 'learning_rate': 4.811926027493745e-06, 'epoch': 0.15} 15%|█▌ | 1854/12313 [1:23:23<7:51:27, 2.70s/it] 15%|█▌ | 1855/12313 [1:23:26<7:42:48, 2.66s/it] {'loss': 0.5274, 'grad_norm': 9.15886334831888, 'learning_rate': 4.811675705476613e-06, 'epoch': 0.15} 15%|█▌ | 1855/12313 [1:23:26<7:42:48, 2.66s/it] 15%|█▌ | 1856/12313 [1:23:28<7:45:05, 2.67s/it] {'loss': 0.6006, 'grad_norm': 8.116139489468573, 'learning_rate': 4.811425223503755e-06, 'epoch': 0.15} 15%|█▌ | 1856/12313 [1:23:28<7:45:05, 2.67s/it] 15%|█▌ | 1857/12313 [1:23:31<7:47:52, 2.68s/it] {'loss': 0.4627, 'grad_norm': 5.340837408995612, 'learning_rate': 4.811174581592506e-06, 'epoch': 0.15} 15%|█▌ | 1857/12313 [1:23:31<7:47:52, 2.68s/it] 15%|█▌ | 1858/12313 [1:23:33<7:34:39, 2.61s/it] {'loss': 0.6517, 'grad_norm': 7.861074956324514, 'learning_rate': 4.810923779760207e-06, 'epoch': 0.15} 15%|█▌ | 1858/12313 [1:23:33<7:34:39, 2.61s/it] 15%|█▌ | 1859/12313 [1:23:36<7:36:49, 2.62s/it] {'loss': 0.8007, 'grad_norm': 5.046613794107396, 'learning_rate': 4.810672818024212e-06, 'epoch': 
0.15} 15%|█▌ | 1859/12313 [1:23:36<7:36:49, 2.62s/it] 15%|█▌ | 1860/12313 [1:23:39<7:41:21, 2.65s/it] {'loss': 0.4462, 'grad_norm': 3.78641177578102, 'learning_rate': 4.810421696401889e-06, 'epoch': 0.15} 15%|█▌ | 1860/12313 [1:23:39<7:41:21, 2.65s/it] 15%|█▌ | 1861/12313 [1:23:41<7:38:25, 2.63s/it] {'loss': 0.6483, 'grad_norm': 5.953501964167005, 'learning_rate': 4.810170414910611e-06, 'epoch': 0.15} 15%|█▌ | 1861/12313 [1:23:41<7:38:25, 2.63s/it] 15%|█▌ | 1862/12313 [1:23:44<7:27:06, 2.57s/it] {'loss': 0.4402, 'grad_norm': 4.508454868931279, 'learning_rate': 4.809918973567767e-06, 'epoch': 0.15} 15%|█▌ | 1862/12313 [1:23:44<7:27:06, 2.57s/it] 15%|█▌ | 1863/12313 [1:23:46<7:28:06, 2.57s/it] {'loss': 0.5401, 'grad_norm': 6.1091026815566645, 'learning_rate': 4.809667372390755e-06, 'epoch': 0.15} 15%|█▌ | 1863/12313 [1:23:46<7:28:06, 2.57s/it] 15%|█▌ | 1864/12313 [1:23:49<7:32:46, 2.60s/it] {'loss': 0.6984, 'grad_norm': 4.269870818997486, 'learning_rate': 4.809415611396984e-06, 'epoch': 0.15} 15%|█▌ | 1864/12313 [1:23:49<7:32:46, 2.60s/it] 15%|█▌ | 1865/12313 [1:23:52<7:33:03, 2.60s/it] {'loss': 0.6111, 'grad_norm': 8.488877241499456, 'learning_rate': 4.809163690603877e-06, 'epoch': 0.15} 15%|█▌ | 1865/12313 [1:23:52<7:33:03, 2.60s/it] 15%|█▌ | 1866/12313 [1:23:54<7:24:57, 2.56s/it] {'loss': 0.5755, 'grad_norm': 5.571115307591249, 'learning_rate': 4.808911610028861e-06, 'epoch': 0.15} 15%|█▌ | 1866/12313 [1:23:54<7:24:57, 2.56s/it] 15%|█▌ | 1867/12313 [1:23:57<7:26:00, 2.56s/it] {'loss': 0.7285, 'grad_norm': 5.592522309071258, 'learning_rate': 4.808659369689384e-06, 'epoch': 0.15} 15%|█▌ | 1867/12313 [1:23:57<7:26:00, 2.56s/it] 15%|█▌ | 1868/12313 [1:23:59<7:26:19, 2.56s/it] {'loss': 0.6066, 'grad_norm': 3.3631906733071535, 'learning_rate': 4.808406969602895e-06, 'epoch': 0.15} 15%|█▌ | 1868/12313 [1:23:59<7:26:19, 2.56s/it] 15%|█▌ | 1869/12313 [1:24:02<7:29:25, 2.58s/it] {'loss': 0.599, 'grad_norm': 5.558011990131933, 'learning_rate': 4.8081544097868615e-06, 
'epoch': 0.15} 15%|█▌ | 1869/12313 [1:24:02<7:29:25, 2.58s/it] 15%|█▌ | 1870/12313 [1:24:05<8:10:04, 2.82s/it] {'loss': 0.6429, 'grad_norm': 4.196854563685466, 'learning_rate': 4.8079016902587586e-06, 'epoch': 0.15} 15%|█▌ | 1870/12313 [1:24:05<8:10:04, 2.82s/it] 15%|█▌ | 1871/12313 [1:24:08<8:08:05, 2.80s/it] {'loss': 0.4956, 'grad_norm': 5.663489456766142, 'learning_rate': 4.807648811036073e-06, 'epoch': 0.15} 15%|█▌ | 1871/12313 [1:24:08<8:08:05, 2.80s/it] 15%|█▌ | 1872/12313 [1:24:11<8:09:45, 2.81s/it] {'loss': 0.5331, 'grad_norm': 4.348733009474466, 'learning_rate': 4.807395772136303e-06, 'epoch': 0.15} 15%|█▌ | 1872/12313 [1:24:11<8:09:45, 2.81s/it] 15%|█▌ | 1873/12313 [1:24:14<7:59:56, 2.76s/it] {'loss': 0.7043, 'grad_norm': 5.237186173256271, 'learning_rate': 4.807142573576958e-06, 'epoch': 0.15} 15%|█▌ | 1873/12313 [1:24:14<7:59:56, 2.76s/it] 15%|█▌ | 1874/12313 [1:24:16<8:02:52, 2.78s/it] {'loss': 0.491, 'grad_norm': 5.71550907696815, 'learning_rate': 4.806889215375556e-06, 'epoch': 0.15} 15%|█▌ | 1874/12313 [1:24:16<8:02:52, 2.78s/it] 15%|█▌ | 1875/12313 [1:24:19<8:18:13, 2.86s/it] {'loss': 0.5455, 'grad_norm': 5.112913674384685, 'learning_rate': 4.80663569754963e-06, 'epoch': 0.15} 15%|█▌ | 1875/12313 [1:24:19<8:18:13, 2.86s/it] 15%|█▌ | 1876/12313 [1:24:23<8:32:20, 2.95s/it] {'loss': 0.6936, 'grad_norm': 4.422551589450753, 'learning_rate': 4.806382020116721e-06, 'epoch': 0.15} 15%|█▌ | 1876/12313 [1:24:23<8:32:20, 2.95s/it] 15%|█▌ | 1877/12313 [1:24:25<8:26:02, 2.91s/it] {'loss': 0.6371, 'grad_norm': 4.418250640226396, 'learning_rate': 4.806128183094383e-06, 'epoch': 0.15} 15%|█▌ | 1877/12313 [1:24:25<8:26:02, 2.91s/it] 15%|█▌ | 1878/12313 [1:24:28<8:15:36, 2.85s/it] {'loss': 0.5783, 'grad_norm': 4.611528104096768, 'learning_rate': 4.805874186500179e-06, 'epoch': 0.15} 15%|█▌ | 1878/12313 [1:24:28<8:15:36, 2.85s/it] 15%|█▌ | 1879/12313 [1:24:31<8:10:26, 2.82s/it] {'loss': 0.6227, 'grad_norm': 4.3459558003142345, 'learning_rate': 4.805620030351686e-06, 
'epoch': 0.15} 15%|█▌ | 1879/12313 [1:24:31<8:10:26, 2.82s/it] 15%|█▌ | 1880/12313 [1:24:34<8:05:45, 2.79s/it] {'loss': 0.5419, 'grad_norm': 6.170103993151758, 'learning_rate': 4.805365714666489e-06, 'epoch': 0.15} 15%|█▌ | 1880/12313 [1:24:34<8:05:45, 2.79s/it] 15%|█▌ | 1881/12313 [1:24:36<8:11:06, 2.82s/it] {'loss': 0.7513, 'grad_norm': 2.958664035946074, 'learning_rate': 4.805111239462185e-06, 'epoch': 0.15} 15%|█▌ | 1881/12313 [1:24:36<8:11:06, 2.82s/it] 15%|█▌ | 1882/12313 [1:24:39<7:59:31, 2.76s/it] {'loss': 0.4584, 'grad_norm': 13.416473280689857, 'learning_rate': 4.8048566047563835e-06, 'epoch': 0.15} 15%|█▌ | 1882/12313 [1:24:39<7:59:31, 2.76s/it] 15%|█▌ | 1883/12313 [1:24:42<7:50:53, 2.71s/it] {'loss': 0.6614, 'grad_norm': 4.99824146528109, 'learning_rate': 4.8046018105667024e-06, 'epoch': 0.15} 15%|█▌ | 1883/12313 [1:24:42<7:50:53, 2.71s/it] 15%|█▌ | 1884/12313 [1:24:45<7:58:24, 2.75s/it] {'loss': 0.7377, 'grad_norm': 9.102398270856987, 'learning_rate': 4.8043468569107735e-06, 'epoch': 0.15} 15%|█▌ | 1884/12313 [1:24:45<7:58:24, 2.75s/it] 15%|█▌ | 1885/12313 [1:24:47<7:48:28, 2.70s/it] {'loss': 0.4525, 'grad_norm': 3.746424579208254, 'learning_rate': 4.804091743806237e-06, 'epoch': 0.15} 15%|█▌ | 1885/12313 [1:24:47<7:48:28, 2.70s/it] 15%|█▌ | 1886/12313 [1:24:50<8:02:43, 2.78s/it] {'loss': 0.5629, 'grad_norm': 6.368130414333825, 'learning_rate': 4.803836471270748e-06, 'epoch': 0.15} 15%|█▌ | 1886/12313 [1:24:50<8:02:43, 2.78s/it] 15%|█▌ | 1887/12313 [1:24:53<8:06:26, 2.80s/it] {'loss': 0.5416, 'grad_norm': 6.446103551439812, 'learning_rate': 4.803581039321966e-06, 'epoch': 0.15} 15%|█▌ | 1887/12313 [1:24:53<8:06:26, 2.80s/it] 15%|█▌ | 1888/12313 [1:24:55<7:46:18, 2.68s/it] {'loss': 0.5842, 'grad_norm': 5.229434076566928, 'learning_rate': 4.803325447977568e-06, 'epoch': 0.15} 15%|█▌ | 1888/12313 [1:24:55<7:46:18, 2.68s/it] 15%|█▌ | 1889/12313 [1:24:58<7:36:00, 2.62s/it] {'loss': 0.5898, 'grad_norm': 5.954690104051001, 'learning_rate': 
4.80306969725524e-06, 'epoch': 0.15} 15%|█▌ | 1889/12313 [1:24:58<7:36:00, 2.62s/it] 15%|█▌ | 1890/12313 [1:25:01<8:01:14, 2.77s/it] {'loss': 0.5839, 'grad_norm': 6.150359963901614, 'learning_rate': 4.802813787172678e-06, 'epoch': 0.15} 15%|█▌ | 1890/12313 [1:25:01<8:01:14, 2.77s/it] 15%|█▌ | 1891/12313 [1:25:04<7:56:05, 2.74s/it] {'loss': 0.5668, 'grad_norm': 5.125992935197739, 'learning_rate': 4.802557717747588e-06, 'epoch': 0.15} 15%|█▌ | 1891/12313 [1:25:04<7:56:05, 2.74s/it] 15%|█▌ | 1892/12313 [1:25:06<7:48:43, 2.70s/it] {'loss': 0.4172, 'grad_norm': 4.364632434646274, 'learning_rate': 4.802301488997691e-06, 'epoch': 0.15} 15%|█▌ | 1892/12313 [1:25:06<7:48:43, 2.70s/it] 15%|█▌ | 1893/12313 [1:25:09<7:47:03, 2.69s/it] {'loss': 0.6242, 'grad_norm': 4.882188374905198, 'learning_rate': 4.802045100940715e-06, 'epoch': 0.15} 15%|█▌ | 1893/12313 [1:25:09<7:47:03, 2.69s/it] 15%|█▌ | 1894/12313 [1:25:11<7:38:33, 2.64s/it] {'loss': 0.5872, 'grad_norm': 3.3401070060949403, 'learning_rate': 4.801788553594403e-06, 'epoch': 0.15} 15%|█▌ | 1894/12313 [1:25:11<7:38:33, 2.64s/it] 15%|█▌ | 1895/12313 [1:25:14<7:27:04, 2.57s/it] {'loss': 0.5688, 'grad_norm': 4.868516648657934, 'learning_rate': 4.801531846976504e-06, 'epoch': 0.15} 15%|█▌ | 1895/12313 [1:25:14<7:27:04, 2.57s/it] 15%|█▌ | 1896/12313 [1:25:17<7:45:45, 2.68s/it] {'loss': 0.6434, 'grad_norm': 5.105870944801884, 'learning_rate': 4.801274981104781e-06, 'epoch': 0.15} 15%|█▌ | 1896/12313 [1:25:17<7:45:45, 2.68s/it] 15%|█▌ | 1897/12313 [1:25:19<7:48:53, 2.70s/it] {'loss': 0.5289, 'grad_norm': 3.8171439187123903, 'learning_rate': 4.80101795599701e-06, 'epoch': 0.15} 15%|█▌ | 1897/12313 [1:25:19<7:48:53, 2.70s/it] 15%|█▌ | 1898/12313 [1:25:22<7:45:52, 2.68s/it] {'loss': 0.4721, 'grad_norm': 16.280448607344095, 'learning_rate': 4.800760771670974e-06, 'epoch': 0.15} 15%|█▌ | 1898/12313 [1:25:22<7:45:52, 2.68s/it] 15%|█▌ | 1899/12313 [1:25:25<7:55:51, 2.74s/it] {'loss': 0.6052, 'grad_norm': 3.7231117337427055, 
'learning_rate': 4.800503428144469e-06, 'epoch': 0.15} 15%|█▌ | 1899/12313 [1:25:25<7:55:51, 2.74s/it] 15%|█▌ | 1900/12313 [1:25:28<7:46:30, 2.69s/it] {'loss': 0.6106, 'grad_norm': 3.648455745421052, 'learning_rate': 4.800245925435302e-06, 'epoch': 0.15} 15%|█▌ | 1900/12313 [1:25:28<7:46:30, 2.69s/it] 15%|█▌ | 1901/12313 [1:25:30<7:44:33, 2.68s/it] {'loss': 0.5272, 'grad_norm': 6.0421687476954995, 'learning_rate': 4.7999882635612916e-06, 'epoch': 0.15} 15%|█▌ | 1901/12313 [1:25:30<7:44:33, 2.68s/it] 15%|█▌ | 1902/12313 [1:25:33<7:44:20, 2.68s/it] {'loss': 0.4801, 'grad_norm': 4.462318149762371, 'learning_rate': 4.799730442540265e-06, 'epoch': 0.15} 15%|█▌ | 1902/12313 [1:25:33<7:44:20, 2.68s/it] 15%|█▌ | 1903/12313 [1:25:36<8:01:57, 2.78s/it] {'loss': 0.6126, 'grad_norm': 5.097311868178371, 'learning_rate': 4.7994724623900636e-06, 'epoch': 0.15} 15%|█▌ | 1903/12313 [1:25:36<8:01:57, 2.78s/it] 15%|█▌ | 1904/12313 [1:25:38<7:51:36, 2.72s/it] {'loss': 0.7263, 'grad_norm': 10.491382918494583, 'learning_rate': 4.799214323128537e-06, 'epoch': 0.15} 15%|█▌ | 1904/12313 [1:25:38<7:51:36, 2.72s/it] 15%|█▌ | 1905/12313 [1:25:41<7:46:24, 2.69s/it] {'loss': 0.427, 'grad_norm': 5.632700323452676, 'learning_rate': 4.798956024773548e-06, 'epoch': 0.15} 15%|█▌ | 1905/12313 [1:25:41<7:46:24, 2.69s/it] 15%|█▌ | 1906/12313 [1:25:44<7:41:44, 2.66s/it] {'loss': 0.8174, 'grad_norm': 4.125424526878242, 'learning_rate': 4.798697567342969e-06, 'epoch': 0.15} 15%|█▌ | 1906/12313 [1:25:44<7:41:44, 2.66s/it] 15%|█▌ | 1907/12313 [1:25:47<7:54:11, 2.73s/it] {'loss': 0.5607, 'grad_norm': 3.2529153125095682, 'learning_rate': 4.798438950854685e-06, 'epoch': 0.15} 15%|█▌ | 1907/12313 [1:25:47<7:54:11, 2.73s/it] 15%|█▌ | 1908/12313 [1:25:49<7:47:24, 2.70s/it] {'loss': 0.6207, 'grad_norm': 6.639548071036903, 'learning_rate': 4.798180175326589e-06, 'epoch': 0.15} 15%|█▌ | 1908/12313 [1:25:49<7:47:24, 2.70s/it] 16%|█▌ | 1909/12313 [1:25:52<7:35:31, 2.63s/it] {'loss': 0.6548, 'grad_norm': 
3.419026777929521, 'learning_rate': 4.797921240776587e-06, 'epoch': 0.16} 16%|█▌ | 1909/12313 [1:25:52<7:35:31, 2.63s/it] 16%|█▌ | 1910/12313 [1:25:54<7:42:31, 2.67s/it] {'loss': 0.722, 'grad_norm': 6.3759242855440315, 'learning_rate': 4.797662147222598e-06, 'epoch': 0.16} 16%|█▌ | 1910/12313 [1:25:54<7:42:31, 2.67s/it] 16%|█▌ | 1911/12313 [1:25:57<7:31:27, 2.60s/it] {'loss': 0.7711, 'grad_norm': 4.2074330009622525, 'learning_rate': 4.797402894682548e-06, 'epoch': 0.16} 16%|█▌ | 1911/12313 [1:25:57<7:31:27, 2.60s/it] 16%|█▌ | 1912/12313 [1:26:00<7:33:32, 2.62s/it] {'loss': 0.5241, 'grad_norm': 5.86918562130916, 'learning_rate': 4.797143483174377e-06, 'epoch': 0.16} 16%|█▌ | 1912/12313 [1:26:00<7:33:32, 2.62s/it] 16%|█▌ | 1913/12313 [1:26:02<7:48:34, 2.70s/it] {'loss': 0.6855, 'grad_norm': 4.110856189622478, 'learning_rate': 4.796883912716034e-06, 'epoch': 0.16} 16%|█▌ | 1913/12313 [1:26:02<7:48:34, 2.70s/it] 16%|█▌ | 1914/12313 [1:26:05<7:51:58, 2.72s/it] {'loss': 0.616, 'grad_norm': 4.190357484842863, 'learning_rate': 4.79662418332548e-06, 'epoch': 0.16} 16%|█▌ | 1914/12313 [1:26:05<7:51:58, 2.72s/it] 16%|█▌ | 1915/12313 [1:26:08<7:46:05, 2.69s/it] {'loss': 0.554, 'grad_norm': 6.818021899205355, 'learning_rate': 4.796364295020688e-06, 'epoch': 0.16} 16%|█▌ | 1915/12313 [1:26:08<7:46:05, 2.69s/it] 16%|█▌ | 1916/12313 [1:26:11<7:58:36, 2.76s/it] {'loss': 0.5425, 'grad_norm': 4.400441609364029, 'learning_rate': 4.7961042478196394e-06, 'epoch': 0.16} 16%|█▌ | 1916/12313 [1:26:11<7:58:36, 2.76s/it] 16%|█▌ | 1917/12313 [1:26:14<7:59:52, 2.77s/it] {'loss': 0.5295, 'grad_norm': 3.755867976669089, 'learning_rate': 4.7958440417403295e-06, 'epoch': 0.16} 16%|█▌ | 1917/12313 [1:26:14<7:59:52, 2.77s/it] 16%|█▌ | 1918/12313 [1:26:16<7:49:28, 2.71s/it] {'loss': 0.4355, 'grad_norm': 3.799331044287829, 'learning_rate': 4.795583676800762e-06, 'epoch': 0.16} 16%|█▌ | 1918/12313 [1:26:16<7:49:28, 2.71s/it] 16%|█▌ | 1919/12313 [1:26:19<7:45:19, 2.69s/it] {'loss': 0.6229, 'grad_norm': 
5.47739825987387, 'learning_rate': 4.795323153018953e-06, 'epoch': 0.16} 16%|█▌ | 1919/12313 [1:26:19<7:45:19, 2.69s/it] 16%|█▌ | 1920/12313 [1:26:21<7:34:56, 2.63s/it] {'loss': 0.6301, 'grad_norm': 4.656969062795371, 'learning_rate': 4.795062470412931e-06, 'epoch': 0.16} 16%|█▌ | 1920/12313 [1:26:21<7:34:56, 2.63s/it] 16%|█▌ | 1921/12313 [1:26:24<7:35:25, 2.63s/it] {'loss': 0.6974, 'grad_norm': 4.914269261314513, 'learning_rate': 4.794801629000732e-06, 'epoch': 0.16} 16%|█▌ | 1921/12313 [1:26:24<7:35:25, 2.63s/it] 16%|█▌ | 1922/12313 [1:26:27<7:35:50, 2.63s/it] {'loss': 0.6272, 'grad_norm': 6.004759949716761, 'learning_rate': 4.794540628800405e-06, 'epoch': 0.16} 16%|█▌ | 1922/12313 [1:26:27<7:35:50, 2.63s/it] 16%|█▌ | 1923/12313 [1:26:29<7:39:13, 2.65s/it] {'loss': 0.624, 'grad_norm': 5.879866804838136, 'learning_rate': 4.79427946983001e-06, 'epoch': 0.16} 16%|█▌ | 1923/12313 [1:26:29<7:39:13, 2.65s/it] 16%|█▌ | 1924/12313 [1:26:32<7:51:35, 2.72s/it] {'loss': 0.5972, 'grad_norm': 4.9010570268519835, 'learning_rate': 4.794018152107618e-06, 'epoch': 0.16} 16%|█▌ | 1924/12313 [1:26:32<7:51:35, 2.72s/it] 16%|█▌ | 1925/12313 [1:26:35<7:45:38, 2.69s/it] {'loss': 0.5372, 'grad_norm': 6.1005563812014625, 'learning_rate': 4.793756675651311e-06, 'epoch': 0.16} 16%|█▌ | 1925/12313 [1:26:35<7:45:38, 2.69s/it] 16%|█▌ | 1926/12313 [1:26:37<7:44:55, 2.69s/it] {'loss': 0.5059, 'grad_norm': 7.657069975595423, 'learning_rate': 4.7934950404791815e-06, 'epoch': 0.16} 16%|█▌ | 1926/12313 [1:26:37<7:44:55, 2.69s/it] 16%|█▌ | 1927/12313 [1:26:40<7:46:15, 2.69s/it] {'loss': 0.5145, 'grad_norm': 3.7164386302842325, 'learning_rate': 4.793233246609333e-06, 'epoch': 0.16} 16%|█▌ | 1927/12313 [1:26:40<7:46:15, 2.69s/it] 16%|█▌ | 1928/12313 [1:26:43<7:45:34, 2.69s/it] {'loss': 0.5909, 'grad_norm': 6.7098122658626025, 'learning_rate': 4.792971294059882e-06, 'epoch': 0.16} 16%|█▌ | 1928/12313 [1:26:43<7:45:34, 2.69s/it] 16%|█▌ | 1929/12313 [1:26:46<7:50:52, 2.72s/it] {'loss': 0.5851, 
'grad_norm': 4.5647090456408765, 'learning_rate': 4.792709182848951e-06, 'epoch': 0.16} 16%|█▌ | 1929/12313 [1:26:46<7:50:52, 2.72s/it] 16%|█▌ | 1930/12313 [1:26:48<7:46:09, 2.69s/it] {'loss': 0.509, 'grad_norm': 5.659279097889884, 'learning_rate': 4.792446912994679e-06, 'epoch': 0.16} 16%|█▌ | 1930/12313 [1:26:48<7:46:09, 2.69s/it] 16%|█▌ | 1931/12313 [1:26:51<7:48:57, 2.71s/it] {'loss': 0.6214, 'grad_norm': 4.660823647963301, 'learning_rate': 4.792184484515214e-06, 'epoch': 0.16} 16%|█▌ | 1931/12313 [1:26:51<7:48:57, 2.71s/it] 16%|█▌ | 1932/12313 [1:26:53<7:39:30, 2.66s/it] {'loss': 0.6083, 'grad_norm': 5.379888896939118, 'learning_rate': 4.791921897428714e-06, 'epoch': 0.16} 16%|█▌ | 1932/12313 [1:26:53<7:39:30, 2.66s/it] 16%|█▌ | 1933/12313 [1:26:56<7:35:32, 2.63s/it] {'loss': 0.5366, 'grad_norm': 3.7091543807048555, 'learning_rate': 4.791659151753348e-06, 'epoch': 0.16} 16%|█▌ | 1933/12313 [1:26:56<7:35:32, 2.63s/it] 16%|█▌ | 1934/12313 [1:26:59<7:39:55, 2.66s/it] {'loss': 0.5548, 'grad_norm': 4.46536923401703, 'learning_rate': 4.791396247507297e-06, 'epoch': 0.16} 16%|█▌ | 1934/12313 [1:26:59<7:39:55, 2.66s/it] 16%|█▌ | 1935/12313 [1:27:02<7:43:07, 2.68s/it] {'loss': 0.6562, 'grad_norm': 3.0478693770891, 'learning_rate': 4.791133184708753e-06, 'epoch': 0.16} 16%|█▌ | 1935/12313 [1:27:02<7:43:07, 2.68s/it] 16%|█▌ | 1936/12313 [1:27:04<7:44:36, 2.69s/it] {'loss': 0.6266, 'grad_norm': 5.052248542138617, 'learning_rate': 4.790869963375918e-06, 'epoch': 0.16} 16%|█▌ | 1936/12313 [1:27:04<7:44:36, 2.69s/it] 16%|█▌ | 1937/12313 [1:27:07<7:46:10, 2.70s/it] {'loss': 0.6602, 'grad_norm': 6.299780666458338, 'learning_rate': 4.790606583527006e-06, 'epoch': 0.16} 16%|█▌ | 1937/12313 [1:27:07<7:46:10, 2.70s/it] 16%|█▌ | 1938/12313 [1:27:10<8:02:23, 2.79s/it] {'loss': 0.6822, 'grad_norm': 7.111506262099773, 'learning_rate': 4.790343045180242e-06, 'epoch': 0.16} 16%|█▌ | 1938/12313 [1:27:10<8:02:23, 2.79s/it] 16%|█▌ | 1939/12313 [1:27:12<7:47:05, 2.70s/it] {'loss': 0.5579, 
'grad_norm': 3.5717735290621015, 'learning_rate': 4.790079348353859e-06, 'epoch': 0.16} 16%|█▌ | 1939/12313 [1:27:12<7:47:05, 2.70s/it] 16%|█▌ | 1940/12313 [1:27:15<7:37:26, 2.65s/it] {'loss': 0.5513, 'grad_norm': 4.539252075565922, 'learning_rate': 4.789815493066106e-06, 'epoch': 0.16} 16%|█▌ | 1940/12313 [1:27:15<7:37:26, 2.65s/it] 16%|█▌ | 1941/12313 [1:27:18<7:52:51, 2.74s/it] {'loss': 0.6305, 'grad_norm': 3.19405646131031, 'learning_rate': 4.78955147933524e-06, 'epoch': 0.16} 16%|█▌ | 1941/12313 [1:27:18<7:52:51, 2.74s/it] 16%|█▌ | 1942/12313 [1:27:21<7:51:54, 2.73s/it] {'loss': 0.5575, 'grad_norm': 3.9600914109632805, 'learning_rate': 4.7892873071795285e-06, 'epoch': 0.16} 16%|█▌ | 1942/12313 [1:27:21<7:51:54, 2.73s/it] 16%|█▌ | 1943/12313 [1:27:23<7:51:08, 2.73s/it] {'loss': 0.6199, 'grad_norm': 5.033361013107082, 'learning_rate': 4.789022976617251e-06, 'epoch': 0.16} 16%|█▌ | 1943/12313 [1:27:23<7:51:08, 2.73s/it] 16%|█▌ | 1944/12313 [1:27:26<7:34:05, 2.63s/it] {'loss': 0.458, 'grad_norm': 15.306333789764619, 'learning_rate': 4.7887584876666984e-06, 'epoch': 0.16} 16%|█▌ | 1944/12313 [1:27:26<7:34:05, 2.63s/it] 16%|█▌ | 1945/12313 [1:27:28<7:30:34, 2.61s/it] {'loss': 0.691, 'grad_norm': 4.337274209547185, 'learning_rate': 4.788493840346172e-06, 'epoch': 0.16} 16%|█▌ | 1945/12313 [1:27:28<7:30:34, 2.61s/it] 16%|█▌ | 1946/12313 [1:27:31<7:35:05, 2.63s/it] {'loss': 0.7317, 'grad_norm': 4.2820007012074415, 'learning_rate': 4.788229034673983e-06, 'epoch': 0.16} 16%|█▌ | 1946/12313 [1:27:31<7:35:05, 2.63s/it] 16%|█▌ | 1947/12313 [1:27:34<7:36:46, 2.64s/it] {'loss': 0.5208, 'grad_norm': 4.361412328827347, 'learning_rate': 4.787964070668455e-06, 'epoch': 0.16} 16%|█▌ | 1947/12313 [1:27:34<7:36:46, 2.64s/it] 16%|█▌ | 1948/12313 [1:27:36<7:32:46, 2.62s/it] {'loss': 0.7488, 'grad_norm': 4.543017266735986, 'learning_rate': 4.787698948347922e-06, 'epoch': 0.16} 16%|█▌ | 1948/12313 [1:27:36<7:32:46, 2.62s/it] 16%|█▌ | 1949/12313 [1:27:39<7:40:36, 2.67s/it] {'loss': 
0.478, 'grad_norm': 6.9360624754566444, 'learning_rate': 4.78743366773073e-06, 'epoch': 0.16} 16%|█▌ | 1949/12313 [1:27:39<7:40:36, 2.67s/it] 16%|█▌ | 1950/12313 [1:27:42<7:44:59, 2.69s/it] {'loss': 0.8373, 'grad_norm': 4.321466474176901, 'learning_rate': 4.787168228835234e-06, 'epoch': 0.16} 16%|█▌ | 1950/12313 [1:27:42<7:44:59, 2.69s/it] 16%|█▌ | 1951/12313 [1:27:44<7:45:12, 2.69s/it] {'loss': 0.5927, 'grad_norm': 10.29187836219349, 'learning_rate': 4.7869026316798005e-06, 'epoch': 0.16} 16%|█▌ | 1951/12313 [1:27:44<7:45:12, 2.69s/it] 16%|█▌ | 1952/12313 [1:27:47<7:45:02, 2.69s/it] {'loss': 0.7284, 'grad_norm': 5.206315709924438, 'learning_rate': 4.7866368762828095e-06, 'epoch': 0.16} 16%|█▌ | 1952/12313 [1:27:47<7:45:02, 2.69s/it] 16%|█▌ | 1953/12313 [1:27:50<7:58:17, 2.77s/it] {'loss': 0.6129, 'grad_norm': 4.695971985775556, 'learning_rate': 4.786370962662647e-06, 'epoch': 0.16} 16%|█▌ | 1953/12313 [1:27:50<7:58:17, 2.77s/it] 16%|█▌ | 1954/12313 [1:27:53<8:08:38, 2.83s/it] {'loss': 0.7817, 'grad_norm': 3.810668859896304, 'learning_rate': 4.786104890837715e-06, 'epoch': 0.16} 16%|█▌ | 1954/12313 [1:27:53<8:08:38, 2.83s/it] 16%|█▌ | 1955/12313 [1:27:56<7:50:54, 2.73s/it] {'loss': 0.4929, 'grad_norm': 5.945278033090391, 'learning_rate': 4.785838660826424e-06, 'epoch': 0.16} 16%|█▌ | 1955/12313 [1:27:56<7:50:54, 2.73s/it] 16%|█▌ | 1956/12313 [1:27:58<7:48:18, 2.71s/it] {'loss': 0.5365, 'grad_norm': 5.613790832997751, 'learning_rate': 4.785572272647196e-06, 'epoch': 0.16} 16%|█▌ | 1956/12313 [1:27:58<7:48:18, 2.71s/it] 16%|█▌ | 1957/12313 [1:28:01<7:46:56, 2.71s/it] {'loss': 0.5274, 'grad_norm': 12.194006996853599, 'learning_rate': 4.785305726318461e-06, 'epoch': 0.16} 16%|█▌ | 1957/12313 [1:28:01<7:46:56, 2.71s/it] 16%|█▌ | 1958/12313 [1:28:04<7:43:08, 2.68s/it] {'loss': 0.5129, 'grad_norm': 6.67246997329038, 'learning_rate': 4.785039021858665e-06, 'epoch': 0.16} 16%|█▌ | 1958/12313 [1:28:04<7:43:08, 2.68s/it] 16%|█▌ | 1959/12313 [1:28:07<8:01:37, 2.79s/it] 
{'loss': 0.5762, 'grad_norm': 4.361772829570909, 'learning_rate': 4.784772159286263e-06, 'epoch': 0.16} 16%|█▌ | 1959/12313 [1:28:07<8:01:37, 2.79s/it] 16%|█▌ | 1960/12313 [1:28:09<7:53:10, 2.74s/it] {'loss': 0.5687, 'grad_norm': 3.6171339798903115, 'learning_rate': 4.784505138619719e-06, 'epoch': 0.16} 16%|█▌ | 1960/12313 [1:28:09<7:53:10, 2.74s/it] 16%|█▌ | 1961/12313 [1:28:12<7:40:50, 2.67s/it] {'loss': 0.6731, 'grad_norm': 3.3498401020879696, 'learning_rate': 4.78423795987751e-06, 'epoch': 0.16} 16%|█▌ | 1961/12313 [1:28:12<7:40:50, 2.67s/it] 16%|█▌ | 1962/12313 [1:28:14<7:41:32, 2.68s/it] {'loss': 0.5832, 'grad_norm': 4.218304560758474, 'learning_rate': 4.783970623078124e-06, 'epoch': 0.16} 16%|█▌ | 1962/12313 [1:28:14<7:41:32, 2.68s/it] 16%|█▌ | 1963/12313 [1:28:17<7:49:07, 2.72s/it] {'loss': 0.626, 'grad_norm': 3.2954022912979473, 'learning_rate': 4.783703128240058e-06, 'epoch': 0.16} 16%|█▌ | 1963/12313 [1:28:17<7:49:07, 2.72s/it] 16%|█▌ | 1964/12313 [1:28:20<7:45:49, 2.70s/it] {'loss': 0.549, 'grad_norm': 5.637076756015832, 'learning_rate': 4.783435475381822e-06, 'epoch': 0.16} 16%|█▌ | 1964/12313 [1:28:20<7:45:49, 2.70s/it] 16%|█▌ | 1965/12313 [1:28:23<7:50:01, 2.73s/it] {'loss': 0.6577, 'grad_norm': 4.690580073274557, 'learning_rate': 4.7831676645219364e-06, 'epoch': 0.16} 16%|█▌ | 1965/12313 [1:28:23<7:50:01, 2.73s/it] 16%|█▌ | 1966/12313 [1:28:25<7:55:01, 2.75s/it] {'loss': 0.5685, 'grad_norm': 5.3749569715672125, 'learning_rate': 4.782899695678931e-06, 'epoch': 0.16} 16%|█▌ | 1966/12313 [1:28:25<7:55:01, 2.75s/it] 16%|█▌ | 1967/12313 [1:28:28<7:53:16, 2.74s/it] {'loss': 0.6891, 'grad_norm': 6.809233492187388, 'learning_rate': 4.782631568871349e-06, 'epoch': 0.16} 16%|█▌ | 1967/12313 [1:28:28<7:53:16, 2.74s/it] 16%|█▌ | 1968/12313 [1:28:31<7:49:12, 2.72s/it] {'loss': 0.3833, 'grad_norm': 4.530421811396376, 'learning_rate': 4.782363284117744e-06, 'epoch': 0.16} 16%|█▌ | 1968/12313 [1:28:31<7:49:12, 2.72s/it] 16%|█▌ | 1969/12313 [1:28:33<7:41:12, 
2.68s/it] {'loss': 0.4926, 'grad_norm': 4.835132250782642, 'learning_rate': 4.782094841436677e-06, 'epoch': 0.16} 16%|█▌ | 1969/12313 [1:28:33<7:41:12, 2.68s/it] 16%|█▌ | 1970/12313 [1:28:36<7:38:52, 2.66s/it] {'loss': 0.7052, 'grad_norm': 3.8849675683937184, 'learning_rate': 4.781826240846726e-06, 'epoch': 0.16} 16%|█▌ | 1970/12313 [1:28:36<7:38:52, 2.66s/it] 16%|█▌ | 1971/12313 [1:28:39<7:29:47, 2.61s/it] {'loss': 0.6791, 'grad_norm': 4.260063677720038, 'learning_rate': 4.781557482366477e-06, 'epoch': 0.16} 16%|█▌ | 1971/12313 [1:28:39<7:29:47, 2.61s/it] 16%|█▌ | 1972/12313 [1:28:41<7:31:24, 2.62s/it] {'loss': 0.6356, 'grad_norm': 9.298148375842267, 'learning_rate': 4.781288566014524e-06, 'epoch': 0.16} 16%|█▌ | 1972/12313 [1:28:41<7:31:24, 2.62s/it] 16%|█▌ | 1973/12313 [1:28:44<7:27:33, 2.60s/it] {'loss': 0.5682, 'grad_norm': 4.484161226398304, 'learning_rate': 4.781019491809475e-06, 'epoch': 0.16} 16%|█▌ | 1973/12313 [1:28:44<7:27:33, 2.60s/it] 16%|█▌ | 1974/12313 [1:28:47<7:44:20, 2.69s/it] {'loss': 0.6072, 'grad_norm': 3.7283315441063984, 'learning_rate': 4.78075025976995e-06, 'epoch': 0.16} 16%|█▌ | 1974/12313 [1:28:47<7:44:20, 2.69s/it] 16%|█▌ | 1975/12313 [1:28:49<7:34:47, 2.64s/it] {'loss': 0.5845, 'grad_norm': 6.059353678550269, 'learning_rate': 4.780480869914578e-06, 'epoch': 0.16} 16%|█▌ | 1975/12313 [1:28:49<7:34:47, 2.64s/it] 16%|█▌ | 1976/12313 [1:28:52<7:22:42, 2.57s/it] {'loss': 0.5095, 'grad_norm': 4.554935916876435, 'learning_rate': 4.780211322261998e-06, 'epoch': 0.16} 16%|█▌ | 1976/12313 [1:28:52<7:22:42, 2.57s/it] 16%|█▌ | 1977/12313 [1:28:54<7:23:15, 2.57s/it] {'loss': 0.5098, 'grad_norm': 6.981060106812001, 'learning_rate': 4.779941616830863e-06, 'epoch': 0.16} 16%|█▌ | 1977/12313 [1:28:54<7:23:15, 2.57s/it] 16%|█▌ | 1978/12313 [1:28:57<7:29:08, 2.61s/it] {'loss': 0.7675, 'grad_norm': 4.9440805505487, 'learning_rate': 4.779671753639835e-06, 'epoch': 0.16} 16%|█▌ | 1978/12313 [1:28:57<7:29:08, 2.61s/it] 16%|█▌ | 1979/12313 [1:29:00<7:32:37, 
2.63s/it] {'loss': 0.5639, 'grad_norm': 8.280400518698295, 'learning_rate': 4.779401732707586e-06, 'epoch': 0.16} 16%|█▌ | 1979/12313 [1:29:00<7:32:37, 2.63s/it] 16%|█▌ | 1980/12313 [1:29:02<7:39:03, 2.67s/it] {'loss': 0.5981, 'grad_norm': 6.056327257248986, 'learning_rate': 4.779131554052801e-06, 'epoch': 0.16} 16%|█▌ | 1980/12313 [1:29:02<7:39:03, 2.67s/it] 16%|█▌ | 1981/12313 [1:29:05<7:35:10, 2.64s/it] {'loss': 0.6512, 'grad_norm': 5.385241894979969, 'learning_rate': 4.778861217694174e-06, 'epoch': 0.16} 16%|█▌ | 1981/12313 [1:29:05<7:35:10, 2.64s/it] 16%|█▌ | 1982/12313 [1:29:08<7:36:59, 2.65s/it] {'loss': 0.5836, 'grad_norm': 4.550144423102369, 'learning_rate': 4.778590723650413e-06, 'epoch': 0.16} 16%|█▌ | 1982/12313 [1:29:08<7:36:59, 2.65s/it] 16%|█▌ | 1983/12313 [1:29:11<8:00:03, 2.79s/it] {'loss': 0.7309, 'grad_norm': 4.3719124593201055, 'learning_rate': 4.778320071940231e-06, 'epoch': 0.16} 16%|█▌ | 1983/12313 [1:29:11<8:00:03, 2.79s/it] 16%|█▌ | 1984/12313 [1:29:13<8:01:00, 2.79s/it] {'loss': 0.5897, 'grad_norm': 4.283507192707898, 'learning_rate': 4.77804926258236e-06, 'epoch': 0.16} 16%|█▌ | 1984/12313 [1:29:13<8:01:00, 2.79s/it] 16%|█▌ | 1985/12313 [1:29:16<8:10:55, 2.85s/it] {'loss': 0.6077, 'grad_norm': 3.9610170071985524, 'learning_rate': 4.777778295595535e-06, 'epoch': 0.16} 16%|█▌ | 1985/12313 [1:29:16<8:10:55, 2.85s/it] 16%|█▌ | 1986/12313 [1:29:19<8:11:28, 2.86s/it] {'loss': 0.7406, 'grad_norm': 6.219189683636367, 'learning_rate': 4.777507170998508e-06, 'epoch': 0.16} 16%|█▌ | 1986/12313 [1:29:19<8:11:28, 2.86s/it] 16%|█▌ | 1987/12313 [1:29:22<8:01:35, 2.80s/it] {'loss': 0.5908, 'grad_norm': 4.465198885140077, 'learning_rate': 4.777235888810037e-06, 'epoch': 0.16} 16%|█▌ | 1987/12313 [1:29:22<8:01:35, 2.80s/it] 16%|█▌ | 1988/12313 [1:29:24<7:44:23, 2.70s/it] {'loss': 0.5436, 'grad_norm': 6.450650746265666, 'learning_rate': 4.776964449048895e-06, 'epoch': 0.16} 16%|█▌ | 1988/12313 [1:29:24<7:44:23, 2.70s/it] 16%|█▌ | 1989/12313 
[1:29:27<7:46:27, 2.71s/it] {'loss': 0.4796, 'grad_norm': 9.863744897399046, 'learning_rate': 4.776692851733864e-06, 'epoch': 0.16} 16%|█▌ | 1989/12313 [1:29:27<7:46:27, 2.71s/it] 16%|█▌ | 1990/12313 [1:29:30<7:41:45, 2.68s/it] {'loss': 0.737, 'grad_norm': 3.703087572235551, 'learning_rate': 4.776421096883737e-06, 'epoch': 0.16} 16%|█▌ | 1990/12313 [1:29:30<7:41:45, 2.68s/it] 16%|█▌ | 1991/12313 [1:29:32<7:36:00, 2.65s/it] {'loss': 0.7037, 'grad_norm': 5.885859853270686, 'learning_rate': 4.776149184517318e-06, 'epoch': 0.16} 16%|█▌ | 1991/12313 [1:29:32<7:36:00, 2.65s/it] 16%|█▌ | 1992/12313 [1:29:35<7:41:05, 2.68s/it] {'loss': 0.5807, 'grad_norm': 4.7082939980756136, 'learning_rate': 4.775877114653422e-06, 'epoch': 0.16} 16%|█▌ | 1992/12313 [1:29:35<7:41:05, 2.68s/it] 16%|█▌ | 1993/12313 [1:29:38<7:36:46, 2.66s/it] {'loss': 0.5684, 'grad_norm': 4.719665547064216, 'learning_rate': 4.775604887310874e-06, 'epoch': 0.16} 16%|█▌ | 1993/12313 [1:29:38<7:36:46, 2.66s/it] 16%|█▌ | 1994/12313 [1:29:40<7:40:08, 2.68s/it] {'loss': 0.5176, 'grad_norm': 4.372013687616748, 'learning_rate': 4.775332502508511e-06, 'epoch': 0.16} 16%|█▌ | 1994/12313 [1:29:40<7:40:08, 2.68s/it] 16%|█▌ | 1995/12313 [1:29:43<7:52:59, 2.75s/it] {'loss': 0.6306, 'grad_norm': 3.574773949292872, 'learning_rate': 4.775059960265181e-06, 'epoch': 0.16} 16%|█▌ | 1995/12313 [1:29:43<7:52:59, 2.75s/it] 16%|█▌ | 1996/12313 [1:29:46<7:43:10, 2.69s/it] {'loss': 0.5673, 'grad_norm': 3.0419771807588134, 'learning_rate': 4.774787260599744e-06, 'epoch': 0.16} 16%|█▌ | 1996/12313 [1:29:46<7:43:10, 2.69s/it] 16%|█▌ | 1997/12313 [1:29:49<7:42:47, 2.69s/it] {'loss': 0.5221, 'grad_norm': 4.11991510271793, 'learning_rate': 4.7745144035310656e-06, 'epoch': 0.16} 16%|█▌ | 1997/12313 [1:29:49<7:42:47, 2.69s/it] 16%|█▌ | 1998/12313 [1:29:51<7:44:29, 2.70s/it] {'loss': 0.5639, 'grad_norm': 8.878399802384537, 'learning_rate': 4.77424138907803e-06, 'epoch': 0.16} 16%|█▌ | 1998/12313 [1:29:51<7:44:29, 2.70s/it] 16%|█▌ | 1999/12313 
[1:29:54<7:46:00, 2.71s/it] {'loss': 0.6172, 'grad_norm': 5.661428317678857, 'learning_rate': 4.773968217259525e-06, 'epoch': 0.16} 16%|█▌ | 1999/12313 [1:29:54<7:46:00, 2.71s/it] 16%|█▌ | 2000/12313 [1:29:57<7:41:28, 2.68s/it] {'loss': 0.5641, 'grad_norm': 4.587434812251549, 'learning_rate': 4.773694888094454e-06, 'epoch': 0.16} 16%|█▌ | 2000/12313 [1:29:57<7:41:28, 2.68s/it] 16%|█▋ | 2001/12313 [1:29:59<7:37:38, 2.66s/it] {'loss': 0.6165, 'grad_norm': 6.782537400712025, 'learning_rate': 4.773421401601731e-06, 'epoch': 0.16} 16%|█▋ | 2001/12313 [1:29:59<7:37:38, 2.66s/it] 16%|█▋ | 2002/12313 [1:30:02<7:33:37, 2.64s/it] {'loss': 0.5165, 'grad_norm': 5.248837296847351, 'learning_rate': 4.773147757800279e-06, 'epoch': 0.16} 16%|█▋ | 2002/12313 [1:30:02<7:33:37, 2.64s/it] 16%|█▋ | 2003/12313 [1:30:04<7:31:25, 2.63s/it] {'loss': 0.5348, 'grad_norm': 6.771172761964919, 'learning_rate': 4.772873956709032e-06, 'epoch': 0.16} 16%|█▋ | 2003/12313 [1:30:05<7:31:25, 2.63s/it] 16%|█▋ | 2004/12313 [1:30:07<7:37:31, 2.66s/it] {'loss': 0.7519, 'grad_norm': 3.7091479668688407, 'learning_rate': 4.772599998346937e-06, 'epoch': 0.16} 16%|█▋ | 2004/12313 [1:30:07<7:37:31, 2.66s/it] 16%|█▋ | 2005/12313 [1:30:10<7:37:30, 2.66s/it] {'loss': 0.5111, 'grad_norm': 4.478594715066707, 'learning_rate': 4.772325882732949e-06, 'epoch': 0.16} 16%|█▋ | 2005/12313 [1:30:10<7:37:30, 2.66s/it] 16%|█▋ | 2006/12313 [1:30:13<7:34:37, 2.65s/it] {'loss': 0.4556, 'grad_norm': 5.771923023591905, 'learning_rate': 4.772051609886036e-06, 'epoch': 0.16} 16%|█▋ | 2006/12313 [1:30:13<7:34:37, 2.65s/it] 16%|█▋ | 2007/12313 [1:30:15<7:32:40, 2.64s/it] {'loss': 0.5635, 'grad_norm': 4.808968764339891, 'learning_rate': 4.771777179825176e-06, 'epoch': 0.16} 16%|█▋ | 2007/12313 [1:30:15<7:32:40, 2.64s/it] 16%|█▋ | 2008/12313 [1:30:18<7:37:21, 2.66s/it] {'loss': 0.8752, 'grad_norm': 6.317225738730894, 'learning_rate': 4.7715025925693595e-06, 'epoch': 0.16} 16%|█▋ | 2008/12313 [1:30:18<7:37:21, 2.66s/it] 16%|█▋ | 
2009/12313 [1:30:20<7:26:04, 2.60s/it] {'loss': 0.6393, 'grad_norm': 4.920049883294183, 'learning_rate': 4.771227848137585e-06, 'epoch': 0.16} 16%|█▋ | 2009/12313 [1:30:20<7:26:04, 2.60s/it] 16%|█▋ | 2010/12313 [1:30:23<7:36:15, 2.66s/it] {'loss': 0.6003, 'grad_norm': 4.256612076091997, 'learning_rate': 4.770952946548864e-06, 'epoch': 0.16} 16%|█▋ | 2010/12313 [1:30:23<7:36:15, 2.66s/it] 16%|█▋ | 2011/12313 [1:30:26<7:39:09, 2.67s/it] {'loss': 0.4413, 'grad_norm': 4.5734192613631315, 'learning_rate': 4.770677887822217e-06, 'epoch': 0.16} 16%|█▋ | 2011/12313 [1:30:26<7:39:09, 2.67s/it] 16%|█▋ | 2012/12313 [1:30:29<7:51:41, 2.75s/it] {'loss': 0.6597, 'grad_norm': 4.163070055535816, 'learning_rate': 4.770402671976677e-06, 'epoch': 0.16} 16%|█▋ | 2012/12313 [1:30:29<7:51:41, 2.75s/it] 16%|█▋ | 2013/12313 [1:30:31<7:50:52, 2.74s/it] {'loss': 0.4076, 'grad_norm': 3.0742054711235887, 'learning_rate': 4.77012729903129e-06, 'epoch': 0.16} 16%|█▋ | 2013/12313 [1:30:31<7:50:52, 2.74s/it] 16%|█▋ | 2014/12313 [1:30:34<7:41:51, 2.69s/it] {'loss': 0.5314, 'grad_norm': 5.55441667151701, 'learning_rate': 4.769851769005107e-06, 'epoch': 0.16} 16%|█▋ | 2014/12313 [1:30:34<7:41:51, 2.69s/it] 16%|█▋ | 2015/12313 [1:30:36<7:26:21, 2.60s/it] {'loss': 0.6124, 'grad_norm': 4.576121873555356, 'learning_rate': 4.769576081917195e-06, 'epoch': 0.16} 16%|█▋ | 2015/12313 [1:30:36<7:26:21, 2.60s/it] 16%|█▋ | 2016/12313 [1:30:39<7:22:12, 2.58s/it] {'loss': 0.6729, 'grad_norm': 6.382821144731639, 'learning_rate': 4.7693002377866295e-06, 'epoch': 0.16} 16%|█▋ | 2016/12313 [1:30:39<7:22:12, 2.58s/it] 16%|█▋ | 2017/12313 [1:30:42<7:28:46, 2.62s/it] {'loss': 0.4966, 'grad_norm': 9.489241945586448, 'learning_rate': 4.769024236632498e-06, 'epoch': 0.16} 16%|█▋ | 2017/12313 [1:30:42<7:28:46, 2.62s/it] 16%|█▋ | 2018/12313 [1:30:44<7:26:05, 2.60s/it] {'loss': 0.6579, 'grad_norm': 10.574632939063292, 'learning_rate': 4.768748078473898e-06, 'epoch': 0.16} 16%|█▋ | 2018/12313 [1:30:44<7:26:05, 2.60s/it] 16%|█▋ 
| 2019/12313 [1:30:47<7:22:36, 2.58s/it] {'loss': 0.9138, 'grad_norm': 3.323125086340652, 'learning_rate': 4.768471763329938e-06, 'epoch': 0.16} 16%|█▋ | 2019/12313 [1:30:47<7:22:36, 2.58s/it] 16%|█▋ | 2020/12313 [1:30:49<7:26:01, 2.60s/it] {'loss': 0.4129, 'grad_norm': 14.315467246083593, 'learning_rate': 4.768195291219738e-06, 'epoch': 0.16} 16%|█▋ | 2020/12313 [1:30:49<7:26:01, 2.60s/it] 16%|█▋ | 2021/12313 [1:30:52<7:26:10, 2.60s/it] {'loss': 0.6045, 'grad_norm': 3.5370073104621613, 'learning_rate': 4.767918662162428e-06, 'epoch': 0.16} 16%|█▋ | 2021/12313 [1:30:52<7:26:10, 2.60s/it] 16%|█▋ | 2022/12313 [1:30:55<7:27:43, 2.61s/it] {'loss': 0.5661, 'grad_norm': 4.952315377512936, 'learning_rate': 4.767641876177149e-06, 'epoch': 0.16} 16%|█▋ | 2022/12313 [1:30:55<7:27:43, 2.61s/it] 16%|█▋ | 2023/12313 [1:30:58<7:46:05, 2.72s/it] {'loss': 0.5964, 'grad_norm': 2.9834860448273273, 'learning_rate': 4.767364933283053e-06, 'epoch': 0.16} 16%|█▋ | 2023/12313 [1:30:58<7:46:05, 2.72s/it] 16%|█▋ | 2024/12313 [1:31:00<7:42:39, 2.70s/it] {'loss': 0.545, 'grad_norm': 38.67374390447997, 'learning_rate': 4.767087833499305e-06, 'epoch': 0.16} 16%|█▋ | 2024/12313 [1:31:00<7:42:39, 2.70s/it] 16%|█▋ | 2025/12313 [1:31:03<7:43:43, 2.70s/it] {'loss': 0.6039, 'grad_norm': 8.792176852895958, 'learning_rate': 4.7668105768450755e-06, 'epoch': 0.16} 16%|█▋ | 2025/12313 [1:31:03<7:43:43, 2.70s/it] 16%|█▋ | 2026/12313 [1:31:05<7:31:13, 2.63s/it] {'loss': 0.5289, 'grad_norm': 8.176394720323723, 'learning_rate': 4.766533163339553e-06, 'epoch': 0.16} 16%|█▋ | 2026/12313 [1:31:05<7:31:13, 2.63s/it] 16%|█▋ | 2027/12313 [1:31:08<7:30:34, 2.63s/it] {'loss': 0.4951, 'grad_norm': 6.977860465026282, 'learning_rate': 4.766255593001929e-06, 'epoch': 0.16} 16%|█▋ | 2027/12313 [1:31:08<7:30:34, 2.63s/it] 16%|█▋ | 2028/12313 [1:31:11<7:26:48, 2.61s/it] {'loss': 0.4714, 'grad_norm': 5.399038234082669, 'learning_rate': 4.765977865851413e-06, 'epoch': 0.16} 16%|█▋ | 2028/12313 [1:31:11<7:26:48, 2.61s/it] 
16%|█▋ | 2029/12313 [1:31:13<7:30:33, 2.63s/it] {'loss': 0.5556, 'grad_norm': 6.62984701521281, 'learning_rate': 4.765699981907221e-06, 'epoch': 0.16} 16%|█▋ | 2029/12313 [1:31:13<7:30:33, 2.63s/it] 16%|█▋ | 2030/12313 [1:31:16<7:21:00, 2.57s/it] {'loss': 0.8441, 'grad_norm': 6.225819915723398, 'learning_rate': 4.765421941188582e-06, 'epoch': 0.16} 16%|█▋ | 2030/12313 [1:31:16<7:21:00, 2.57s/it] 16%|█▋ | 2031/12313 [1:31:18<7:21:51, 2.58s/it] {'loss': 0.411, 'grad_norm': 5.344504032565665, 'learning_rate': 4.765143743714734e-06, 'epoch': 0.16} 16%|█▋ | 2031/12313 [1:31:18<7:21:51, 2.58s/it] 17%|█▋ | 2032/12313 [1:31:21<7:26:41, 2.61s/it] {'loss': 0.5036, 'grad_norm': 6.407678819914006, 'learning_rate': 4.764865389504927e-06, 'epoch': 0.17} 17%|█▋ | 2032/12313 [1:31:21<7:26:41, 2.61s/it] 17%|█▋ | 2033/12313 [1:31:24<7:24:17, 2.59s/it] {'loss': 0.5595, 'grad_norm': 4.502779183775894, 'learning_rate': 4.764586878578421e-06, 'epoch': 0.17} 17%|█▋ | 2033/12313 [1:31:24<7:24:17, 2.59s/it] 17%|█▋ | 2034/12313 [1:31:26<7:21:20, 2.58s/it] {'loss': 0.5466, 'grad_norm': 5.385105120731812, 'learning_rate': 4.7643082109544894e-06, 'epoch': 0.17} 17%|█▋ | 2034/12313 [1:31:26<7:21:20, 2.58s/it] 17%|█▋ | 2035/12313 [1:31:29<7:16:10, 2.55s/it] {'loss': 0.5369, 'grad_norm': 5.3322559203413435, 'learning_rate': 4.764029386652412e-06, 'epoch': 0.17} 17%|█▋ | 2035/12313 [1:31:29<7:16:10, 2.55s/it] 17%|█▋ | 2036/12313 [1:31:31<7:24:18, 2.59s/it] {'loss': 0.5569, 'grad_norm': 5.983847101773067, 'learning_rate': 4.763750405691483e-06, 'epoch': 0.17} 17%|█▋ | 2036/12313 [1:31:31<7:24:18, 2.59s/it] 17%|█▋ | 2037/12313 [1:31:34<7:27:11, 2.61s/it] {'loss': 0.6493, 'grad_norm': 4.212501542827999, 'learning_rate': 4.7634712680910075e-06, 'epoch': 0.17} 17%|█▋ | 2037/12313 [1:31:34<7:27:11, 2.61s/it] 17%|█▋ | 2038/12313 [1:31:37<7:29:55, 2.63s/it] {'loss': 0.511, 'grad_norm': 5.116892931497582, 'learning_rate': 4.7631919738703e-06, 'epoch': 0.17} 17%|█▋ | 2038/12313 [1:31:37<7:29:55, 2.63s/it] 
17%|█▋ | 2039/12313 [1:31:40<7:48:09, 2.73s/it] {'loss': 0.5332, 'grad_norm': 6.147176716463638, 'learning_rate': 4.762912523048685e-06, 'epoch': 0.17} 17%|█▋ | 2039/12313 [1:31:40<7:48:09, 2.73s/it] 17%|█▋ | 2040/12313 [1:31:42<7:43:17, 2.71s/it] {'loss': 0.843, 'grad_norm': 4.334801382262547, 'learning_rate': 4.7626329156455e-06, 'epoch': 0.17} 17%|█▋ | 2040/12313 [1:31:42<7:43:17, 2.71s/it] 17%|█▋ | 2041/12313 [1:31:45<7:44:07, 2.71s/it] {'loss': 0.5301, 'grad_norm': 6.491344666003298, 'learning_rate': 4.7623531516800916e-06, 'epoch': 0.17} 17%|█▋ | 2041/12313 [1:31:45<7:44:07, 2.71s/it] 17%|█▋ | 2042/12313 [1:31:47<7:34:25, 2.65s/it] {'loss': 0.7063, 'grad_norm': 4.47643734321392, 'learning_rate': 4.762073231171819e-06, 'epoch': 0.17} 17%|█▋ | 2042/12313 [1:31:47<7:34:25, 2.65s/it] 17%|█▋ | 2043/12313 [1:31:50<7:36:12, 2.67s/it] {'loss': 0.6217, 'grad_norm': 4.934657642144628, 'learning_rate': 4.76179315414005e-06, 'epoch': 0.17} 17%|█▋ | 2043/12313 [1:31:50<7:36:12, 2.67s/it] 17%|█▋ | 2044/12313 [1:31:53<7:30:18, 2.63s/it] {'loss': 0.5184, 'grad_norm': 6.800502146070916, 'learning_rate': 4.761512920604165e-06, 'epoch': 0.17} 17%|█▋ | 2044/12313 [1:31:53<7:30:18, 2.63s/it] 17%|█▋ | 2045/12313 [1:31:56<7:45:36, 2.72s/it] {'loss': 0.4322, 'grad_norm': 4.106268196492193, 'learning_rate': 4.761232530583556e-06, 'epoch': 0.17} 17%|█▋ | 2045/12313 [1:31:56<7:45:36, 2.72s/it] 17%|█▋ | 2046/12313 [1:31:58<7:39:03, 2.68s/it] {'loss': 0.541, 'grad_norm': 4.860918988738907, 'learning_rate': 4.760951984097622e-06, 'epoch': 0.17} 17%|█▋ | 2046/12313 [1:31:58<7:39:03, 2.68s/it] 17%|█▋ | 2047/12313 [1:32:01<7:34:37, 2.66s/it] {'loss': 0.7025, 'grad_norm': 3.6112840346969564, 'learning_rate': 4.760671281165777e-06, 'epoch': 0.17} 17%|█▋ | 2047/12313 [1:32:01<7:34:37, 2.66s/it] 17%|█▋ | 2048/12313 [1:32:04<7:38:22, 2.68s/it] {'loss': 0.5791, 'grad_norm': 3.363415520275136, 'learning_rate': 4.760390421807445e-06, 'epoch': 0.17} 17%|█▋ | 2048/12313 [1:32:04<7:38:22, 2.68s/it] 
17%|█▋ | 2049/12313 [1:32:07<7:56:07, 2.78s/it] {'loss': 0.545, 'grad_norm': 5.52653404100446, 'learning_rate': 4.760109406042057e-06, 'epoch': 0.17} 17%|█▋ | 2049/12313 [1:32:07<7:56:07, 2.78s/it] 17%|█▋ | 2050/12313 [1:32:09<7:52:24, 2.76s/it] {'loss': 0.5705, 'grad_norm': 6.546707591746984, 'learning_rate': 4.759828233889061e-06, 'epoch': 0.17} 17%|█▋ | 2050/12313 [1:32:09<7:52:24, 2.76s/it] 17%|█▋ | 2051/12313 [1:32:12<7:44:35, 2.72s/it] {'loss': 0.5044, 'grad_norm': 5.661580374415145, 'learning_rate': 4.75954690536791e-06, 'epoch': 0.17} 17%|█▋ | 2051/12313 [1:32:12<7:44:35, 2.72s/it] 17%|█▋ | 2052/12313 [1:32:15<7:53:27, 2.77s/it] {'loss': 0.5467, 'grad_norm': 4.883386412845315, 'learning_rate': 4.759265420498073e-06, 'epoch': 0.17} 17%|█▋ | 2052/12313 [1:32:15<7:53:27, 2.77s/it] 17%|█▋ | 2053/12313 [1:32:17<7:37:14, 2.67s/it] {'loss': 0.5892, 'grad_norm': 9.411178068543851, 'learning_rate': 4.758983779299025e-06, 'epoch': 0.17} 17%|█▋ | 2053/12313 [1:32:17<7:37:14, 2.67s/it] 17%|█▋ | 2054/12313 [1:32:20<7:33:48, 2.65s/it] {'loss': 0.986, 'grad_norm': 4.132413410970519, 'learning_rate': 4.758701981790255e-06, 'epoch': 0.17} 17%|█▋ | 2054/12313 [1:32:20<7:33:48, 2.65s/it] 17%|█▋ | 2055/12313 [1:32:23<7:34:41, 2.66s/it] {'loss': 0.6445, 'grad_norm': 34.42736555216525, 'learning_rate': 4.7584200279912614e-06, 'epoch': 0.17} 17%|█▋ | 2055/12313 [1:32:23<7:34:41, 2.66s/it] 17%|█▋ | 2056/12313 [1:32:25<7:25:39, 2.61s/it] {'loss': 0.7808, 'grad_norm': 4.276245246172677, 'learning_rate': 4.7581379179215545e-06, 'epoch': 0.17} 17%|█▋ | 2056/12313 [1:32:25<7:25:39, 2.61s/it] 17%|█▋ | 2057/12313 [1:32:28<7:38:30, 2.68s/it] {'loss': 0.7379, 'grad_norm': 3.964870477401249, 'learning_rate': 4.757855651600656e-06, 'epoch': 0.17} 17%|█▋ | 2057/12313 [1:32:28<7:38:30, 2.68s/it] 17%|█▋ | 2058/12313 [1:32:31<7:41:30, 2.70s/it] {'loss': 0.7927, 'grad_norm': 3.606188650388298, 'learning_rate': 4.757573229048095e-06, 'epoch': 0.17} 17%|█▋ | 2058/12313 [1:32:31<7:41:30, 2.70s/it] 
17%|█▋ | 2059/12313 [1:32:33<7:39:22, 2.69s/it] {'loss': 0.5828, 'grad_norm': 4.909495914927784, 'learning_rate': 4.757290650283414e-06, 'epoch': 0.17} 17%|█▋ | 2059/12313 [1:32:33<7:39:22, 2.69s/it] 17%|█▋ | 2060/12313 [1:32:36<7:36:30, 2.67s/it] {'loss': 0.5334, 'grad_norm': 4.647983652975215, 'learning_rate': 4.757007915326167e-06, 'epoch': 0.17} 17%|█▋ | 2060/12313 [1:32:36<7:36:30, 2.67s/it] 17%|█▋ | 2061/12313 [1:32:39<7:37:07, 2.68s/it] {'loss': 0.7908, 'grad_norm': 6.091135267313866, 'learning_rate': 4.756725024195918e-06, 'epoch': 0.17} 17%|█▋ | 2061/12313 [1:32:39<7:37:07, 2.68s/it] 17%|█▋ | 2062/12313 [1:32:41<7:47:06, 2.73s/it] {'loss': 0.4387, 'grad_norm': 4.236635859135639, 'learning_rate': 4.75644197691224e-06, 'epoch': 0.17} 17%|█▋ | 2062/12313 [1:32:41<7:47:06, 2.73s/it] 17%|█▋ | 2063/12313 [1:32:44<7:33:55, 2.66s/it] {'loss': 0.5377, 'grad_norm': 5.254900085055447, 'learning_rate': 4.7561587734947195e-06, 'epoch': 0.17} 17%|█▋ | 2063/12313 [1:32:44<7:33:55, 2.66s/it] 17%|█▋ | 2064/12313 [1:32:47<7:31:24, 2.64s/it] {'loss': 0.4746, 'grad_norm': 6.08722014972973, 'learning_rate': 4.755875413962953e-06, 'epoch': 0.17} 17%|█▋ | 2064/12313 [1:32:47<7:31:24, 2.64s/it] 17%|█▋ | 2065/12313 [1:32:49<7:31:14, 2.64s/it] {'loss': 0.4134, 'grad_norm': 6.117784644667623, 'learning_rate': 4.7555918983365456e-06, 'epoch': 0.17} 17%|█▋ | 2065/12313 [1:32:49<7:31:14, 2.64s/it] 17%|█▋ | 2066/12313 [1:32:52<7:34:02, 2.66s/it] {'loss': 0.6176, 'grad_norm': 3.931238911985374, 'learning_rate': 4.755308226635117e-06, 'epoch': 0.17} 17%|█▋ | 2066/12313 [1:32:52<7:34:02, 2.66s/it] 17%|█▋ | 2067/12313 [1:32:54<7:23:32, 2.60s/it] {'loss': 0.5139, 'grad_norm': 5.993875492451563, 'learning_rate': 4.755024398878296e-06, 'epoch': 0.17} 17%|█▋ | 2067/12313 [1:32:54<7:23:32, 2.60s/it] 17%|█▋ | 2068/12313 [1:32:57<7:33:55, 2.66s/it] {'loss': 0.7734, 'grad_norm': 4.120681576998816, 'learning_rate': 4.75474041508572e-06, 'epoch': 0.17} 17%|█▋ | 2068/12313 [1:32:57<7:33:55, 2.66s/it] 
17%|█▋ | 2069/12313 [1:33:00<7:23:36, 2.60s/it] {'loss': 0.6459, 'grad_norm': 4.2419734083458716, 'learning_rate': 4.7544562752770415e-06, 'epoch': 0.17} 17%|█▋ | 2069/12313 [1:33:00<7:23:36, 2.60s/it] 17%|█▋ | 2070/12313 [1:33:02<7:24:41, 2.60s/it] {'loss': 0.4706, 'grad_norm': 8.463809226446845, 'learning_rate': 4.75417197947192e-06, 'epoch': 0.17} 17%|█▋ | 2070/12313 [1:33:02<7:24:41, 2.60s/it] 17%|█▋ | 2071/12313 [1:33:05<7:13:49, 2.54s/it] {'loss': 0.4932, 'grad_norm': 3.8343323335367128, 'learning_rate': 4.753887527690027e-06, 'epoch': 0.17} 17%|█▋ | 2071/12313 [1:33:05<7:13:49, 2.54s/it] 17%|█▋ | 2072/12313 [1:33:07<7:14:39, 2.55s/it] {'loss': 0.5316, 'grad_norm': 8.218415227607817, 'learning_rate': 4.753602919951046e-06, 'epoch': 0.17} 17%|█▋ | 2072/12313 [1:33:07<7:14:39, 2.55s/it] 17%|█▋ | 2073/12313 [1:33:10<7:32:43, 2.65s/it] {'loss': 0.5167, 'grad_norm': 3.6050617955271216, 'learning_rate': 4.753318156274669e-06, 'epoch': 0.17} 17%|█▋ | 2073/12313 [1:33:10<7:32:43, 2.65s/it] 17%|█▋ | 2074/12313 [1:33:13<7:44:53, 2.72s/it] {'loss': 0.47, 'grad_norm': 4.841750253634242, 'learning_rate': 4.753033236680602e-06, 'epoch': 0.17} 17%|█▋ | 2074/12313 [1:33:13<7:44:53, 2.72s/it] 17%|█▋ | 2075/12313 [1:33:16<7:43:19, 2.72s/it] {'loss': 0.7099, 'grad_norm': 5.173184919410824, 'learning_rate': 4.75274816118856e-06, 'epoch': 0.17} 17%|█▋ | 2075/12313 [1:33:16<7:43:19, 2.72s/it] 17%|█▋ | 2076/12313 [1:33:18<7:39:35, 2.69s/it] {'loss': 0.7438, 'grad_norm': 3.9701269933467622, 'learning_rate': 4.7524629298182655e-06, 'epoch': 0.17} 17%|█▋ | 2076/12313 [1:33:18<7:39:35, 2.69s/it] 17%|█▋ | 2077/12313 [1:33:21<7:57:53, 2.80s/it] {'loss': 0.5601, 'grad_norm': 5.896346024335442, 'learning_rate': 4.752177542589459e-06, 'epoch': 0.17} 17%|█▋ | 2077/12313 [1:33:21<7:57:53, 2.80s/it] 17%|█▋ | 2078/12313 [1:33:24<7:49:58, 2.76s/it] {'loss': 0.6669, 'grad_norm': 4.249527053571267, 'learning_rate': 4.7518919995218854e-06, 'epoch': 0.17} 17%|█▋ | 2078/12313 [1:33:24<7:49:58, 
2.76s/it] 17%|█▋ | 2079/12313 [1:33:27<7:45:57, 2.73s/it] {'loss': 0.5073, 'grad_norm': 6.037758485869208, 'learning_rate': 4.7516063006353035e-06, 'epoch': 0.17} 17%|█▋ | 2079/12313 [1:33:27<7:45:57, 2.73s/it] 17%|█▋ | 2080/12313 [1:33:29<7:37:59, 2.69s/it] {'loss': 0.5701, 'grad_norm': 3.7856446494128266, 'learning_rate': 4.7513204459494825e-06, 'epoch': 0.17} 17%|█▋ | 2080/12313 [1:33:29<7:37:59, 2.69s/it] 17%|█▋ | 2081/12313 [1:33:32<7:26:53, 2.62s/it] {'loss': 0.5066, 'grad_norm': 7.188036974920219, 'learning_rate': 4.751034435484201e-06, 'epoch': 0.17} 17%|█▋ | 2081/12313 [1:33:32<7:26:53, 2.62s/it] 17%|█▋ | 2082/12313 [1:33:34<7:23:23, 2.60s/it] {'loss': 0.7534, 'grad_norm': 4.317432007080439, 'learning_rate': 4.75074826925925e-06, 'epoch': 0.17} 17%|█▋ | 2082/12313 [1:33:34<7:23:23, 2.60s/it] 17%|█▋ | 2083/12313 [1:33:37<7:26:42, 2.62s/it] {'loss': 0.6171, 'grad_norm': 5.902119726898797, 'learning_rate': 4.750461947294431e-06, 'epoch': 0.17} 17%|█▋ | 2083/12313 [1:33:37<7:26:42, 2.62s/it] 17%|█▋ | 2084/12313 [1:33:39<7:23:23, 2.60s/it] {'loss': 0.6519, 'grad_norm': 3.7384712420136523, 'learning_rate': 4.750175469609555e-06, 'epoch': 0.17} 17%|█▋ | 2084/12313 [1:33:39<7:23:23, 2.60s/it] 17%|█▋ | 2085/12313 [1:33:42<7:38:31, 2.69s/it] {'loss': 0.6105, 'grad_norm': 3.71902625991903, 'learning_rate': 4.749888836224446e-06, 'epoch': 0.17} 17%|█▋ | 2085/12313 [1:33:42<7:38:31, 2.69s/it] 17%|█▋ | 2086/12313 [1:33:45<7:34:18, 2.67s/it] {'loss': 0.8081, 'grad_norm': 4.570880922255994, 'learning_rate': 4.749602047158937e-06, 'epoch': 0.17} 17%|█▋ | 2086/12313 [1:33:45<7:34:18, 2.67s/it] 17%|█▋ | 2087/12313 [1:33:47<7:23:40, 2.60s/it] {'loss': 0.6383, 'grad_norm': 5.288276551549225, 'learning_rate': 4.749315102432872e-06, 'epoch': 0.17} 17%|█▋ | 2087/12313 [1:33:47<7:23:40, 2.60s/it] 17%|█▋ | 2088/12313 [1:33:50<7:18:19, 2.57s/it] {'loss': 0.5472, 'grad_norm': 11.77165556632558, 'learning_rate': 4.749028002066106e-06, 'epoch': 0.17} 17%|█▋ | 2088/12313 
[1:33:50<7:18:19, 2.57s/it] 17%|█▋ | 2089/12313 [1:33:53<7:22:51, 2.60s/it] {'loss': 0.3578, 'grad_norm': 4.971856688210904, 'learning_rate': 4.748740746078505e-06, 'epoch': 0.17} 17%|█▋ | 2089/12313 [1:33:53<7:22:51, 2.60s/it] 17%|█▋ | 2090/12313 [1:33:55<7:27:16, 2.63s/it] {'loss': 0.6725, 'grad_norm': 3.215058873377653, 'learning_rate': 4.748453334489947e-06, 'epoch': 0.17} 17%|█▋ | 2090/12313 [1:33:55<7:27:16, 2.63s/it] 17%|█▋ | 2091/12313 [1:33:58<7:31:40, 2.65s/it] {'loss': 0.604, 'grad_norm': 5.921520125153694, 'learning_rate': 4.748165767320316e-06, 'epoch': 0.17} 17%|█▋ | 2091/12313 [1:33:58<7:31:40, 2.65s/it] 17%|█▋ | 2092/12313 [1:34:01<7:35:23, 2.67s/it] {'loss': 0.527, 'grad_norm': 16.542194531090658, 'learning_rate': 4.747878044589513e-06, 'epoch': 0.17} 17%|█▋ | 2092/12313 [1:34:01<7:35:23, 2.67s/it] 17%|█▋ | 2093/12313 [1:34:03<7:27:56, 2.63s/it] {'loss': 0.6809, 'grad_norm': 6.697998722104277, 'learning_rate': 4.747590166317447e-06, 'epoch': 0.17} 17%|█▋ | 2093/12313 [1:34:03<7:27:56, 2.63s/it] 17%|█▋ | 2094/12313 [1:34:06<7:17:42, 2.57s/it] {'loss': 0.6052, 'grad_norm': 4.365123433682997, 'learning_rate': 4.7473021325240355e-06, 'epoch': 0.17} 17%|█▋ | 2094/12313 [1:34:06<7:17:42, 2.57s/it] 17%|█▋ | 2095/12313 [1:34:08<7:21:15, 2.59s/it] {'loss': 0.7157, 'grad_norm': 4.341123013344583, 'learning_rate': 4.74701394322921e-06, 'epoch': 0.17} 17%|█▋ | 2095/12313 [1:34:08<7:21:15, 2.59s/it] 17%|█▋ | 2096/12313 [1:34:11<7:22:16, 2.60s/it] {'loss': 0.4511, 'grad_norm': 7.815755600154752, 'learning_rate': 4.7467255984529124e-06, 'epoch': 0.17} 17%|█▋ | 2096/12313 [1:34:11<7:22:16, 2.60s/it] 17%|█▋ | 2097/12313 [1:34:14<7:33:03, 2.66s/it] {'loss': 0.5972, 'grad_norm': 3.5475344534604276, 'learning_rate': 4.746437098215094e-06, 'epoch': 0.17} 17%|█▋ | 2097/12313 [1:34:14<7:33:03, 2.66s/it] 17%|█▋ | 2098/12313 [1:34:16<7:34:50, 2.67s/it] {'loss': 0.6985, 'grad_norm': 5.6422926548593635, 'learning_rate': 4.746148442535717e-06, 'epoch': 0.17} 17%|█▋ | 
2098/12313 [1:34:16<7:34:50, 2.67s/it] 17%|█▋ | 2099/12313 [1:34:19<7:29:42, 2.64s/it] {'loss': 0.7605, 'grad_norm': 3.7587908987149032, 'learning_rate': 4.745859631434757e-06, 'epoch': 0.17} 17%|█▋ | 2099/12313 [1:34:19<7:29:42, 2.64s/it] 17%|█▋ | 2100/12313 [1:34:22<7:25:02, 2.61s/it] {'loss': 0.7068, 'grad_norm': 6.674895969578671, 'learning_rate': 4.745570664932195e-06, 'epoch': 0.17} 17%|█▋ | 2100/12313 [1:34:22<7:25:02, 2.61s/it] 17%|█▋ | 2101/12313 [1:34:24<7:16:03, 2.56s/it] {'loss': 0.4672, 'grad_norm': 5.123301796152971, 'learning_rate': 4.745281543048027e-06, 'epoch': 0.17} 17%|█▋ | 2101/12313 [1:34:24<7:16:03, 2.56s/it] 17%|█▋ | 2102/12313 [1:34:27<7:22:40, 2.60s/it] {'loss': 0.4934, 'grad_norm': 5.4550745050456655, 'learning_rate': 4.744992265802261e-06, 'epoch': 0.17} 17%|█▋ | 2102/12313 [1:34:27<7:22:40, 2.60s/it] 17%|█▋ | 2103/12313 [1:34:29<7:31:51, 2.66s/it] {'loss': 0.5513, 'grad_norm': 5.148605782096688, 'learning_rate': 4.74470283321491e-06, 'epoch': 0.17} 17%|█▋ | 2103/12313 [1:34:29<7:31:51, 2.66s/it] 17%|█▋ | 2104/12313 [1:34:32<7:25:14, 2.62s/it] {'loss': 0.5779, 'grad_norm': 4.960917833837893, 'learning_rate': 4.7444132453060046e-06, 'epoch': 0.17} 17%|█▋ | 2104/12313 [1:34:32<7:25:14, 2.62s/it] 17%|█▋ | 2105/12313 [1:34:35<7:31:23, 2.65s/it] {'loss': 0.4572, 'grad_norm': 5.30615437756679, 'learning_rate': 4.744123502095579e-06, 'epoch': 0.17} 17%|█▋ | 2105/12313 [1:34:35<7:31:23, 2.65s/it] 17%|█▋ | 2106/12313 [1:34:37<7:28:55, 2.64s/it] {'loss': 0.6257, 'grad_norm': 12.9603922902617, 'learning_rate': 4.743833603603685e-06, 'epoch': 0.17} 17%|█▋ | 2106/12313 [1:34:37<7:28:55, 2.64s/it] 17%|█▋ | 2107/12313 [1:34:40<7:20:55, 2.59s/it] {'loss': 0.695, 'grad_norm': 4.802287043719077, 'learning_rate': 4.743543549850381e-06, 'epoch': 0.17} 17%|█▋ | 2107/12313 [1:34:40<7:20:55, 2.59s/it] 17%|█▋ | 2108/12313 [1:34:42<7:22:58, 2.60s/it] {'loss': 0.6116, 'grad_norm': 10.403228449617778, 'learning_rate': 4.743253340855737e-06, 'epoch': 0.17} 17%|█▋ | 
2108/12313 [1:34:42<7:22:58, 2.60s/it] 17%|█▋ | 2109/12313 [1:34:45<7:22:50, 2.60s/it] {'loss': 0.7586, 'grad_norm': 6.756935813007018, 'learning_rate': 4.742962976639835e-06, 'epoch': 0.17} 17%|█▋ | 2109/12313 [1:34:45<7:22:50, 2.60s/it] 17%|█▋ | 2110/12313 [1:34:48<7:27:17, 2.63s/it] {'loss': 0.4777, 'grad_norm': 10.07886788901266, 'learning_rate': 4.742672457222764e-06, 'epoch': 0.17} 17%|█▋ | 2110/12313 [1:34:48<7:27:17, 2.63s/it] 17%|█▋ | 2111/12313 [1:34:50<7:29:49, 2.65s/it] {'loss': 0.4269, 'grad_norm': 5.62161561654736, 'learning_rate': 4.742381782624629e-06, 'epoch': 0.17} 17%|█▋ | 2111/12313 [1:34:50<7:29:49, 2.65s/it] 17%|█▋ | 2112/12313 [1:34:53<7:23:43, 2.61s/it] {'loss': 0.7793, 'grad_norm': 5.949206043889717, 'learning_rate': 4.7420909528655416e-06, 'epoch': 0.17} 17%|█▋ | 2112/12313 [1:34:53<7:23:43, 2.61s/it] 17%|█▋ | 2113/12313 [1:34:56<7:35:50, 2.68s/it] {'loss': 0.5637, 'grad_norm': 4.263095903085148, 'learning_rate': 4.741799967965627e-06, 'epoch': 0.17} 17%|█▋ | 2113/12313 [1:34:56<7:35:50, 2.68s/it] 17%|█▋ | 2114/12313 [1:34:59<7:44:52, 2.73s/it] {'loss': 0.82, 'grad_norm': 10.633709854598653, 'learning_rate': 4.74150882794502e-06, 'epoch': 0.17} 17%|█▋ | 2114/12313 [1:34:59<7:44:52, 2.73s/it] 17%|█▋ | 2115/12313 [1:35:01<7:40:39, 2.71s/it] {'loss': 0.72, 'grad_norm': 3.032916783451706, 'learning_rate': 4.741217532823864e-06, 'epoch': 0.17} 17%|█▋ | 2115/12313 [1:35:01<7:40:39, 2.71s/it] 17%|█▋ | 2116/12313 [1:35:04<7:36:19, 2.69s/it] {'loss': 0.7894, 'grad_norm': 3.8608167614052484, 'learning_rate': 4.740926082622316e-06, 'epoch': 0.17} 17%|█▋ | 2116/12313 [1:35:04<7:36:19, 2.69s/it] 17%|█▋ | 2117/12313 [1:35:06<7:24:46, 2.62s/it] {'loss': 0.5949, 'grad_norm': 3.9765186741541436, 'learning_rate': 4.740634477360544e-06, 'epoch': 0.17} 17%|█▋ | 2117/12313 [1:35:06<7:24:46, 2.62s/it] 17%|█▋ | 2118/12313 [1:35:09<7:32:35, 2.66s/it] {'loss': 0.6829, 'grad_norm': 3.6048556532254246, 'learning_rate': 4.740342717058723e-06, 'epoch': 0.17} 17%|█▋ | 
2118/12313 [1:35:09<7:32:35, 2.66s/it] 17%|█▋ | 2119/12313 [1:35:12<7:37:39, 2.69s/it] {'loss': 0.4803, 'grad_norm': 3.612675372859013, 'learning_rate': 4.740050801737045e-06, 'epoch': 0.17} 17%|█▋ | 2119/12313 [1:35:12<7:37:39, 2.69s/it] 17%|█▋ | 2120/12313 [1:35:15<7:31:26, 2.66s/it] {'loss': 0.7209, 'grad_norm': 4.645131781522265, 'learning_rate': 4.739758731415705e-06, 'epoch': 0.17} 17%|█▋ | 2120/12313 [1:35:15<7:31:26, 2.66s/it] 17%|█▋ | 2121/12313 [1:35:17<7:28:24, 2.64s/it] {'loss': 0.6008, 'grad_norm': 5.631469115723518, 'learning_rate': 4.739466506114916e-06, 'epoch': 0.17} 17%|█▋ | 2121/12313 [1:35:17<7:28:24, 2.64s/it] 17%|█▋ | 2122/12313 [1:35:20<7:25:16, 2.62s/it] {'loss': 0.6917, 'grad_norm': 4.1088983111658735, 'learning_rate': 4.739174125854896e-06, 'epoch': 0.17} 17%|█▋ | 2122/12313 [1:35:20<7:25:16, 2.62s/it] 17%|█▋ | 2123/12313 [1:35:23<7:35:46, 2.68s/it] {'loss': 0.5403, 'grad_norm': 5.169477468086094, 'learning_rate': 4.738881590655877e-06, 'epoch': 0.17} 17%|█▋ | 2123/12313 [1:35:23<7:35:46, 2.68s/it] 17%|█▋ | 2124/12313 [1:35:25<7:21:10, 2.60s/it] {'loss': 0.608, 'grad_norm': 4.946107948001271, 'learning_rate': 4.738588900538102e-06, 'epoch': 0.17} 17%|█▋ | 2124/12313 [1:35:25<7:21:10, 2.60s/it] 17%|█▋ | 2125/12313 [1:35:28<7:35:59, 2.69s/it] {'loss': 0.6463, 'grad_norm': 3.559651178268444, 'learning_rate': 4.738296055521821e-06, 'epoch': 0.17} 17%|█▋ | 2125/12313 [1:35:28<7:35:59, 2.69s/it] 17%|█▋ | 2126/12313 [1:35:31<7:48:38, 2.76s/it] {'loss': 0.4888, 'grad_norm': 6.129364916078834, 'learning_rate': 4.738003055627301e-06, 'epoch': 0.17} 17%|█▋ | 2126/12313 [1:35:31<7:48:38, 2.76s/it] 17%|█▋ | 2127/12313 [1:35:33<7:35:13, 2.68s/it] {'loss': 0.8504, 'grad_norm': 4.278147231585998, 'learning_rate': 4.7377099008748125e-06, 'epoch': 0.17} 17%|█▋ | 2127/12313 [1:35:33<7:35:13, 2.68s/it] 17%|█▋ | 2128/12313 [1:35:36<7:28:32, 2.64s/it] {'loss': 0.6727, 'grad_norm': 8.024624331958586, 'learning_rate': 4.737416591284643e-06, 'epoch': 0.17} 17%|█▋ 
| 2128/12313 [1:35:36<7:28:32, 2.64s/it] 17%|█▋ | 2129/12313 [1:35:39<7:34:14, 2.68s/it] {'loss': 0.7422, 'grad_norm': 29.45093767091559, 'learning_rate': 4.737123126877086e-06, 'epoch': 0.17} 17%|█▋ | 2129/12313 [1:35:39<7:34:14, 2.68s/it] 17%|█▋ | 2130/12313 [1:35:41<7:22:58, 2.61s/it] {'loss': 0.5307, 'grad_norm': 3.13136259957198, 'learning_rate': 4.736829507672449e-06, 'epoch': 0.17} 17%|█▋ | 2130/12313 [1:35:41<7:22:58, 2.61s/it] 17%|█▋ | 2131/12313 [1:35:44<7:26:19, 2.63s/it] {'loss': 0.6303, 'grad_norm': 7.567136439537917, 'learning_rate': 4.736535733691048e-06, 'epoch': 0.17} 17%|█▋ | 2131/12313 [1:35:44<7:26:19, 2.63s/it] 17%|█▋ | 2132/12313 [1:35:46<7:26:31, 2.63s/it] {'loss': 0.5315, 'grad_norm': 5.220061316823007, 'learning_rate': 4.73624180495321e-06, 'epoch': 0.17} 17%|█▋ | 2132/12313 [1:35:46<7:26:31, 2.63s/it] 17%|█▋ | 2133/12313 [1:35:49<7:20:34, 2.60s/it] {'loss': 0.5764, 'grad_norm': 7.195772361468456, 'learning_rate': 4.7359477214792754e-06, 'epoch': 0.17} 17%|█▋ | 2133/12313 [1:35:49<7:20:34, 2.60s/it] 17%|█▋ | 2134/12313 [1:35:51<7:16:31, 2.57s/it] {'loss': 0.5438, 'grad_norm': 7.3854852778069615, 'learning_rate': 4.735653483289591e-06, 'epoch': 0.17} 17%|█▋ | 2134/12313 [1:35:51<7:16:31, 2.57s/it] 17%|█▋ | 2135/12313 [1:35:54<7:15:37, 2.57s/it] {'loss': 0.6625, 'grad_norm': 6.686734429776475, 'learning_rate': 4.7353590904045184e-06, 'epoch': 0.17} 17%|█▋ | 2135/12313 [1:35:54<7:15:37, 2.57s/it] 17%|█▋ | 2136/12313 [1:35:57<7:19:58, 2.59s/it] {'loss': 0.5811, 'grad_norm': 3.9164463638901514, 'learning_rate': 4.735064542844428e-06, 'epoch': 0.17} 17%|█▋ | 2136/12313 [1:35:57<7:19:58, 2.59s/it] 17%|█▋ | 2137/12313 [1:35:59<7:30:38, 2.66s/it] {'loss': 0.4821, 'grad_norm': 4.957310787853842, 'learning_rate': 4.734769840629699e-06, 'epoch': 0.17} 17%|█▋ | 2137/12313 [1:35:59<7:30:38, 2.66s/it] 17%|█▋ | 2138/12313 [1:36:02<7:26:30, 2.63s/it] {'loss': 0.5227, 'grad_norm': 8.419842894201476, 'learning_rate': 4.734474983780724e-06, 'epoch': 0.17} 
17%|█▋ | 2138/12313 [1:36:02<7:26:30, 2.63s/it] 17%|█▋ | 2139/12313 [1:36:05<7:38:18, 2.70s/it] {'loss': 0.6015, 'grad_norm': 8.041916137901524, 'learning_rate': 4.734179972317907e-06, 'epoch': 0.17} 17%|█▋ | 2139/12313 [1:36:05<7:38:18, 2.70s/it] 17%|█▋ | 2140/12313 [1:36:07<7:35:22, 2.69s/it] {'loss': 0.6766, 'grad_norm': 4.1188752273792195, 'learning_rate': 4.73388480626166e-06, 'epoch': 0.17} 17%|█▋ | 2140/12313 [1:36:07<7:35:22, 2.69s/it] 17%|█▋ | 2141/12313 [1:36:10<7:49:40, 2.77s/it] {'loss': 0.5939, 'grad_norm': 5.065614073087699, 'learning_rate': 4.733589485632407e-06, 'epoch': 0.17} 17%|█▋ | 2141/12313 [1:36:10<7:49:40, 2.77s/it] 17%|█▋ | 2142/12313 [1:36:13<7:41:06, 2.72s/it] {'loss': 0.5127, 'grad_norm': 3.8622027019274077, 'learning_rate': 4.733294010450583e-06, 'epoch': 0.17} 17%|█▋ | 2142/12313 [1:36:13<7:41:06, 2.72s/it] 17%|█▋ | 2143/12313 [1:36:16<7:34:35, 2.68s/it] {'loss': 0.6016, 'grad_norm': 5.31622281327025, 'learning_rate': 4.732998380736632e-06, 'epoch': 0.17} 17%|█▋ | 2143/12313 [1:36:16<7:34:35, 2.68s/it] 17%|█▋ | 2144/12313 [1:36:18<7:22:52, 2.61s/it] {'loss': 0.4538, 'grad_norm': 3.832733531671153, 'learning_rate': 4.732702596511012e-06, 'epoch': 0.17} 17%|█▋ | 2144/12313 [1:36:18<7:22:52, 2.61s/it] 17%|█▋ | 2145/12313 [1:36:21<7:17:55, 2.58s/it] {'loss': 0.5959, 'grad_norm': 4.720524116716847, 'learning_rate': 4.732406657794188e-06, 'epoch': 0.17} 17%|█▋ | 2145/12313 [1:36:21<7:17:55, 2.58s/it] 17%|█▋ | 2146/12313 [1:36:23<7:13:16, 2.56s/it] {'loss': 0.6259, 'grad_norm': 3.3574370172144734, 'learning_rate': 4.732110564606639e-06, 'epoch': 0.17} 17%|█▋ | 2146/12313 [1:36:23<7:13:16, 2.56s/it] 17%|█▋ | 2147/12313 [1:36:26<7:18:36, 2.59s/it] {'loss': 0.5966, 'grad_norm': 3.845949435304935, 'learning_rate': 4.7318143169688515e-06, 'epoch': 0.17} 17%|█▋ | 2147/12313 [1:36:26<7:18:36, 2.59s/it] 17%|█▋ | 2148/12313 [1:36:28<7:16:53, 2.58s/it] {'loss': 0.6552, 'grad_norm': 3.7946357565371973, 'learning_rate': 4.731517914901324e-06, 'epoch': 
0.17} 17%|█▋ | 2148/12313 [1:36:28<7:16:53, 2.58s/it] 17%|█▋ | 2149/12313 [1:36:31<7:23:29, 2.62s/it] {'loss': 0.7327, 'grad_norm': 8.89191739354207, 'learning_rate': 4.731221358424569e-06, 'epoch': 0.17} 17%|█▋ | 2149/12313 [1:36:31<7:23:29, 2.62s/it] 17%|█▋ | 2150/12313 [1:36:34<7:21:55, 2.61s/it] {'loss': 0.6989, 'grad_norm': 6.740032168112452, 'learning_rate': 4.730924647559103e-06, 'epoch': 0.17} 17%|█▋ | 2150/12313 [1:36:34<7:21:55, 2.61s/it] 17%|█▋ | 2151/12313 [1:36:36<7:28:43, 2.65s/it] {'loss': 0.5567, 'grad_norm': 4.6320616184970875, 'learning_rate': 4.730627782325459e-06, 'epoch': 0.17} 17%|█▋ | 2151/12313 [1:36:36<7:28:43, 2.65s/it] 17%|█▋ | 2152/12313 [1:36:39<7:31:57, 2.67s/it] {'loss': 0.5177, 'grad_norm': 4.627022975502933, 'learning_rate': 4.730330762744178e-06, 'epoch': 0.17} 17%|█▋ | 2152/12313 [1:36:39<7:31:57, 2.67s/it] 17%|█▋ | 2153/12313 [1:36:42<7:32:30, 2.67s/it] {'loss': 0.578, 'grad_norm': 5.2547160244571876, 'learning_rate': 4.730033588835812e-06, 'epoch': 0.17} 17%|█▋ | 2153/12313 [1:36:42<7:32:30, 2.67s/it] 17%|█▋ | 2154/12313 [1:36:45<7:40:56, 2.72s/it] {'loss': 0.5025, 'grad_norm': 4.545741644009064, 'learning_rate': 4.729736260620924e-06, 'epoch': 0.17} 17%|█▋ | 2154/12313 [1:36:45<7:40:56, 2.72s/it] 18%|█▊ | 2155/12313 [1:36:47<7:34:56, 2.69s/it] {'loss': 0.6065, 'grad_norm': 3.1745271529048074, 'learning_rate': 4.729438778120088e-06, 'epoch': 0.18} 18%|█▊ | 2155/12313 [1:36:47<7:34:56, 2.69s/it] 18%|█▊ | 2156/12313 [1:36:50<7:34:03, 2.68s/it] {'loss': 0.5726, 'grad_norm': 3.614800398396882, 'learning_rate': 4.729141141353887e-06, 'epoch': 0.18} 18%|█▊ | 2156/12313 [1:36:50<7:34:03, 2.68s/it] 18%|█▊ | 2157/12313 [1:36:53<7:31:45, 2.67s/it] {'loss': 0.7513, 'grad_norm': 4.093940125645144, 'learning_rate': 4.7288433503429165e-06, 'epoch': 0.18} 18%|█▊ | 2157/12313 [1:36:53<7:31:45, 2.67s/it] 18%|█▊ | 2158/12313 [1:36:55<7:28:46, 2.65s/it] {'loss': 0.5511, 'grad_norm': 6.4087498177915885, 'learning_rate': 4.728545405107782e-06, 
'epoch': 0.18} 18%|█▊ | 2158/12313 [1:36:55<7:28:46, 2.65s/it] 18%|█▊ | 2159/12313 [1:36:58<7:26:18, 2.64s/it] {'loss': 0.5388, 'grad_norm': 6.964419687177815, 'learning_rate': 4.7282473056691e-06, 'epoch': 0.18} 18%|█▊ | 2159/12313 [1:36:58<7:26:18, 2.64s/it] 18%|█▊ | 2160/12313 [1:37:01<7:46:25, 2.76s/it] {'loss': 0.5553, 'grad_norm': 7.381195247577337, 'learning_rate': 4.727949052047498e-06, 'epoch': 0.18} 18%|█▊ | 2160/12313 [1:37:01<7:46:25, 2.76s/it] 18%|█▊ | 2161/12313 [1:37:03<7:31:56, 2.67s/it] {'loss': 0.6407, 'grad_norm': 3.447447501574935, 'learning_rate': 4.7276506442636125e-06, 'epoch': 0.18} 18%|█▊ | 2161/12313 [1:37:03<7:31:56, 2.67s/it] 18%|█▊ | 2162/12313 [1:37:06<7:25:42, 2.63s/it] {'loss': 0.5699, 'grad_norm': 3.624265660451841, 'learning_rate': 4.727352082338092e-06, 'epoch': 0.18} 18%|█▊ | 2162/12313 [1:37:06<7:25:42, 2.63s/it] 18%|█▊ | 2163/12313 [1:37:08<7:22:06, 2.61s/it] {'loss': 0.5561, 'grad_norm': 4.736948616378692, 'learning_rate': 4.727053366291595e-06, 'epoch': 0.18} 18%|█▊ | 2163/12313 [1:37:08<7:22:06, 2.61s/it] 18%|█▊ | 2164/12313 [1:37:11<7:13:00, 2.56s/it] {'loss': 0.6388, 'grad_norm': 5.557189925086155, 'learning_rate': 4.726754496144792e-06, 'epoch': 0.18} 18%|█▊ | 2164/12313 [1:37:11<7:13:00, 2.56s/it] 18%|█▊ | 2165/12313 [1:37:13<7:21:02, 2.61s/it] {'loss': 0.5798, 'grad_norm': 5.654257813296, 'learning_rate': 4.726455471918363e-06, 'epoch': 0.18} 18%|█▊ | 2165/12313 [1:37:13<7:21:02, 2.61s/it] 18%|█▊ | 2166/12313 [1:37:16<7:19:08, 2.60s/it] {'loss': 0.6494, 'grad_norm': 4.017186015236763, 'learning_rate': 4.726156293632998e-06, 'epoch': 0.18} 18%|█▊ | 2166/12313 [1:37:16<7:19:08, 2.60s/it] 18%|█▊ | 2167/12313 [1:37:19<7:15:50, 2.58s/it] {'loss': 0.6894, 'grad_norm': 3.2976783263541036, 'learning_rate': 4.725856961309401e-06, 'epoch': 0.18} 18%|█▊ | 2167/12313 [1:37:19<7:15:50, 2.58s/it] 18%|█▊ | 2168/12313 [1:37:21<7:17:29, 2.59s/it] {'loss': 0.5775, 'grad_norm': 9.505301533813855, 'learning_rate': 4.725557474968281e-06, 
'epoch': 0.18} 18%|█▊ | 2168/12313 [1:37:21<7:17:29, 2.59s/it] 18%|█▊ | 2169/12313 [1:37:24<7:16:49, 2.58s/it] {'loss': 0.5132, 'grad_norm': 5.237101724122944, 'learning_rate': 4.725257834630362e-06, 'epoch': 0.18} 18%|█▊ | 2169/12313 [1:37:24<7:16:49, 2.58s/it] 18%|█▊ | 2170/12313 [1:37:26<7:13:16, 2.56s/it] {'loss': 0.4708, 'grad_norm': 4.666530981374312, 'learning_rate': 4.7249580403163786e-06, 'epoch': 0.18} 18%|█▊ | 2170/12313 [1:37:26<7:13:16, 2.56s/it] 18%|█▊ | 2171/12313 [1:37:29<7:25:29, 2.64s/it] {'loss': 0.6887, 'grad_norm': 3.82098458375026, 'learning_rate': 4.7246580920470746e-06, 'epoch': 0.18} 18%|█▊ | 2171/12313 [1:37:29<7:25:29, 2.64s/it] 18%|█▊ | 2172/12313 [1:37:32<7:27:06, 2.65s/it] {'loss': 0.5326, 'grad_norm': 4.794596840480159, 'learning_rate': 4.7243579898432035e-06, 'epoch': 0.18} 18%|█▊ | 2172/12313 [1:37:32<7:27:06, 2.65s/it] 18%|█▊ | 2173/12313 [1:37:35<7:43:59, 2.75s/it] {'loss': 0.4342, 'grad_norm': 5.163380467947997, 'learning_rate': 4.724057733725532e-06, 'epoch': 0.18} 18%|█▊ | 2173/12313 [1:37:35<7:43:59, 2.75s/it] 18%|█▊ | 2174/12313 [1:37:37<7:43:06, 2.74s/it] {'loss': 0.5504, 'grad_norm': 6.130524877364227, 'learning_rate': 4.723757323714836e-06, 'epoch': 0.18} 18%|█▊ | 2174/12313 [1:37:37<7:43:06, 2.74s/it] 18%|█▊ | 2175/12313 [1:37:40<7:40:24, 2.72s/it] {'loss': 0.519, 'grad_norm': 3.6388040943311566, 'learning_rate': 4.723456759831903e-06, 'epoch': 0.18} 18%|█▊ | 2175/12313 [1:37:40<7:40:24, 2.72s/it] 18%|█▊ | 2176/12313 [1:37:43<7:39:42, 2.72s/it] {'loss': 0.4867, 'grad_norm': 5.217405099472009, 'learning_rate': 4.7231560420975294e-06, 'epoch': 0.18} 18%|█▊ | 2176/12313 [1:37:43<7:39:42, 2.72s/it] 18%|█▊ | 2177/12313 [1:37:46<7:39:47, 2.72s/it] {'loss': 0.532, 'grad_norm': 8.145512643236385, 'learning_rate': 4.722855170532523e-06, 'epoch': 0.18} 18%|█▊ | 2177/12313 [1:37:46<7:39:47, 2.72s/it] 18%|█▊ | 2178/12313 [1:37:49<7:49:28, 2.78s/it] {'loss': 0.7263, 'grad_norm': 6.513554956269954, 'learning_rate': 
4.7225541451577035e-06, 'epoch': 0.18} 18%|█▊ | 2178/12313 [1:37:49<7:49:28, 2.78s/it] 18%|█▊ | 2179/12313 [1:37:51<7:39:59, 2.72s/it] {'loss': 0.5248, 'grad_norm': 5.647381362856042, 'learning_rate': 4.7222529659939e-06, 'epoch': 0.18} 18%|█▊ | 2179/12313 [1:37:51<7:39:59, 2.72s/it] 18%|█▊ | 2180/12313 [1:37:54<7:39:43, 2.72s/it] {'loss': 0.5163, 'grad_norm': 8.634842181250088, 'learning_rate': 4.721951633061952e-06, 'epoch': 0.18} 18%|█▊ | 2180/12313 [1:37:54<7:39:43, 2.72s/it] 18%|█▊ | 2181/12313 [1:37:56<7:27:31, 2.65s/it] {'loss': 0.5169, 'grad_norm': 5.804232556564542, 'learning_rate': 4.721650146382711e-06, 'epoch': 0.18} 18%|█▊ | 2181/12313 [1:37:56<7:27:31, 2.65s/it] 18%|█▊ | 2182/12313 [1:37:59<7:27:01, 2.65s/it] {'loss': 0.569, 'grad_norm': 5.170134284377379, 'learning_rate': 4.721348505977037e-06, 'epoch': 0.18} 18%|█▊ | 2182/12313 [1:37:59<7:27:01, 2.65s/it] 18%|█▊ | 2183/12313 [1:38:02<7:30:25, 2.67s/it] {'loss': 0.7412, 'grad_norm': 6.347534050197857, 'learning_rate': 4.721046711865803e-06, 'epoch': 0.18} 18%|█▊ | 2183/12313 [1:38:02<7:30:25, 2.67s/it] 18%|█▊ | 2184/12313 [1:38:05<7:42:03, 2.74s/it] {'loss': 0.6272, 'grad_norm': 3.947255210170048, 'learning_rate': 4.720744764069892e-06, 'epoch': 0.18} 18%|█▊ | 2184/12313 [1:38:05<7:42:03, 2.74s/it] 18%|█▊ | 2185/12313 [1:38:07<7:41:58, 2.74s/it] {'loss': 0.5039, 'grad_norm': 8.963510036415032, 'learning_rate': 4.7204426626101955e-06, 'epoch': 0.18} 18%|█▊ | 2185/12313 [1:38:07<7:41:58, 2.74s/it] 18%|█▊ | 2186/12313 [1:38:10<7:41:01, 2.73s/it] {'loss': 0.5961, 'grad_norm': 5.0045242805343255, 'learning_rate': 4.720140407507619e-06, 'epoch': 0.18} 18%|█▊ | 2186/12313 [1:38:10<7:41:01, 2.73s/it] 18%|█▊ | 2187/12313 [1:38:13<7:32:02, 2.68s/it] {'loss': 0.6292, 'grad_norm': 80.66410418515012, 'learning_rate': 4.719837998783075e-06, 'epoch': 0.18} 18%|█▊ | 2187/12313 [1:38:13<7:32:02, 2.68s/it] 18%|█▊ | 2188/12313 [1:38:15<7:28:10, 2.66s/it] {'loss': 0.5465, 'grad_norm': 3.8541304256097804, 
'learning_rate': 4.7195354364574915e-06, 'epoch': 0.18} 18%|█▊ | 2188/12313 [1:38:15<7:28:10, 2.66s/it] 18%|█▊ | 2189/12313 [1:38:18<7:31:24, 2.68s/it] {'loss': 0.6409, 'grad_norm': 3.524866314545861, 'learning_rate': 4.719232720551802e-06, 'epoch': 0.18} 18%|█▊ | 2189/12313 [1:38:18<7:31:24, 2.68s/it] 18%|█▊ | 2190/12313 [1:38:21<7:34:55, 2.70s/it] {'loss': 0.5239, 'grad_norm': 3.929607398758829, 'learning_rate': 4.718929851086953e-06, 'epoch': 0.18} 18%|█▊ | 2190/12313 [1:38:21<7:34:55, 2.70s/it] 18%|█▊ | 2191/12313 [1:38:23<7:25:16, 2.64s/it] {'loss': 0.515, 'grad_norm': 6.4534562177957575, 'learning_rate': 4.718626828083902e-06, 'epoch': 0.18} 18%|█▊ | 2191/12313 [1:38:23<7:25:16, 2.64s/it] 18%|█▊ | 2192/12313 [1:38:26<7:38:26, 2.72s/it] {'loss': 0.4572, 'grad_norm': 4.416334886882551, 'learning_rate': 4.718323651563616e-06, 'epoch': 0.18} 18%|█▊ | 2192/12313 [1:38:26<7:38:26, 2.72s/it] 18%|█▊ | 2193/12313 [1:38:29<7:36:16, 2.71s/it] {'loss': 0.6128, 'grad_norm': 4.2150922659750085, 'learning_rate': 4.718020321547075e-06, 'epoch': 0.18} 18%|█▊ | 2193/12313 [1:38:29<7:36:16, 2.71s/it] 18%|█▊ | 2194/12313 [1:38:31<7:28:23, 2.66s/it] {'loss': 0.642, 'grad_norm': 3.3177339549952705, 'learning_rate': 4.717716838055265e-06, 'epoch': 0.18} 18%|█▊ | 2194/12313 [1:38:31<7:28:23, 2.66s/it] 18%|█▊ | 2195/12313 [1:38:34<7:26:35, 2.65s/it] {'loss': 0.8113, 'grad_norm': 4.248446599343683, 'learning_rate': 4.717413201109187e-06, 'epoch': 0.18} 18%|█▊ | 2195/12313 [1:38:34<7:26:35, 2.65s/it] 18%|█▊ | 2196/12313 [1:38:36<7:23:33, 2.63s/it] {'loss': 0.6368, 'grad_norm': 4.561435467958204, 'learning_rate': 4.717109410729851e-06, 'epoch': 0.18} 18%|█▊ | 2196/12313 [1:38:36<7:23:33, 2.63s/it] 18%|█▊ | 2197/12313 [1:38:39<7:26:25, 2.65s/it] {'loss': 0.6438, 'grad_norm': 5.104529981932405, 'learning_rate': 4.716805466938278e-06, 'epoch': 0.18} 18%|█▊ | 2197/12313 [1:38:39<7:26:25, 2.65s/it] 18%|█▊ | 2198/12313 [1:38:42<7:36:28, 2.71s/it] {'loss': 0.6165, 'grad_norm': 
3.5323774996376374, 'learning_rate': 4.7165013697555e-06, 'epoch': 0.18} 18%|█▊ | 2198/12313 [1:38:42<7:36:28, 2.71s/it] 18%|█▊ | 2199/12313 [1:38:45<7:28:06, 2.66s/it] {'loss': 0.6347, 'grad_norm': 6.640997619244458, 'learning_rate': 4.716197119202556e-06, 'epoch': 0.18} 18%|█▊ | 2199/12313 [1:38:45<7:28:06, 2.66s/it] 18%|█▊ | 2200/12313 [1:38:47<7:32:14, 2.68s/it] {'loss': 0.7224, 'grad_norm': 11.632083395111554, 'learning_rate': 4.715892715300501e-06, 'epoch': 0.18} 18%|█▊ | 2200/12313 [1:38:47<7:32:14, 2.68s/it] 18%|█▊ | 2201/12313 [1:38:50<7:41:55, 2.74s/it] {'loss': 0.5769, 'grad_norm': 4.224735988915179, 'learning_rate': 4.7155881580703984e-06, 'epoch': 0.18} 18%|█▊ | 2201/12313 [1:38:50<7:41:55, 2.74s/it] 18%|█▊ | 2202/12313 [1:38:53<7:39:02, 2.72s/it] {'loss': 0.4985, 'grad_norm': 7.9851742695130685, 'learning_rate': 4.71528344753332e-06, 'epoch': 0.18} 18%|█▊ | 2202/12313 [1:38:53<7:39:02, 2.72s/it] 18%|█▊ | 2203/12313 [1:38:56<7:49:54, 2.79s/it] {'loss': 0.5391, 'grad_norm': 6.25379024192091, 'learning_rate': 4.714978583710352e-06, 'epoch': 0.18} 18%|█▊ | 2203/12313 [1:38:56<7:49:54, 2.79s/it] 18%|█▊ | 2204/12313 [1:38:59<8:02:12, 2.86s/it] {'loss': 0.567, 'grad_norm': 3.486820916489094, 'learning_rate': 4.714673566622589e-06, 'epoch': 0.18} 18%|█▊ | 2204/12313 [1:38:59<8:02:12, 2.86s/it] 18%|█▊ | 2205/12313 [1:39:01<7:43:13, 2.75s/it] {'loss': 0.7785, 'grad_norm': 3.4864795044696617, 'learning_rate': 4.714368396291135e-06, 'epoch': 0.18} 18%|█▊ | 2205/12313 [1:39:01<7:43:13, 2.75s/it] 18%|█▊ | 2206/12313 [1:39:04<7:41:07, 2.74s/it] {'loss': 0.4835, 'grad_norm': 3.6141664132762172, 'learning_rate': 4.714063072737108e-06, 'epoch': 0.18} 18%|█▊ | 2206/12313 [1:39:04<7:41:07, 2.74s/it] 18%|█▊ | 2207/12313 [1:39:07<7:32:10, 2.68s/it] {'loss': 0.5031, 'grad_norm': 5.839740334780792, 'learning_rate': 4.713757595981634e-06, 'epoch': 0.18} 18%|█▊ | 2207/12313 [1:39:07<7:32:10, 2.68s/it] 18%|█▊ | 2208/12313 [1:39:09<7:32:27, 2.69s/it] {'loss': 0.6997, 
'grad_norm': 8.510815914171106, 'learning_rate': 4.713451966045851e-06, 'epoch': 0.18} 18%|█▊ | 2208/12313 [1:39:09<7:32:27, 2.69s/it] 18%|█▊ | 2209/12313 [1:39:12<7:28:17, 2.66s/it] {'loss': 0.5733, 'grad_norm': 4.803667331216526, 'learning_rate': 4.713146182950905e-06, 'epoch': 0.18} 18%|█▊ | 2209/12313 [1:39:12<7:28:17, 2.66s/it] 18%|█▊ | 2210/12313 [1:39:15<7:41:47, 2.74s/it] {'loss': 0.5907, 'grad_norm': 3.5388152829208015, 'learning_rate': 4.7128402467179575e-06, 'epoch': 0.18} 18%|█▊ | 2210/12313 [1:39:15<7:41:47, 2.74s/it] 18%|█▊ | 2211/12313 [1:39:17<7:35:21, 2.70s/it] {'loss': 0.5774, 'grad_norm': 3.03616427792538, 'learning_rate': 4.712534157368176e-06, 'epoch': 0.18} 18%|█▊ | 2211/12313 [1:39:17<7:35:21, 2.70s/it] 18%|█▊ | 2212/12313 [1:39:20<7:32:06, 2.69s/it] {'loss': 0.5262, 'grad_norm': 9.760631640572191, 'learning_rate': 4.7122279149227405e-06, 'epoch': 0.18} 18%|█▊ | 2212/12313 [1:39:20<7:32:06, 2.69s/it] 18%|█▊ | 2213/12313 [1:39:23<7:33:14, 2.69s/it] {'loss': 0.6952, 'grad_norm': 4.386181821659895, 'learning_rate': 4.711921519402841e-06, 'epoch': 0.18} 18%|█▊ | 2213/12313 [1:39:23<7:33:14, 2.69s/it] 18%|█▊ | 2214/12313 [1:39:26<7:42:47, 2.75s/it] {'loss': 0.5091, 'grad_norm': 3.3351010256653573, 'learning_rate': 4.711614970829679e-06, 'epoch': 0.18} 18%|█▊ | 2214/12313 [1:39:26<7:42:47, 2.75s/it] 18%|█▊ | 2215/12313 [1:39:28<7:35:23, 2.71s/it] {'loss': 0.4963, 'grad_norm': 6.9608846044796024, 'learning_rate': 4.711308269224466e-06, 'epoch': 0.18} 18%|█▊ | 2215/12313 [1:39:28<7:35:23, 2.71s/it] 18%|█▊ | 2216/12313 [1:39:31<7:31:17, 2.68s/it] {'loss': 0.5736, 'grad_norm': 5.215289754630284, 'learning_rate': 4.7110014146084235e-06, 'epoch': 0.18} 18%|█▊ | 2216/12313 [1:39:31<7:31:17, 2.68s/it] 18%|█▊ | 2217/12313 [1:39:33<7:25:50, 2.65s/it] {'loss': 0.7406, 'grad_norm': 11.93663447822261, 'learning_rate': 4.710694407002785e-06, 'epoch': 0.18} 18%|█▊ | 2217/12313 [1:39:33<7:25:50, 2.65s/it] 18%|█▊ | 2218/12313 [1:39:36<7:27:07, 2.66s/it] {'loss': 
0.6448, 'grad_norm': 5.163645505705543, 'learning_rate': 4.710387246428794e-06, 'epoch': 0.18} 18%|█▊ | 2218/12313 [1:39:36<7:27:07, 2.66s/it] 18%|█▊ | 2219/12313 [1:39:39<7:29:41, 2.67s/it] {'loss': 0.7224, 'grad_norm': 4.661698490114284, 'learning_rate': 4.710079932907703e-06, 'epoch': 0.18} 18%|█▊ | 2219/12313 [1:39:39<7:29:41, 2.67s/it] 18%|█▊ | 2220/12313 [1:39:42<7:39:52, 2.73s/it] {'loss': 0.5251, 'grad_norm': 3.401258624128954, 'learning_rate': 4.7097724664607775e-06, 'epoch': 0.18} 18%|█▊ | 2220/12313 [1:39:42<7:39:52, 2.73s/it] 18%|█▊ | 2221/12313 [1:39:44<7:30:33, 2.68s/it] {'loss': 0.6384, 'grad_norm': 5.125047047447445, 'learning_rate': 4.709464847109292e-06, 'epoch': 0.18} 18%|█▊ | 2221/12313 [1:39:44<7:30:33, 2.68s/it] 18%|█▊ | 2222/12313 [1:39:47<7:20:18, 2.62s/it] {'loss': 0.5047, 'grad_norm': 4.023853533344583, 'learning_rate': 4.709157074874533e-06, 'epoch': 0.18} 18%|█▊ | 2222/12313 [1:39:47<7:20:18, 2.62s/it] 18%|█▊ | 2223/12313 [1:39:49<7:20:07, 2.62s/it] {'loss': 0.5204, 'grad_norm': 6.949692493391, 'learning_rate': 4.7088491497777965e-06, 'epoch': 0.18} 18%|█▊ | 2223/12313 [1:39:49<7:20:07, 2.62s/it] 18%|█▊ | 2224/12313 [1:39:52<7:36:28, 2.71s/it] {'loss': 0.5865, 'grad_norm': 4.10788473501353, 'learning_rate': 4.708541071840388e-06, 'epoch': 0.18} 18%|█▊ | 2224/12313 [1:39:52<7:36:28, 2.71s/it] 18%|█▊ | 2225/12313 [1:39:55<7:33:11, 2.70s/it] {'loss': 0.5692, 'grad_norm': 4.694036134420709, 'learning_rate': 4.708232841083628e-06, 'epoch': 0.18} 18%|█▊ | 2225/12313 [1:39:55<7:33:11, 2.70s/it] 18%|█▊ | 2226/12313 [1:39:58<7:36:00, 2.71s/it] {'loss': 0.5018, 'grad_norm': 5.994257706825592, 'learning_rate': 4.70792445752884e-06, 'epoch': 0.18} 18%|█▊ | 2226/12313 [1:39:58<7:36:00, 2.71s/it] 18%|█▊ | 2227/12313 [1:40:00<7:27:52, 2.66s/it] {'loss': 0.5381, 'grad_norm': 8.767129488205637, 'learning_rate': 4.707615921197366e-06, 'epoch': 0.18} 18%|█▊ | 2227/12313 [1:40:00<7:27:52, 2.66s/it] 18%|█▊ | 2228/12313 [1:40:03<7:36:06, 2.71s/it] {'loss': 
0.7474, 'grad_norm': 4.847865205503708, 'learning_rate': 4.707307232110554e-06, 'epoch': 0.18} 18%|█▊ | 2228/12313 [1:40:03<7:36:06, 2.71s/it] 18%|█▊ | 2229/12313 [1:40:06<7:38:23, 2.73s/it] {'loss': 0.6027, 'grad_norm': 5.668800604259548, 'learning_rate': 4.706998390289763e-06, 'epoch': 0.18} 18%|█▊ | 2229/12313 [1:40:06<7:38:23, 2.73s/it] 18%|█▊ | 2230/12313 [1:40:08<7:31:50, 2.69s/it] {'loss': 0.5229, 'grad_norm': 6.745634612012881, 'learning_rate': 4.706689395756363e-06, 'epoch': 0.18} 18%|█▊ | 2230/12313 [1:40:08<7:31:50, 2.69s/it] 18%|█▊ | 2231/12313 [1:40:11<7:23:47, 2.64s/it] {'loss': 0.5253, 'grad_norm': 4.895045129217889, 'learning_rate': 4.706380248531737e-06, 'epoch': 0.18} 18%|█▊ | 2231/12313 [1:40:11<7:23:47, 2.64s/it] 18%|█▊ | 2232/12313 [1:40:13<7:12:42, 2.58s/it] {'loss': 0.6069, 'grad_norm': 4.810666628926101, 'learning_rate': 4.706070948637274e-06, 'epoch': 0.18} 18%|█▊ | 2232/12313 [1:40:13<7:12:42, 2.58s/it] 18%|█▊ | 2233/12313 [1:40:16<7:02:55, 2.52s/it] {'loss': 0.5576, 'grad_norm': 7.912604183646548, 'learning_rate': 4.705761496094377e-06, 'epoch': 0.18} 18%|█▊ | 2233/12313 [1:40:16<7:02:55, 2.52s/it] 18%|█▊ | 2234/12313 [1:40:18<7:02:26, 2.51s/it] {'loss': 0.6926, 'grad_norm': 4.074525434845986, 'learning_rate': 4.705451890924459e-06, 'epoch': 0.18} 18%|█▊ | 2234/12313 [1:40:18<7:02:26, 2.51s/it] 18%|█▊ | 2235/12313 [1:40:21<7:19:50, 2.62s/it] {'loss': 0.5167, 'grad_norm': 9.471081474256104, 'learning_rate': 4.705142133148943e-06, 'epoch': 0.18} 18%|█▊ | 2235/12313 [1:40:21<7:19:50, 2.62s/it] 18%|█▊ | 2236/12313 [1:40:24<7:13:06, 2.58s/it] {'loss': 0.6418, 'grad_norm': 4.8687622437884555, 'learning_rate': 4.70483222278926e-06, 'epoch': 0.18} 18%|█▊ | 2236/12313 [1:40:24<7:13:06, 2.58s/it] 18%|█▊ | 2237/12313 [1:40:26<7:10:40, 2.56s/it] {'loss': 0.5135, 'grad_norm': 4.3746758828868995, 'learning_rate': 4.704522159866857e-06, 'epoch': 0.18} 18%|█▊ | 2237/12313 [1:40:26<7:10:40, 2.56s/it] 18%|█▊ | 2238/12313 [1:40:29<7:14:10, 2.59s/it] 
{'loss': 0.6289, 'grad_norm': 4.148684813653182, 'learning_rate': 4.704211944403188e-06, 'epoch': 0.18} 18%|█▊ | 2238/12313 [1:40:29<7:14:10, 2.59s/it] 18%|█▊ | 2239/12313 [1:40:31<7:12:30, 2.58s/it] {'loss': 0.473, 'grad_norm': 4.101220246406701, 'learning_rate': 4.703901576419717e-06, 'epoch': 0.18} 18%|█▊ | 2239/12313 [1:40:31<7:12:30, 2.58s/it] 18%|█▊ | 2240/12313 [1:40:34<7:16:58, 2.60s/it] {'loss': 0.5761, 'grad_norm': 4.884276965609116, 'learning_rate': 4.703591055937922e-06, 'epoch': 0.18} 18%|█▊ | 2240/12313 [1:40:34<7:16:58, 2.60s/it] 18%|█▊ | 2241/12313 [1:40:37<7:38:27, 2.73s/it] {'loss': 0.5745, 'grad_norm': 9.506027279588086, 'learning_rate': 4.7032803829792875e-06, 'epoch': 0.18} 18%|█▊ | 2241/12313 [1:40:37<7:38:27, 2.73s/it] 18%|█▊ | 2242/12313 [1:40:40<7:37:42, 2.73s/it] {'loss': 0.6804, 'grad_norm': 4.5640936040307825, 'learning_rate': 4.702969557565312e-06, 'epoch': 0.18} 18%|█▊ | 2242/12313 [1:40:40<7:37:42, 2.73s/it] 18%|█▊ | 2243/12313 [1:40:43<7:37:43, 2.73s/it] {'loss': 0.4686, 'grad_norm': 5.16553253038635, 'learning_rate': 4.702658579717502e-06, 'epoch': 0.18} 18%|█▊ | 2243/12313 [1:40:43<7:37:43, 2.73s/it] 18%|█▊ | 2244/12313 [1:40:45<7:27:06, 2.66s/it] {'loss': 0.5184, 'grad_norm': 5.1964649031938395, 'learning_rate': 4.702347449457375e-06, 'epoch': 0.18} 18%|█▊ | 2244/12313 [1:40:45<7:27:06, 2.66s/it] 18%|█▊ | 2245/12313 [1:40:48<7:27:46, 2.67s/it] {'loss': 0.3698, 'grad_norm': 8.556515016420585, 'learning_rate': 4.702036166806461e-06, 'epoch': 0.18} 18%|█▊ | 2245/12313 [1:40:48<7:27:46, 2.67s/it] 18%|█▊ | 2246/12313 [1:40:51<7:38:47, 2.73s/it] {'loss': 0.5364, 'grad_norm': 7.054847431413247, 'learning_rate': 4.7017247317862976e-06, 'epoch': 0.18} 18%|█▊ | 2246/12313 [1:40:51<7:38:47, 2.73s/it] 18%|█▊ | 2247/12313 [1:40:53<7:39:28, 2.74s/it] {'loss': 0.4901, 'grad_norm': 5.019891726347235, 'learning_rate': 4.701413144418437e-06, 'epoch': 0.18} 18%|█▊ | 2247/12313 [1:40:53<7:39:28, 2.74s/it] 18%|█▊ | 2248/12313 [1:40:56<7:24:48, 
2.65s/it] {'loss': 0.4601, 'grad_norm': 4.195016853555098, 'learning_rate': 4.701101404724435e-06, 'epoch': 0.18} 18%|█▊ | 2248/12313 [1:40:56<7:24:48, 2.65s/it] 18%|█▊ | 2249/12313 [1:40:58<7:26:15, 2.66s/it] {'loss': 0.6267, 'grad_norm': 5.378788487799258, 'learning_rate': 4.700789512725867e-06, 'epoch': 0.18} 18%|█▊ | 2249/12313 [1:40:58<7:26:15, 2.66s/it] 18%|█▊ | 2250/12313 [1:41:01<7:29:03, 2.68s/it] {'loss': 0.6438, 'grad_norm': 3.793906030224367, 'learning_rate': 4.700477468444311e-06, 'epoch': 0.18} 18%|█▊ | 2250/12313 [1:41:01<7:29:03, 2.68s/it] 18%|█▊ | 2251/12313 [1:41:04<7:24:17, 2.65s/it] {'loss': 0.6933, 'grad_norm': 4.868151253449179, 'learning_rate': 4.700165271901361e-06, 'epoch': 0.18} 18%|█▊ | 2251/12313 [1:41:04<7:24:17, 2.65s/it] 18%|█▊ | 2252/12313 [1:41:06<7:24:07, 2.65s/it] {'loss': 0.4894, 'grad_norm': 7.777361957555963, 'learning_rate': 4.699852923118618e-06, 'epoch': 0.18} 18%|█▊ | 2252/12313 [1:41:06<7:24:07, 2.65s/it] 18%|█▊ | 2253/12313 [1:41:09<7:30:21, 2.69s/it] {'loss': 0.4232, 'grad_norm': 4.106355721520342, 'learning_rate': 4.699540422117695e-06, 'epoch': 0.18} 18%|█▊ | 2253/12313 [1:41:09<7:30:21, 2.69s/it] 18%|█▊ | 2254/12313 [1:41:12<7:24:02, 2.65s/it] {'loss': 0.5387, 'grad_norm': 5.430574051132866, 'learning_rate': 4.699227768920216e-06, 'epoch': 0.18} 18%|█▊ | 2254/12313 [1:41:12<7:24:02, 2.65s/it] 18%|█▊ | 2255/12313 [1:41:14<7:28:23, 2.67s/it] {'loss': 0.5371, 'grad_norm': 6.575598696133923, 'learning_rate': 4.6989149635478145e-06, 'epoch': 0.18} 18%|█▊ | 2255/12313 [1:41:15<7:28:23, 2.67s/it] 18%|█▊ | 2256/12313 [1:41:17<7:33:01, 2.70s/it] {'loss': 0.5012, 'grad_norm': 5.4676777555816765, 'learning_rate': 4.698602006022136e-06, 'epoch': 0.18} 18%|█▊ | 2256/12313 [1:41:17<7:33:01, 2.70s/it] 18%|█▊ | 2257/12313 [1:41:20<7:26:59, 2.67s/it] {'loss': 0.6324, 'grad_norm': 4.576711281154184, 'learning_rate': 4.698288896364834e-06, 'epoch': 0.18} 18%|█▊ | 2257/12313 [1:41:20<7:26:59, 2.67s/it] 18%|█▊ | 2258/12313 
[1:41:22<7:16:15, 2.60s/it] {'loss': 0.5405, 'grad_norm': 11.976819517596883, 'learning_rate': 4.697975634597574e-06, 'epoch': 0.18} 18%|█▊ | 2258/12313 [1:41:22<7:16:15, 2.60s/it] 18%|█▊ | 2259/12313 [1:41:25<7:12:56, 2.58s/it] {'loss': 0.5355, 'grad_norm': 6.301644428134142, 'learning_rate': 4.697662220742033e-06, 'epoch': 0.18} 18%|█▊ | 2259/12313 [1:41:25<7:12:56, 2.58s/it] 18%|█▊ | 2260/12313 [1:41:27<7:08:13, 2.56s/it] {'loss': 0.5812, 'grad_norm': 16.567162364593425, 'learning_rate': 4.697348654819898e-06, 'epoch': 0.18} 18%|█▊ | 2260/12313 [1:41:27<7:08:13, 2.56s/it] 18%|█▊ | 2261/12313 [1:41:30<7:06:00, 2.54s/it] {'loss': 0.6177, 'grad_norm': 5.845436104952721, 'learning_rate': 4.697034936852865e-06, 'epoch': 0.18} 18%|█▊ | 2261/12313 [1:41:30<7:06:00, 2.54s/it] 18%|█▊ | 2262/12313 [1:41:33<7:13:33, 2.59s/it] {'loss': 0.6487, 'grad_norm': 3.4478757836782608, 'learning_rate': 4.6967210668626415e-06, 'epoch': 0.18} 18%|█▊ | 2262/12313 [1:41:33<7:13:33, 2.59s/it] 18%|█▊ | 2263/12313 [1:41:35<7:09:28, 2.56s/it] {'loss': 0.6865, 'grad_norm': 8.26455582424995, 'learning_rate': 4.696407044870947e-06, 'epoch': 0.18} 18%|█▊ | 2263/12313 [1:41:35<7:09:28, 2.56s/it] 18%|█▊ | 2264/12313 [1:41:38<7:16:26, 2.61s/it] {'loss': 0.4881, 'grad_norm': 3.6489257014011924, 'learning_rate': 4.696092870899509e-06, 'epoch': 0.18} 18%|█▊ | 2264/12313 [1:41:38<7:16:26, 2.61s/it] 18%|█▊ | 2265/12313 [1:41:40<7:21:50, 2.64s/it] {'loss': 0.5365, 'grad_norm': 7.476743860852032, 'learning_rate': 4.695778544970066e-06, 'epoch': 0.18} 18%|█▊ | 2265/12313 [1:41:40<7:21:50, 2.64s/it] 18%|█▊ | 2266/12313 [1:41:43<7:18:49, 2.62s/it] {'loss': 0.5978, 'grad_norm': 5.335228274279977, 'learning_rate': 4.695464067104371e-06, 'epoch': 0.18} 18%|█▊ | 2266/12313 [1:41:43<7:18:49, 2.62s/it] 18%|█▊ | 2267/12313 [1:41:46<7:22:10, 2.64s/it] {'loss': 0.769, 'grad_norm': 7.0816544622030175, 'learning_rate': 4.6951494373241805e-06, 'epoch': 0.18} 18%|█▊ | 2267/12313 [1:41:46<7:22:10, 2.64s/it] 18%|█▊ | 
2268/12313 [1:41:48<7:23:26, 2.65s/it] {'loss': 0.627, 'grad_norm': 3.851216245811756, 'learning_rate': 4.694834655651266e-06, 'epoch': 0.18} 18%|█▊ | 2268/12313 [1:41:48<7:23:26, 2.65s/it] 18%|█▊ | 2269/12313 [1:41:51<7:12:03, 2.58s/it] {'loss': 0.8659, 'grad_norm': 4.805289628209942, 'learning_rate': 4.6945197221074104e-06, 'epoch': 0.18} 18%|█▊ | 2269/12313 [1:41:51<7:12:03, 2.58s/it] 18%|█▊ | 2270/12313 [1:41:53<7:12:41, 2.59s/it] {'loss': 0.5485, 'grad_norm': 7.794985897939018, 'learning_rate': 4.694204636714403e-06, 'epoch': 0.18} 18%|█▊ | 2270/12313 [1:41:53<7:12:41, 2.59s/it] 18%|█▊ | 2271/12313 [1:41:57<7:49:13, 2.80s/it] {'loss': 0.5583, 'grad_norm': 20.84411786544806, 'learning_rate': 4.693889399494049e-06, 'epoch': 0.18} 18%|█▊ | 2271/12313 [1:41:57<7:49:13, 2.80s/it] 18%|█▊ | 2272/12313 [1:41:59<7:38:39, 2.74s/it] {'loss': 0.7422, 'grad_norm': 4.8093342027022326, 'learning_rate': 4.693574010468159e-06, 'epoch': 0.18} 18%|█▊ | 2272/12313 [1:41:59<7:38:39, 2.74s/it] 18%|█▊ | 2273/12313 [1:42:02<7:36:43, 2.73s/it] {'loss': 0.6693, 'grad_norm': 4.897421838824223, 'learning_rate': 4.693258469658557e-06, 'epoch': 0.18} 18%|█▊ | 2273/12313 [1:42:02<7:36:43, 2.73s/it] 18%|█▊ | 2274/12313 [1:42:05<7:31:34, 2.70s/it] {'loss': 0.5361, 'grad_norm': 5.418227314133965, 'learning_rate': 4.692942777087076e-06, 'epoch': 0.18} 18%|█▊ | 2274/12313 [1:42:05<7:31:34, 2.70s/it] 18%|█▊ | 2275/12313 [1:42:07<7:29:58, 2.69s/it] {'loss': 0.6755, 'grad_norm': 4.1871350066526745, 'learning_rate': 4.692626932775561e-06, 'epoch': 0.18} 18%|█▊ | 2275/12313 [1:42:07<7:29:58, 2.69s/it] 18%|█▊ | 2276/12313 [1:42:10<7:24:45, 2.66s/it] {'loss': 0.7373, 'grad_norm': 10.739798887484266, 'learning_rate': 4.6923109367458665e-06, 'epoch': 0.18} 18%|█▊ | 2276/12313 [1:42:10<7:24:45, 2.66s/it] 18%|█▊ | 2277/12313 [1:42:13<7:27:10, 2.67s/it] {'loss': 0.5737, 'grad_norm': 4.134435714566178, 'learning_rate': 4.6919947890198585e-06, 'epoch': 0.18} 18%|█▊ | 2277/12313 [1:42:13<7:27:10, 2.67s/it] 
19%|█▊ | 2278/12313 [1:42:15<7:34:34, 2.72s/it] {'loss': 0.7376, 'grad_norm': 4.222974710982374, 'learning_rate': 4.691678489619411e-06, 'epoch': 0.19} 19%|█▊ | 2278/12313 [1:42:15<7:34:34, 2.72s/it] 19%|█▊ | 2279/12313 [1:42:18<7:41:35, 2.76s/it] {'loss': 0.6068, 'grad_norm': 4.6176435172773065, 'learning_rate': 4.691362038566411e-06, 'epoch': 0.19} 19%|█▊ | 2279/12313 [1:42:18<7:41:35, 2.76s/it] 19%|█▊ | 2280/12313 [1:42:21<7:31:38, 2.70s/it] {'loss': 0.5776, 'grad_norm': 3.8516607452654577, 'learning_rate': 4.691045435882758e-06, 'epoch': 0.19} 19%|█▊ | 2280/12313 [1:42:21<7:31:38, 2.70s/it] 19%|█▊ | 2281/12313 [1:42:24<7:32:00, 2.70s/it] {'loss': 0.4589, 'grad_norm': 4.380266220174846, 'learning_rate': 4.690728681590354e-06, 'epoch': 0.19} 19%|█▊ | 2281/12313 [1:42:24<7:32:00, 2.70s/it] 19%|█▊ | 2282/12313 [1:42:26<7:30:38, 2.70s/it] {'loss': 0.4806, 'grad_norm': 20.10704260314573, 'learning_rate': 4.6904117757111215e-06, 'epoch': 0.19} 19%|█▊ | 2282/12313 [1:42:26<7:30:38, 2.70s/it] 19%|█▊ | 2283/12313 [1:42:29<7:21:34, 2.64s/it] {'loss': 0.5437, 'grad_norm': 4.622570996308856, 'learning_rate': 4.6900947182669855e-06, 'epoch': 0.19} 19%|█▊ | 2283/12313 [1:42:29<7:21:34, 2.64s/it] 19%|█▊ | 2284/12313 [1:42:31<7:12:21, 2.59s/it] {'loss': 0.6253, 'grad_norm': 3.4999814335004125, 'learning_rate': 4.689777509279886e-06, 'epoch': 0.19} 19%|█▊ | 2284/12313 [1:42:31<7:12:21, 2.59s/it] 19%|█▊ | 2285/12313 [1:42:34<7:29:13, 2.69s/it] {'loss': 0.6941, 'grad_norm': 8.159646406598347, 'learning_rate': 4.689460148771773e-06, 'epoch': 0.19} 19%|█▊ | 2285/12313 [1:42:34<7:29:13, 2.69s/it] 19%|█▊ | 2286/12313 [1:42:37<7:41:30, 2.76s/it] {'loss': 0.5396, 'grad_norm': 4.119601877397449, 'learning_rate': 4.6891426367646046e-06, 'epoch': 0.19} 19%|█▊ | 2286/12313 [1:42:37<7:41:30, 2.76s/it] 19%|█▊ | 2287/12313 [1:42:40<7:27:54, 2.68s/it] {'loss': 0.3815, 'grad_norm': 5.783579437576225, 'learning_rate': 4.6888249732803516e-06, 'epoch': 0.19} 19%|█▊ | 2287/12313 [1:42:40<7:27:54, 
2.68s/it] 19%|█▊ | 2288/12313 [1:42:42<7:22:52, 2.65s/it] {'loss': 0.5961, 'grad_norm': 4.479086903423486, 'learning_rate': 4.688507158340994e-06, 'epoch': 0.19} 19%|█▊ | 2288/12313 [1:42:42<7:22:52, 2.65s/it] 19%|█▊ | 2289/12313 [1:42:45<7:23:50, 2.66s/it] {'loss': 0.5213, 'grad_norm': 5.521280967881707, 'learning_rate': 4.688189191968524e-06, 'epoch': 0.19} 19%|█▊ | 2289/12313 [1:42:45<7:23:50, 2.66s/it] 19%|█▊ | 2290/12313 [1:42:48<7:30:07, 2.69s/it] {'loss': 0.5069, 'grad_norm': 5.560685189930327, 'learning_rate': 4.687871074184944e-06, 'epoch': 0.19} 19%|█▊ | 2290/12313 [1:42:48<7:30:07, 2.69s/it] 19%|█▊ | 2291/12313 [1:42:50<7:26:30, 2.67s/it] {'loss': 0.4698, 'grad_norm': 8.713540857848809, 'learning_rate': 4.687552805012263e-06, 'epoch': 0.19} 19%|█▊ | 2291/12313 [1:42:50<7:26:30, 2.67s/it] 19%|█▊ | 2292/12313 [1:42:53<7:28:28, 2.69s/it] {'loss': 0.5941, 'grad_norm': 7.120678804920857, 'learning_rate': 4.687234384472506e-06, 'epoch': 0.19} 19%|█▊ | 2292/12313 [1:42:53<7:28:28, 2.69s/it] 19%|█▊ | 2293/12313 [1:42:55<7:21:54, 2.65s/it] {'loss': 0.532, 'grad_norm': 3.7747766229799105, 'learning_rate': 4.686915812587706e-06, 'epoch': 0.19} 19%|█▊ | 2293/12313 [1:42:55<7:21:54, 2.65s/it] 19%|█▊ | 2294/12313 [1:42:58<7:15:22, 2.61s/it] {'loss': 0.6332, 'grad_norm': 5.78663404222734, 'learning_rate': 4.686597089379905e-06, 'epoch': 0.19} 19%|█▊ | 2294/12313 [1:42:58<7:15:22, 2.61s/it] 19%|█▊ | 2295/12313 [1:43:01<7:33:52, 2.72s/it] {'loss': 0.551, 'grad_norm': 5.344791467556041, 'learning_rate': 4.6862782148711584e-06, 'epoch': 0.19} 19%|█▊ | 2295/12313 [1:43:01<7:33:52, 2.72s/it] 19%|█▊ | 2296/12313 [1:43:04<7:36:43, 2.74s/it] {'loss': 0.7081, 'grad_norm': 3.2548480768417183, 'learning_rate': 4.685959189083531e-06, 'epoch': 0.19} 19%|█▊ | 2296/12313 [1:43:04<7:36:43, 2.74s/it] 19%|█▊ | 2297/12313 [1:43:06<7:29:17, 2.69s/it] {'loss': 0.6577, 'grad_norm': 4.522340647304046, 'learning_rate': 4.685640012039095e-06, 'epoch': 0.19} 19%|█▊ | 2297/12313 [1:43:06<7:29:17, 
2.69s/it] 19%|█▊ | 2298/12313 [1:43:09<7:32:38, 2.71s/it] {'loss': 0.5544, 'grad_norm': 4.317215335517002, 'learning_rate': 4.685320683759939e-06, 'epoch': 0.19} 19%|█▊ | 2298/12313 [1:43:09<7:32:38, 2.71s/it] 19%|█▊ | 2299/12313 [1:43:12<7:27:45, 2.68s/it] {'loss': 0.4849, 'grad_norm': 5.363458185770196, 'learning_rate': 4.685001204268156e-06, 'epoch': 0.19} 19%|█▊ | 2299/12313 [1:43:12<7:27:45, 2.68s/it] 19%|█▊ | 2300/12313 [1:43:14<7:30:13, 2.70s/it] {'loss': 0.5735, 'grad_norm': 25.593738509492916, 'learning_rate': 4.684681573585854e-06, 'epoch': 0.19} 19%|█▊ | 2300/12313 [1:43:14<7:30:13, 2.70s/it] 19%|█▊ | 2301/12313 [1:43:17<7:15:30, 2.61s/it] {'loss': 0.5465, 'grad_norm': 6.172329696323195, 'learning_rate': 4.684361791735149e-06, 'epoch': 0.19} 19%|█▊ | 2301/12313 [1:43:17<7:15:30, 2.61s/it] 19%|█▊ | 2302/12313 [1:43:19<7:05:00, 2.55s/it] {'loss': 0.6078, 'grad_norm': 4.8669050067958874, 'learning_rate': 4.684041858738169e-06, 'epoch': 0.19} 19%|█▊ | 2302/12313 [1:43:19<7:05:00, 2.55s/it] 19%|█▊ | 2303/12313 [1:43:22<7:12:32, 2.59s/it] {'loss': 0.5647, 'grad_norm': 12.729610338065322, 'learning_rate': 4.683721774617052e-06, 'epoch': 0.19} 19%|█▊ | 2303/12313 [1:43:22<7:12:32, 2.59s/it] 19%|█▊ | 2304/12313 [1:43:25<7:17:13, 2.62s/it] {'loss': 0.5026, 'grad_norm': 5.8134799742582475, 'learning_rate': 4.6834015393939445e-06, 'epoch': 0.19} 19%|█▊ | 2304/12313 [1:43:25<7:17:13, 2.62s/it] 19%|█▊ | 2305/12313 [1:43:27<7:21:47, 2.65s/it] {'loss': 0.4811, 'grad_norm': 3.5817996920030257, 'learning_rate': 4.683081153091006e-06, 'epoch': 0.19} 19%|█▊ | 2305/12313 [1:43:27<7:21:47, 2.65s/it] 19%|█▊ | 2306/12313 [1:43:30<7:28:24, 2.69s/it] {'loss': 0.6439, 'grad_norm': 4.114121320477102, 'learning_rate': 4.682760615730405e-06, 'epoch': 0.19} 19%|█▊ | 2306/12313 [1:43:30<7:28:24, 2.69s/it] 19%|█▊ | 2307/12313 [1:43:33<7:19:08, 2.63s/it] {'loss': 0.6019, 'grad_norm': 7.202440009743923, 'learning_rate': 4.682439927334323e-06, 'epoch': 0.19} 19%|█▊ | 2307/12313 
[1:43:33<7:19:08, 2.63s/it] 19%|█▊ | 2308/12313 [1:43:35<7:27:53, 2.69s/it] {'loss': 0.7116, 'grad_norm': 5.323112544773871, 'learning_rate': 4.682119087924948e-06, 'epoch': 0.19} 19%|█▊ | 2308/12313 [1:43:35<7:27:53, 2.69s/it] 19%|█▉ | 2309/12313 [1:43:38<7:35:15, 2.73s/it] {'loss': 0.6427, 'grad_norm': 3.6253125361558967, 'learning_rate': 4.681798097524479e-06, 'epoch': 0.19} 19%|█▉ | 2309/12313 [1:43:38<7:35:15, 2.73s/it] 19%|█▉ | 2310/12313 [1:43:41<7:29:03, 2.69s/it] {'loss': 0.4505, 'grad_norm': 4.355698784301835, 'learning_rate': 4.681476956155131e-06, 'epoch': 0.19} 19%|█▉ | 2310/12313 [1:43:41<7:29:03, 2.69s/it] 19%|█▉ | 2311/12313 [1:43:44<7:41:18, 2.77s/it] {'loss': 0.5654, 'grad_norm': 3.395486667581571, 'learning_rate': 4.681155663839122e-06, 'epoch': 0.19} 19%|█▉ | 2311/12313 [1:43:44<7:41:18, 2.77s/it] 19%|█▉ | 2312/12313 [1:43:47<7:39:57, 2.76s/it] {'loss': 0.5545, 'grad_norm': 5.363908094574731, 'learning_rate': 4.680834220598685e-06, 'epoch': 0.19} 19%|█▉ | 2312/12313 [1:43:47<7:39:57, 2.76s/it] 19%|█▉ | 2313/12313 [1:43:49<7:26:56, 2.68s/it] {'loss': 0.6393, 'grad_norm': 4.304796229403271, 'learning_rate': 4.6805126264560605e-06, 'epoch': 0.19} 19%|█▉ | 2313/12313 [1:43:49<7:26:56, 2.68s/it] 19%|█▉ | 2314/12313 [1:43:52<7:21:36, 2.65s/it] {'loss': 0.6884, 'grad_norm': 4.291548258367333, 'learning_rate': 4.680190881433504e-06, 'epoch': 0.19} 19%|█▉ | 2314/12313 [1:43:52<7:21:36, 2.65s/it] 19%|█▉ | 2315/12313 [1:43:54<7:20:50, 2.65s/it] {'loss': 0.5777, 'grad_norm': 9.784318271093905, 'learning_rate': 4.679868985553276e-06, 'epoch': 0.19} 19%|█▉ | 2315/12313 [1:43:54<7:20:50, 2.65s/it] 19%|█▉ | 2316/12313 [1:43:57<7:23:22, 2.66s/it] {'loss': 0.47, 'grad_norm': 7.761537889560317, 'learning_rate': 4.6795469388376525e-06, 'epoch': 0.19} 19%|█▉ | 2316/12313 [1:43:57<7:23:22, 2.66s/it] 19%|█▉ | 2317/12313 [1:44:00<7:19:05, 2.64s/it] {'loss': 0.6137, 'grad_norm': 6.995156273431482, 'learning_rate': 4.6792247413089145e-06, 'epoch': 0.19} 19%|█▉ | 
2317/12313 [1:44:00<7:19:05, 2.64s/it] 19%|█▉ | 2318/12313 [1:44:02<7:06:33, 2.56s/it] {'loss': 0.6229, 'grad_norm': 4.313226522351422, 'learning_rate': 4.678902392989359e-06, 'epoch': 0.19} 19%|█▉ | 2318/12313 [1:44:02<7:06:33, 2.56s/it] 19%|█▉ | 2319/12313 [1:44:05<7:28:51, 2.69s/it] {'loss': 0.5712, 'grad_norm': 5.081068946857541, 'learning_rate': 4.678579893901288e-06, 'epoch': 0.19} 19%|█▉ | 2319/12313 [1:44:05<7:28:51, 2.69s/it] 19%|█▉ | 2320/12313 [1:44:07<7:16:13, 2.62s/it] {'loss': 0.5531, 'grad_norm': 4.809789843673943, 'learning_rate': 4.678257244067019e-06, 'epoch': 0.19} 19%|█▉ | 2320/12313 [1:44:07<7:16:13, 2.62s/it] 19%|█▉ | 2321/12313 [1:44:10<7:34:53, 2.73s/it] {'loss': 0.5659, 'grad_norm': 3.8452865625471255, 'learning_rate': 4.677934443508877e-06, 'epoch': 0.19} 19%|█▉ | 2321/12313 [1:44:10<7:34:53, 2.73s/it] 19%|█▉ | 2322/12313 [1:44:13<7:21:58, 2.65s/it] {'loss': 0.5518, 'grad_norm': 4.917273325549435, 'learning_rate': 4.6776114922491985e-06, 'epoch': 0.19} 19%|█▉ | 2322/12313 [1:44:13<7:21:58, 2.65s/it] 19%|█▉ | 2323/12313 [1:44:16<7:42:51, 2.78s/it] {'loss': 0.5119, 'grad_norm': 4.459277191818069, 'learning_rate': 4.67728839031033e-06, 'epoch': 0.19} 19%|█▉ | 2323/12313 [1:44:16<7:42:51, 2.78s/it] 19%|█▉ | 2324/12313 [1:44:18<7:31:22, 2.71s/it] {'loss': 0.5824, 'grad_norm': 8.098290875630656, 'learning_rate': 4.676965137714626e-06, 'epoch': 0.19} 19%|█▉ | 2324/12313 [1:44:19<7:31:22, 2.71s/it] 19%|█▉ | 2325/12313 [1:44:21<7:34:13, 2.73s/it] {'loss': 0.5991, 'grad_norm': 5.792865775052247, 'learning_rate': 4.676641734484457e-06, 'epoch': 0.19} 19%|█▉ | 2325/12313 [1:44:21<7:34:13, 2.73s/it] 19%|█▉ | 2326/12313 [1:44:24<7:14:41, 2.61s/it] {'loss': 0.6869, 'grad_norm': 6.158616589247779, 'learning_rate': 4.6763181806422e-06, 'epoch': 0.19} 19%|█▉ | 2326/12313 [1:44:24<7:14:41, 2.61s/it] 19%|█▉ | 2327/12313 [1:44:26<7:19:09, 2.64s/it] {'loss': 0.4876, 'grad_norm': 6.1672909888851395, 'learning_rate': 4.675994476210243e-06, 'epoch': 0.19} 19%|█▉ | 
2327/12313 [1:44:26<7:19:09, 2.64s/it] 19%|█▉ | 2328/12313 [1:44:29<7:24:51, 2.67s/it] {'loss': 0.4851, 'grad_norm': 4.374487244986078, 'learning_rate': 4.675670621210985e-06, 'epoch': 0.19} 19%|█▉ | 2328/12313 [1:44:29<7:24:51, 2.67s/it] 19%|█▉ | 2329/12313 [1:44:32<7:22:13, 2.66s/it] {'loss': 0.5938, 'grad_norm': 4.378802636321053, 'learning_rate': 4.675346615666834e-06, 'epoch': 0.19} 19%|█▉ | 2329/12313 [1:44:32<7:22:13, 2.66s/it] 19%|█▉ | 2330/12313 [1:44:34<7:24:31, 2.67s/it] {'loss': 0.5517, 'grad_norm': 6.805865659937306, 'learning_rate': 4.675022459600209e-06, 'epoch': 0.19} 19%|█▉ | 2330/12313 [1:44:34<7:24:31, 2.67s/it] 19%|█▉ | 2331/12313 [1:44:37<7:16:00, 2.62s/it] {'loss': 0.5793, 'grad_norm': 10.069257270167085, 'learning_rate': 4.674698153033542e-06, 'epoch': 0.19} 19%|█▉ | 2331/12313 [1:44:37<7:16:00, 2.62s/it] 19%|█▉ | 2332/12313 [1:44:39<7:14:20, 2.61s/it] {'loss': 0.4501, 'grad_norm': 5.002867430656955, 'learning_rate': 4.674373695989272e-06, 'epoch': 0.19} 19%|█▉ | 2332/12313 [1:44:39<7:14:20, 2.61s/it] 19%|█▉ | 2333/12313 [1:44:42<7:17:10, 2.63s/it] {'loss': 0.6941, 'grad_norm': 6.5558875829671726, 'learning_rate': 4.67404908848985e-06, 'epoch': 0.19} 19%|█▉ | 2333/12313 [1:44:42<7:17:10, 2.63s/it] 19%|█▉ | 2334/12313 [1:44:45<7:41:54, 2.78s/it] {'loss': 0.5876, 'grad_norm': 3.310181376592309, 'learning_rate': 4.673724330557737e-06, 'epoch': 0.19} 19%|█▉ | 2334/12313 [1:44:45<7:41:54, 2.78s/it] 19%|█▉ | 2335/12313 [1:44:48<7:42:44, 2.78s/it] {'loss': 0.6729, 'grad_norm': 4.3922451080442615, 'learning_rate': 4.673399422215405e-06, 'epoch': 0.19} 19%|█▉ | 2335/12313 [1:44:48<7:42:44, 2.78s/it] 19%|█▉ | 2336/12313 [1:44:51<7:41:50, 2.78s/it] {'loss': 0.6935, 'grad_norm': 3.2619584312724124, 'learning_rate': 4.673074363485336e-06, 'epoch': 0.19} 19%|█▉ | 2336/12313 [1:44:51<7:41:50, 2.78s/it] 19%|█▉ | 2337/12313 [1:44:53<7:36:06, 2.74s/it] {'loss': 0.4633, 'grad_norm': 4.553167577761572, 'learning_rate': 4.672749154390021e-06, 'epoch': 0.19} 
19%|█▉ | 2337/12313 [1:44:53<7:36:06, 2.74s/it] 19%|█▉ | 2338/12313 [1:44:56<7:28:50, 2.70s/it] {'loss': 0.6547, 'grad_norm': 3.149250992164633, 'learning_rate': 4.6724237949519635e-06, 'epoch': 0.19} 19%|█▉ | 2338/12313 [1:44:56<7:28:50, 2.70s/it] 19%|█▉ | 2339/12313 [1:44:59<7:38:50, 2.76s/it] {'loss': 0.5031, 'grad_norm': 3.8540236289067646, 'learning_rate': 4.672098285193677e-06, 'epoch': 0.19} 19%|█▉ | 2339/12313 [1:44:59<7:38:50, 2.76s/it] 19%|█▉ | 2340/12313 [1:45:02<7:30:15, 2.71s/it] {'loss': 0.5433, 'grad_norm': 5.029855666317853, 'learning_rate': 4.671772625137685e-06, 'epoch': 0.19} 19%|█▉ | 2340/12313 [1:45:02<7:30:15, 2.71s/it] 19%|█▉ | 2341/12313 [1:45:04<7:29:19, 2.70s/it] {'loss': 0.6162, 'grad_norm': 4.4089431032158, 'learning_rate': 4.6714468148065215e-06, 'epoch': 0.19} 19%|█▉ | 2341/12313 [1:45:04<7:29:19, 2.70s/it] 19%|█▉ | 2342/12313 [1:45:07<7:47:59, 2.82s/it] {'loss': 0.5549, 'grad_norm': 4.951136934389378, 'learning_rate': 4.67112085422273e-06, 'epoch': 0.19} 19%|█▉ | 2342/12313 [1:45:07<7:47:59, 2.82s/it] 19%|█▉ | 2343/12313 [1:45:10<7:32:34, 2.72s/it] {'loss': 0.5311, 'grad_norm': 4.969440488977865, 'learning_rate': 4.6707947434088665e-06, 'epoch': 0.19} 19%|█▉ | 2343/12313 [1:45:10<7:32:34, 2.72s/it] 19%|█▉ | 2344/12313 [1:45:13<7:34:37, 2.74s/it] {'loss': 0.5161, 'grad_norm': 7.467675141875417, 'learning_rate': 4.670468482387495e-06, 'epoch': 0.19} 19%|█▉ | 2344/12313 [1:45:13<7:34:37, 2.74s/it] 19%|█▉ | 2345/12313 [1:45:15<7:30:53, 2.71s/it] {'loss': 0.5573, 'grad_norm': 5.817463721744332, 'learning_rate': 4.670142071181192e-06, 'epoch': 0.19} 19%|█▉ | 2345/12313 [1:45:15<7:30:53, 2.71s/it] 19%|█▉ | 2346/12313 [1:45:18<7:28:31, 2.70s/it] {'loss': 0.4921, 'grad_norm': 3.588261641157244, 'learning_rate': 4.6698155098125435e-06, 'epoch': 0.19} 19%|█▉ | 2346/12313 [1:45:18<7:28:31, 2.70s/it] 19%|█▉ | 2347/12313 [1:45:20<7:20:13, 2.65s/it] {'loss': 0.6335, 'grad_norm': 5.247075946408668, 'learning_rate': 4.6694887983041434e-06, 'epoch': 
0.19} 19%|█▉ | 2347/12313 [1:45:20<7:20:13, 2.65s/it] 19%|█▉ | 2348/12313 [1:45:23<7:17:35, 2.63s/it] {'loss': 0.6301, 'grad_norm': 5.98092813691066, 'learning_rate': 4.669161936678602e-06, 'epoch': 0.19} 19%|█▉ | 2348/12313 [1:45:23<7:17:35, 2.63s/it] 19%|█▉ | 2349/12313 [1:45:26<7:28:39, 2.70s/it] {'loss': 0.605, 'grad_norm': 4.882612458777484, 'learning_rate': 4.668834924958534e-06, 'epoch': 0.19} 19%|█▉ | 2349/12313 [1:45:26<7:28:39, 2.70s/it] 19%|█▉ | 2350/12313 [1:45:29<7:28:45, 2.70s/it] {'loss': 0.7428, 'grad_norm': 4.135928802898827, 'learning_rate': 4.668507763166568e-06, 'epoch': 0.19} 19%|█▉ | 2350/12313 [1:45:29<7:28:45, 2.70s/it] 19%|█▉ | 2351/12313 [1:45:31<7:24:05, 2.67s/it] {'loss': 0.6098, 'grad_norm': 6.436181992978232, 'learning_rate': 4.668180451325341e-06, 'epoch': 0.19} 19%|█▉ | 2351/12313 [1:45:31<7:24:05, 2.67s/it] 19%|█▉ | 2352/12313 [1:45:34<7:23:01, 2.67s/it] {'loss': 0.5165, 'grad_norm': 8.974548488637932, 'learning_rate': 4.667852989457502e-06, 'epoch': 0.19} 19%|█▉ | 2352/12313 [1:45:34<7:23:01, 2.67s/it] 19%|█▉ | 2353/12313 [1:45:37<7:20:12, 2.65s/it] {'loss': 0.5658, 'grad_norm': 4.064278950992003, 'learning_rate': 4.6675253775857096e-06, 'epoch': 0.19} 19%|█▉ | 2353/12313 [1:45:37<7:20:12, 2.65s/it] 19%|█▉ | 2354/12313 [1:45:39<7:16:53, 2.63s/it] {'loss': 0.6748, 'grad_norm': 4.497922853159247, 'learning_rate': 4.667197615732633e-06, 'epoch': 0.19} 19%|█▉ | 2354/12313 [1:45:39<7:16:53, 2.63s/it] 19%|█▉ | 2355/12313 [1:45:42<7:27:46, 2.70s/it] {'loss': 0.5488, 'grad_norm': 4.494597527974153, 'learning_rate': 4.66686970392095e-06, 'epoch': 0.19} 19%|█▉ | 2355/12313 [1:45:42<7:27:46, 2.70s/it] 19%|█▉ | 2356/12313 [1:45:45<7:34:50, 2.74s/it] {'loss': 0.5822, 'grad_norm': 5.505569141881063, 'learning_rate': 4.666541642173352e-06, 'epoch': 0.19} 19%|█▉ | 2356/12313 [1:45:45<7:34:50, 2.74s/it] 19%|█▉ | 2357/12313 [1:45:47<7:21:22, 2.66s/it] {'loss': 0.634, 'grad_norm': 3.730344521450571, 'learning_rate': 4.666213430512538e-06, 'epoch': 
0.19} 19%|█▉ | 2357/12313 [1:45:47<7:21:22, 2.66s/it] 19%|█▉ | 2358/12313 [1:45:50<7:15:50, 2.63s/it] {'loss': 0.5967, 'grad_norm': 5.750851724066535, 'learning_rate': 4.66588506896122e-06, 'epoch': 0.19} 19%|█▉ | 2358/12313 [1:45:50<7:15:50, 2.63s/it] 19%|█▉ | 2359/12313 [1:45:52<7:15:34, 2.63s/it] {'loss': 0.7144, 'grad_norm': 5.307685486956423, 'learning_rate': 4.665556557542118e-06, 'epoch': 0.19} 19%|█▉ | 2359/12313 [1:45:52<7:15:34, 2.63s/it] 19%|█▉ | 2360/12313 [1:45:55<7:12:27, 2.61s/it] {'loss': 0.6383, 'grad_norm': 5.143047388945135, 'learning_rate': 4.6652278962779625e-06, 'epoch': 0.19} 19%|█▉ | 2360/12313 [1:45:55<7:12:27, 2.61s/it] 19%|█▉ | 2361/12313 [1:45:58<7:16:19, 2.63s/it] {'loss': 0.6477, 'grad_norm': 4.999760195133255, 'learning_rate': 4.664899085191496e-06, 'epoch': 0.19} 19%|█▉ | 2361/12313 [1:45:58<7:16:19, 2.63s/it] 19%|█▉ | 2362/12313 [1:46:00<7:17:40, 2.64s/it] {'loss': 0.5304, 'grad_norm': 7.787092748476295, 'learning_rate': 4.664570124305472e-06, 'epoch': 0.19} 19%|█▉ | 2362/12313 [1:46:00<7:17:40, 2.64s/it] 19%|█▉ | 2363/12313 [1:46:03<7:18:32, 2.64s/it] {'loss': 0.4677, 'grad_norm': 4.318743886643792, 'learning_rate': 4.66424101364265e-06, 'epoch': 0.19} 19%|█▉ | 2363/12313 [1:46:03<7:18:32, 2.64s/it] 19%|█▉ | 2364/12313 [1:46:06<7:11:57, 2.61s/it] {'loss': 0.5786, 'grad_norm': 4.9984105511208545, 'learning_rate': 4.663911753225803e-06, 'epoch': 0.19} 19%|█▉ | 2364/12313 [1:46:06<7:11:57, 2.61s/it] 19%|█▉ | 2365/12313 [1:46:09<7:38:36, 2.77s/it] {'loss': 0.4654, 'grad_norm': 3.7096978585490556, 'learning_rate': 4.663582343077716e-06, 'epoch': 0.19} 19%|█▉ | 2365/12313 [1:46:09<7:38:36, 2.77s/it] 19%|█▉ | 2366/12313 [1:46:11<7:30:28, 2.72s/it] {'loss': 0.4797, 'grad_norm': 5.0966976399052815, 'learning_rate': 4.663252783221182e-06, 'epoch': 0.19} 19%|█▉ | 2366/12313 [1:46:11<7:30:28, 2.72s/it] 19%|█▉ | 2367/12313 [1:46:14<7:26:32, 2.69s/it] {'loss': 0.5612, 'grad_norm': 11.273482637934793, 'learning_rate': 4.662923073679003e-06, 
'epoch': 0.19} 19%|█▉ | 2367/12313 [1:46:14<7:26:32, 2.69s/it] 19%|█▉ | 2368/12313 [1:46:17<7:24:30, 2.68s/it] {'loss': 0.6027, 'grad_norm': 5.498434540946209, 'learning_rate': 4.662593214473995e-06, 'epoch': 0.19} 19%|█▉ | 2368/12313 [1:46:17<7:24:30, 2.68s/it] 19%|█▉ | 2369/12313 [1:46:19<7:25:19, 2.69s/it] {'loss': 0.7433, 'grad_norm': 3.859479042229787, 'learning_rate': 4.662263205628983e-06, 'epoch': 0.19} 19%|█▉ | 2369/12313 [1:46:19<7:25:19, 2.69s/it] 19%|█▉ | 2370/12313 [1:46:22<7:34:19, 2.74s/it] {'loss': 0.4785, 'grad_norm': 4.418822540532742, 'learning_rate': 4.661933047166799e-06, 'epoch': 0.19} 19%|█▉ | 2370/12313 [1:46:22<7:34:19, 2.74s/it] 19%|█▉ | 2371/12313 [1:46:25<7:31:30, 2.72s/it] {'loss': 0.5654, 'grad_norm': 5.051803928696753, 'learning_rate': 4.661602739110291e-06, 'epoch': 0.19} 19%|█▉ | 2371/12313 [1:46:25<7:31:30, 2.72s/it] 19%|█▉ | 2372/12313 [1:46:28<7:41:23, 2.78s/it] {'loss': 0.7034, 'grad_norm': 4.0841298739845895, 'learning_rate': 4.661272281482313e-06, 'epoch': 0.19} 19%|█▉ | 2372/12313 [1:46:28<7:41:23, 2.78s/it] 19%|█▉ | 2373/12313 [1:46:30<7:28:48, 2.71s/it] {'loss': 0.57, 'grad_norm': 7.928386324620138, 'learning_rate': 4.660941674305732e-06, 'epoch': 0.19} 19%|█▉ | 2373/12313 [1:46:30<7:28:48, 2.71s/it] 19%|█▉ | 2374/12313 [1:46:33<7:28:29, 2.71s/it] {'loss': 0.4108, 'grad_norm': 6.641168636182495, 'learning_rate': 4.660610917603423e-06, 'epoch': 0.19} 19%|█▉ | 2374/12313 [1:46:33<7:28:29, 2.71s/it] 19%|█▉ | 2375/12313 [1:46:36<7:30:12, 2.72s/it] {'loss': 0.5886, 'grad_norm': 7.233453638420033, 'learning_rate': 4.6602800113982746e-06, 'epoch': 0.19} 19%|█▉ | 2375/12313 [1:46:36<7:30:12, 2.72s/it] 19%|█▉ | 2376/12313 [1:46:38<7:26:51, 2.70s/it] {'loss': 0.6516, 'grad_norm': 5.3241909788384385, 'learning_rate': 4.659948955713181e-06, 'epoch': 0.19} 19%|█▉ | 2376/12313 [1:46:38<7:26:51, 2.70s/it] 19%|█▉ | 2377/12313 [1:46:41<7:20:49, 2.66s/it] {'loss': 0.5341, 'grad_norm': 6.517892337897202, 'learning_rate': 
4.659617750571052e-06, 'epoch': 0.19} 19%|█▉ | 2377/12313 [1:46:41<7:20:49, 2.66s/it] 19%|█▉ | 2378/12313 [1:46:44<7:23:41, 2.68s/it] {'loss': 0.7056, 'grad_norm': 8.78691467540056, 'learning_rate': 4.659286395994806e-06, 'epoch': 0.19} 19%|█▉ | 2378/12313 [1:46:44<7:23:41, 2.68s/it] 19%|█▉ | 2379/12313 [1:46:46<7:23:18, 2.68s/it] {'loss': 0.4839, 'grad_norm': 5.510965559445886, 'learning_rate': 4.658954892007367e-06, 'epoch': 0.19} 19%|█▉ | 2379/12313 [1:46:46<7:23:18, 2.68s/it] 19%|█▉ | 2380/12313 [1:46:49<7:29:12, 2.71s/it] {'loss': 0.5688, 'grad_norm': 5.065066805659556, 'learning_rate': 4.658623238631675e-06, 'epoch': 0.19} 19%|█▉ | 2380/12313 [1:46:49<7:29:12, 2.71s/it] 19%|█▉ | 2381/12313 [1:46:52<7:15:51, 2.63s/it] {'loss': 0.5662, 'grad_norm': 4.049904522221411, 'learning_rate': 4.658291435890681e-06, 'epoch': 0.19} 19%|█▉ | 2381/12313 [1:46:52<7:15:51, 2.63s/it] 19%|█▉ | 2382/12313 [1:46:54<7:22:50, 2.68s/it] {'loss': 0.52, 'grad_norm': 5.778500245146091, 'learning_rate': 4.657959483807342e-06, 'epoch': 0.19} 19%|█▉ | 2382/12313 [1:46:54<7:22:50, 2.68s/it] 19%|█▉ | 2383/12313 [1:46:57<7:25:02, 2.69s/it] {'loss': 0.4357, 'grad_norm': 4.275606455147308, 'learning_rate': 4.657627382404627e-06, 'epoch': 0.19} 19%|█▉ | 2383/12313 [1:46:57<7:25:02, 2.69s/it] 19%|█▉ | 2384/12313 [1:47:00<7:24:04, 2.68s/it] {'loss': 0.6067, 'grad_norm': 4.9185646028031345, 'learning_rate': 4.657295131705516e-06, 'epoch': 0.19} 19%|█▉ | 2384/12313 [1:47:00<7:24:04, 2.68s/it] 19%|█▉ | 2385/12313 [1:47:03<7:31:04, 2.73s/it] {'loss': 0.58, 'grad_norm': 4.858322104090867, 'learning_rate': 4.6569627317329995e-06, 'epoch': 0.19} 19%|█▉ | 2385/12313 [1:47:03<7:31:04, 2.73s/it] 19%|█▉ | 2386/12313 [1:47:05<7:36:37, 2.76s/it] {'loss': 0.5096, 'grad_norm': 5.010825230384572, 'learning_rate': 4.656630182510078e-06, 'epoch': 0.19} 19%|█▉ | 2386/12313 [1:47:05<7:36:37, 2.76s/it] 19%|█▉ | 2387/12313 [1:47:08<7:27:34, 2.71s/it] {'loss': 0.5781, 'grad_norm': 4.971344756018251, 'learning_rate': 
4.656297484059761e-06, 'epoch': 0.19} 19%|█▉ | 2387/12313 [1:47:08<7:27:34, 2.71s/it] 19%|█▉ | 2388/12313 [1:47:11<7:29:27, 2.72s/it] {'loss': 0.5301, 'grad_norm': 5.7140608376861906, 'learning_rate': 4.655964636405071e-06, 'epoch': 0.19} 19%|█▉ | 2388/12313 [1:47:11<7:29:27, 2.72s/it] 19%|█▉ | 2389/12313 [1:47:13<7:20:01, 2.66s/it] {'loss': 0.5345, 'grad_norm': 5.25952348586539, 'learning_rate': 4.655631639569037e-06, 'epoch': 0.19} 19%|█▉ | 2389/12313 [1:47:13<7:20:01, 2.66s/it] 19%|█▉ | 2390/12313 [1:47:16<7:16:33, 2.64s/it] {'loss': 0.5165, 'grad_norm': 4.858225371501829, 'learning_rate': 4.655298493574704e-06, 'epoch': 0.19} 19%|█▉ | 2390/12313 [1:47:16<7:16:33, 2.64s/it] 19%|█▉ | 2391/12313 [1:47:19<7:17:31, 2.65s/it] {'loss': 0.5336, 'grad_norm': 7.9815798526373145, 'learning_rate': 4.65496519844512e-06, 'epoch': 0.19} 19%|█▉ | 2391/12313 [1:47:19<7:17:31, 2.65s/it] 19%|█▉ | 2392/12313 [1:47:21<7:16:24, 2.64s/it] {'loss': 0.5233, 'grad_norm': 6.221631927317744, 'learning_rate': 4.654631754203351e-06, 'epoch': 0.19} 19%|█▉ | 2392/12313 [1:47:21<7:16:24, 2.64s/it] 19%|█▉ | 2393/12313 [1:47:24<7:19:34, 2.66s/it] {'loss': 0.5399, 'grad_norm': 5.199395166051189, 'learning_rate': 4.6542981608724665e-06, 'epoch': 0.19} 19%|█▉ | 2393/12313 [1:47:24<7:19:34, 2.66s/it] 19%|█▉ | 2394/12313 [1:47:27<7:24:58, 2.69s/it] {'loss': 0.6658, 'grad_norm': 5.85510080538778, 'learning_rate': 4.6539644184755515e-06, 'epoch': 0.19} 19%|█▉ | 2394/12313 [1:47:27<7:24:58, 2.69s/it] 19%|█▉ | 2395/12313 [1:47:29<7:22:43, 2.68s/it] {'loss': 0.5119, 'grad_norm': 5.187441178217061, 'learning_rate': 4.6536305270356975e-06, 'epoch': 0.19} 19%|█▉ | 2395/12313 [1:47:29<7:22:43, 2.68s/it] 19%|█▉ | 2396/12313 [1:47:32<7:18:35, 2.65s/it] {'loss': 0.3997, 'grad_norm': 4.41810789985307, 'learning_rate': 4.65329648657601e-06, 'epoch': 0.19} 19%|█▉ | 2396/12313 [1:47:32<7:18:35, 2.65s/it] 19%|█▉ | 2397/12313 [1:47:34<7:14:48, 2.63s/it] {'loss': 0.7692, 'grad_norm': 5.73516894480618, 'learning_rate': 
4.652962297119601e-06, 'epoch': 0.19} 19%|█▉ | 2397/12313 [1:47:34<7:14:48, 2.63s/it] 19%|█▉ | 2398/12313 [1:47:37<7:18:57, 2.66s/it] {'loss': 0.4979, 'grad_norm': 7.959466481454268, 'learning_rate': 4.652627958689596e-06, 'epoch': 0.19} 19%|█▉ | 2398/12313 [1:47:37<7:18:57, 2.66s/it] 19%|█▉ | 2399/12313 [1:47:40<7:27:13, 2.71s/it] {'loss': 0.6685, 'grad_norm': 5.346996268227039, 'learning_rate': 4.65229347130913e-06, 'epoch': 0.19} 19%|█▉ | 2399/12313 [1:47:40<7:27:13, 2.71s/it] 19%|█▉ | 2400/12313 [1:47:43<7:25:31, 2.70s/it] {'loss': 0.5679, 'grad_norm': 14.105527004476242, 'learning_rate': 4.651958835001345e-06, 'epoch': 0.19} 19%|█▉ | 2400/12313 [1:47:43<7:25:31, 2.70s/it] 19%|█▉ | 2401/12313 [1:47:45<7:20:56, 2.67s/it] {'loss': 0.4162, 'grad_norm': 5.030593581231049, 'learning_rate': 4.651624049789397e-06, 'epoch': 0.19} 19%|█▉ | 2401/12313 [1:47:45<7:20:56, 2.67s/it] 20%|█▉ | 2402/12313 [1:47:48<7:17:15, 2.65s/it] {'loss': 0.479, 'grad_norm': 10.896072520758361, 'learning_rate': 4.651289115696454e-06, 'epoch': 0.2} 20%|█▉ | 2402/12313 [1:47:48<7:17:15, 2.65s/it] 20%|█▉ | 2403/12313 [1:47:51<7:28:43, 2.72s/it] {'loss': 0.5053, 'grad_norm': 4.958720750747442, 'learning_rate': 4.650954032745689e-06, 'epoch': 0.2} 20%|█▉ | 2403/12313 [1:47:51<7:28:43, 2.72s/it] 20%|█▉ | 2404/12313 [1:47:54<7:36:16, 2.76s/it] {'loss': 0.574, 'grad_norm': 3.974794068624415, 'learning_rate': 4.6506188009602885e-06, 'epoch': 0.2} 20%|█▉ | 2404/12313 [1:47:54<7:36:16, 2.76s/it] 20%|█▉ | 2405/12313 [1:47:56<7:19:29, 2.66s/it] {'loss': 0.514, 'grad_norm': 8.24891618792213, 'learning_rate': 4.65028342036345e-06, 'epoch': 0.2} 20%|█▉ | 2405/12313 [1:47:56<7:19:29, 2.66s/it] 20%|█▉ | 2406/12313 [1:47:59<7:22:40, 2.68s/it] {'loss': 0.427, 'grad_norm': 4.335525878369569, 'learning_rate': 4.6499478909783764e-06, 'epoch': 0.2} 20%|█▉ | 2406/12313 [1:47:59<7:22:40, 2.68s/it] 20%|█▉ | 2407/12313 [1:48:01<7:17:35, 2.65s/it] {'loss': 0.4575, 'grad_norm': 6.424849313389938, 'learning_rate': 
4.649612212828289e-06, 'epoch': 0.2} 20%|█▉ | 2407/12313 [1:48:01<7:17:35, 2.65s/it] 20%|█▉ | 2408/12313 [1:48:04<7:19:51, 2.66s/it] {'loss': 0.6179, 'grad_norm': 4.535461924057379, 'learning_rate': 4.6492763859364134e-06, 'epoch': 0.2} 20%|█▉ | 2408/12313 [1:48:04<7:19:51, 2.66s/it] 20%|█▉ | 2409/12313 [1:48:07<7:21:16, 2.67s/it] {'loss': 0.6637, 'grad_norm': 7.409680541443817, 'learning_rate': 4.648940410325987e-06, 'epoch': 0.2} 20%|█▉ | 2409/12313 [1:48:07<7:21:16, 2.67s/it] 20%|█▉ | 2410/12313 [1:48:09<7:22:33, 2.68s/it] {'loss': 0.5738, 'grad_norm': 4.6863434003973845, 'learning_rate': 4.648604286020256e-06, 'epoch': 0.2} 20%|█▉ | 2410/12313 [1:48:09<7:22:33, 2.68s/it] 20%|█▉ | 2411/12313 [1:48:12<7:23:13, 2.69s/it] {'loss': 0.4905, 'grad_norm': 5.544427789484983, 'learning_rate': 4.64826801304248e-06, 'epoch': 0.2} 20%|█▉ | 2411/12313 [1:48:12<7:23:13, 2.69s/it] 20%|█▉ | 2412/12313 [1:48:15<7:33:21, 2.75s/it] {'loss': 0.7071, 'grad_norm': 6.119211167714467, 'learning_rate': 4.647931591415929e-06, 'epoch': 0.2} 20%|█▉ | 2412/12313 [1:48:15<7:33:21, 2.75s/it] 20%|█▉ | 2413/12313 [1:48:18<7:47:53, 2.84s/it] {'loss': 0.6431, 'grad_norm': 3.1340827150045714, 'learning_rate': 4.647595021163878e-06, 'epoch': 0.2} 20%|█▉ | 2413/12313 [1:48:18<7:47:53, 2.84s/it] 20%|█▉ | 2414/12313 [1:48:21<7:46:20, 2.83s/it] {'loss': 0.6288, 'grad_norm': 5.376283662779039, 'learning_rate': 4.647258302309618e-06, 'epoch': 0.2} 20%|█▉ | 2414/12313 [1:48:21<7:46:20, 2.83s/it] 20%|█▉ | 2415/12313 [1:48:23<7:32:44, 2.74s/it] {'loss': 0.4535, 'grad_norm': 6.177768802745785, 'learning_rate': 4.646921434876447e-06, 'epoch': 0.2} 20%|█▉ | 2415/12313 [1:48:23<7:32:44, 2.74s/it] 20%|█▉ | 2416/12313 [1:48:26<7:18:27, 2.66s/it] {'loss': 0.6907, 'grad_norm': 12.088856729370683, 'learning_rate': 4.646584418887675e-06, 'epoch': 0.2} 20%|█▉ | 2416/12313 [1:48:26<7:18:27, 2.66s/it] 20%|█▉ | 2417/12313 [1:48:28<7:15:18, 2.64s/it] {'loss': 0.5471, 'grad_norm': 8.938823511128325, 'learning_rate': 
4.646247254366622e-06, 'epoch': 0.2} 20%|█▉ | 2417/12313 [1:48:28<7:15:18, 2.64s/it] 20%|█▉ | 2418/12313 [1:48:31<7:13:01, 2.63s/it] {'loss': 0.6261, 'grad_norm': 6.009413684644707, 'learning_rate': 4.645909941336619e-06, 'epoch': 0.2} 20%|█▉ | 2418/12313 [1:48:31<7:13:01, 2.63s/it] 20%|█▉ | 2419/12313 [1:48:34<7:15:12, 2.64s/it] {'loss': 0.5226, 'grad_norm': 5.0207649299589034, 'learning_rate': 4.645572479821004e-06, 'epoch': 0.2} 20%|█▉ | 2419/12313 [1:48:34<7:15:12, 2.64s/it] 20%|█▉ | 2420/12313 [1:48:36<7:11:55, 2.62s/it] {'loss': 0.7304, 'grad_norm': 5.8465718027493025, 'learning_rate': 4.645234869843129e-06, 'epoch': 0.2} 20%|█▉ | 2420/12313 [1:48:36<7:11:55, 2.62s/it] 20%|█▉ | 2421/12313 [1:48:39<7:14:57, 2.64s/it] {'loss': 0.4947, 'grad_norm': 4.44988664177105, 'learning_rate': 4.644897111426355e-06, 'epoch': 0.2} 20%|█▉ | 2421/12313 [1:48:39<7:14:57, 2.64s/it] 20%|█▉ | 2422/12313 [1:48:42<7:13:48, 2.63s/it] {'loss': 0.6638, 'grad_norm': 4.080957228552931, 'learning_rate': 4.6445592045940515e-06, 'epoch': 0.2} 20%|█▉ | 2422/12313 [1:48:42<7:13:48, 2.63s/it] 20%|█▉ | 2423/12313 [1:48:44<7:11:43, 2.62s/it] {'loss': 0.5218, 'grad_norm': 6.67424212054808, 'learning_rate': 4.644221149369602e-06, 'epoch': 0.2} 20%|█▉ | 2423/12313 [1:48:44<7:11:43, 2.62s/it] 20%|█▉ | 2424/12313 [1:48:47<7:07:45, 2.60s/it] {'loss': 0.5116, 'grad_norm': 4.3515760143355084, 'learning_rate': 4.643882945776397e-06, 'epoch': 0.2} 20%|█▉ | 2424/12313 [1:48:47<7:07:45, 2.60s/it] 20%|█▉ | 2425/12313 [1:48:49<7:07:49, 2.60s/it] {'loss': 0.5858, 'grad_norm': 4.665310299217762, 'learning_rate': 4.6435445938378375e-06, 'epoch': 0.2} 20%|█▉ | 2425/12313 [1:48:49<7:07:49, 2.60s/it] 20%|█▉ | 2426/12313 [1:48:52<7:13:16, 2.63s/it] {'loss': 0.3923, 'grad_norm': 3.4118287749102367, 'learning_rate': 4.643206093577338e-06, 'epoch': 0.2} 20%|█▉ | 2426/12313 [1:48:52<7:13:16, 2.63s/it] 20%|█▉ | 2427/12313 [1:48:55<7:18:24, 2.66s/it] {'loss': 0.5732, 'grad_norm': 4.722707242292216, 'learning_rate': 
4.642867445018318e-06, 'epoch': 0.2} 20%|█▉ | 2427/12313 [1:48:55<7:18:24, 2.66s/it] 20%|█▉ | 2428/12313 [1:48:58<7:25:17, 2.70s/it] {'loss': 0.4799, 'grad_norm': 4.984185032243032, 'learning_rate': 4.642528648184213e-06, 'epoch': 0.2} 20%|█▉ | 2428/12313 [1:48:58<7:25:17, 2.70s/it] 20%|█▉ | 2429/12313 [1:49:00<7:25:38, 2.71s/it] {'loss': 0.5171, 'grad_norm': 4.175601292098027, 'learning_rate': 4.642189703098466e-06, 'epoch': 0.2} 20%|█▉ | 2429/12313 [1:49:00<7:25:38, 2.71s/it] 20%|█▉ | 2430/12313 [1:49:03<7:19:24, 2.67s/it] {'loss': 0.7051, 'grad_norm': 8.539390583313239, 'learning_rate': 4.6418506097845264e-06, 'epoch': 0.2} 20%|█▉ | 2430/12313 [1:49:03<7:19:24, 2.67s/it] 20%|█▉ | 2431/12313 [1:49:06<7:22:31, 2.69s/it] {'loss': 0.491, 'grad_norm': 6.857246634219236, 'learning_rate': 4.641511368265861e-06, 'epoch': 0.2} 20%|█▉ | 2431/12313 [1:49:06<7:22:31, 2.69s/it] 20%|█▉ | 2432/12313 [1:49:08<7:29:57, 2.73s/it] {'loss': 0.7512, 'grad_norm': 8.537762993370494, 'learning_rate': 4.641171978565943e-06, 'epoch': 0.2} 20%|█▉ | 2432/12313 [1:49:08<7:29:57, 2.73s/it] 20%|█▉ | 2433/12313 [1:49:11<7:15:29, 2.64s/it] {'loss': 0.6716, 'grad_norm': 4.408249935592557, 'learning_rate': 4.640832440708256e-06, 'epoch': 0.2} 20%|█▉ | 2433/12313 [1:49:11<7:15:29, 2.64s/it] 20%|█▉ | 2434/12313 [1:49:14<7:18:52, 2.67s/it] {'loss': 0.5133, 'grad_norm': 5.109570036404427, 'learning_rate': 4.640492754716294e-06, 'epoch': 0.2} 20%|█▉ | 2434/12313 [1:49:14<7:18:52, 2.67s/it] 20%|█▉ | 2435/12313 [1:49:16<7:24:09, 2.70s/it] {'loss': 0.9746, 'grad_norm': 4.736765394511924, 'learning_rate': 4.640152920613562e-06, 'epoch': 0.2} 20%|█▉ | 2435/12313 [1:49:16<7:24:09, 2.70s/it] 20%|█▉ | 2436/12313 [1:49:19<7:32:56, 2.75s/it] {'loss': 0.4769, 'grad_norm': 5.063603609846527, 'learning_rate': 4.639812938423574e-06, 'epoch': 0.2} 20%|█▉ | 2436/12313 [1:49:19<7:32:56, 2.75s/it] 20%|█▉ | 2437/12313 [1:49:22<7:30:03, 2.73s/it] {'loss': 0.5295, 'grad_norm': 5.272415199098991, 'learning_rate': 
4.639472808169857e-06, 'epoch': 0.2} 20%|█▉ | 2437/12313 [1:49:22<7:30:03, 2.73s/it] 20%|█▉ | 2438/12313 [1:49:25<7:42:06, 2.81s/it] {'loss': 0.6467, 'grad_norm': 8.257542098149266, 'learning_rate': 4.639132529875943e-06, 'epoch': 0.2} 20%|█▉ | 2438/12313 [1:49:25<7:42:06, 2.81s/it] 20%|█▉ | 2439/12313 [1:49:27<7:30:25, 2.74s/it] {'loss': 0.5433, 'grad_norm': 5.3139070377916315, 'learning_rate': 4.63879210356538e-06, 'epoch': 0.2} 20%|█▉ | 2439/12313 [1:49:27<7:30:25, 2.74s/it] 20%|█▉ | 2440/12313 [1:49:30<7:36:32, 2.77s/it] {'loss': 0.5335, 'grad_norm': 6.841621560493213, 'learning_rate': 4.6384515292617226e-06, 'epoch': 0.2} 20%|█▉ | 2440/12313 [1:49:30<7:36:32, 2.77s/it] 20%|█▉ | 2441/12313 [1:49:33<7:29:12, 2.73s/it] {'loss': 0.5433, 'grad_norm': 8.402046267183342, 'learning_rate': 4.6381108069885376e-06, 'epoch': 0.2} 20%|█▉ | 2441/12313 [1:49:33<7:29:12, 2.73s/it] 20%|█▉ | 2442/12313 [1:49:36<7:41:50, 2.81s/it] {'loss': 0.5768, 'grad_norm': 3.6536735919997674, 'learning_rate': 4.6377699367694e-06, 'epoch': 0.2} 20%|█▉ | 2442/12313 [1:49:36<7:41:50, 2.81s/it] 20%|█▉ | 2443/12313 [1:49:39<7:34:39, 2.76s/it] {'loss': 0.5184, 'grad_norm': 6.961987557428991, 'learning_rate': 4.637428918627896e-06, 'epoch': 0.2} 20%|█▉ | 2443/12313 [1:49:39<7:34:39, 2.76s/it] 20%|█▉ | 2444/12313 [1:49:41<7:18:42, 2.67s/it] {'loss': 0.6723, 'grad_norm': 5.119773267513261, 'learning_rate': 4.637087752587624e-06, 'epoch': 0.2} 20%|█▉ | 2444/12313 [1:49:41<7:18:42, 2.67s/it] 20%|█▉ | 2445/12313 [1:49:44<7:10:58, 2.62s/it] {'loss': 0.4536, 'grad_norm': 5.37579569573017, 'learning_rate': 4.636746438672189e-06, 'epoch': 0.2} 20%|█▉ | 2445/12313 [1:49:44<7:10:58, 2.62s/it] 20%|█▉ | 2446/12313 [1:49:46<7:13:38, 2.64s/it] {'loss': 0.7141, 'grad_norm': 4.8653070857140825, 'learning_rate': 4.63640497690521e-06, 'epoch': 0.2} 20%|█▉ | 2446/12313 [1:49:46<7:13:38, 2.64s/it] 20%|█▉ | 2447/12313 [1:49:49<7:16:12, 2.65s/it] {'loss': 0.4026, 'grad_norm': 5.213638271403077, 'learning_rate': 
4.636063367310313e-06, 'epoch': 0.2} 20%|█▉ | 2447/12313 [1:49:49<7:16:12, 2.65s/it] 20%|█▉ | 2448/12313 [1:49:51<7:11:05, 2.62s/it] {'loss': 0.8184, 'grad_norm': 4.022917999876865, 'learning_rate': 4.635721609911137e-06, 'epoch': 0.2} 20%|█▉ | 2448/12313 [1:49:51<7:11:05, 2.62s/it] 20%|█▉ | 2449/12313 [1:49:54<7:23:44, 2.70s/it] {'loss': 0.5173, 'grad_norm': 5.721442337840194, 'learning_rate': 4.635379704731327e-06, 'epoch': 0.2} 20%|█▉ | 2449/12313 [1:49:54<7:23:44, 2.70s/it] 20%|█▉ | 2450/12313 [1:49:57<7:26:08, 2.71s/it] {'loss': 0.6091, 'grad_norm': 4.746208323114987, 'learning_rate': 4.635037651794544e-06, 'epoch': 0.2} 20%|█▉ | 2450/12313 [1:49:57<7:26:08, 2.71s/it] 20%|█▉ | 2451/12313 [1:50:00<7:22:24, 2.69s/it] {'loss': 0.5977, 'grad_norm': 3.7466714374073393, 'learning_rate': 4.634695451124454e-06, 'epoch': 0.2} 20%|█▉ | 2451/12313 [1:50:00<7:22:24, 2.69s/it] 20%|█▉ | 2452/12313 [1:50:02<7:19:29, 2.67s/it] {'loss': 0.4463, 'grad_norm': 5.06981488833001, 'learning_rate': 4.634353102744737e-06, 'epoch': 0.2} 20%|█▉ | 2452/12313 [1:50:02<7:19:29, 2.67s/it] 20%|█▉ | 2453/12313 [1:50:05<7:19:35, 2.67s/it] {'loss': 0.5105, 'grad_norm': 4.625187502181156, 'learning_rate': 4.634010606679081e-06, 'epoch': 0.2} 20%|█▉ | 2453/12313 [1:50:05<7:19:35, 2.67s/it] 20%|█▉ | 2454/12313 [1:50:08<7:35:25, 2.77s/it] {'loss': 0.5703, 'grad_norm': 4.590440891315889, 'learning_rate': 4.633667962951186e-06, 'epoch': 0.2} 20%|█▉ | 2454/12313 [1:50:08<7:35:25, 2.77s/it] 20%|█▉ | 2455/12313 [1:50:11<7:26:49, 2.72s/it] {'loss': 0.5573, 'grad_norm': 8.282884396118154, 'learning_rate': 4.6333251715847595e-06, 'epoch': 0.2} 20%|█▉ | 2455/12313 [1:50:11<7:26:49, 2.72s/it] 20%|█▉ | 2456/12313 [1:50:13<7:20:59, 2.68s/it] {'loss': 0.6637, 'grad_norm': 5.99824415434856, 'learning_rate': 4.6329822326035214e-06, 'epoch': 0.2} 20%|█▉ | 2456/12313 [1:50:13<7:20:59, 2.68s/it] 20%|█▉ | 2457/12313 [1:50:16<7:26:48, 2.72s/it] {'loss': 0.548, 'grad_norm': 4.69181027886392, 'learning_rate': 
4.632639146031201e-06, 'epoch': 0.2} 20%|█▉ | 2457/12313 [1:50:16<7:26:48, 2.72s/it] 20%|█▉ | 2458/12313 [1:50:19<7:37:40, 2.79s/it] {'loss': 0.5267, 'grad_norm': 4.774146261889797, 'learning_rate': 4.63229591189154e-06, 'epoch': 0.2} 20%|█▉ | 2458/12313 [1:50:19<7:37:40, 2.79s/it] 20%|█▉ | 2459/12313 [1:50:22<7:33:31, 2.76s/it] {'loss': 0.6004, 'grad_norm': 4.0388362634362505, 'learning_rate': 4.631952530208286e-06, 'epoch': 0.2} 20%|█▉ | 2459/12313 [1:50:22<7:33:31, 2.76s/it] 20%|█▉ | 2460/12313 [1:50:24<7:31:15, 2.75s/it] {'loss': 0.5377, 'grad_norm': 11.372135655939333, 'learning_rate': 4.6316090010052006e-06, 'epoch': 0.2} 20%|█▉ | 2460/12313 [1:50:24<7:31:15, 2.75s/it] 20%|█▉ | 2461/12313 [1:50:27<7:24:51, 2.71s/it] {'loss': 0.6255, 'grad_norm': 8.258738666338818, 'learning_rate': 4.631265324306053e-06, 'epoch': 0.2} 20%|█▉ | 2461/12313 [1:50:27<7:24:51, 2.71s/it] 20%|█▉ | 2462/12313 [1:50:30<7:13:14, 2.64s/it] {'loss': 0.513, 'grad_norm': 4.838265403873367, 'learning_rate': 4.630921500134625e-06, 'epoch': 0.2} 20%|█▉ | 2462/12313 [1:50:30<7:13:14, 2.64s/it] 20%|██ | 2463/12313 [1:50:32<7:21:11, 2.69s/it] {'loss': 0.5705, 'grad_norm': 6.210942277246636, 'learning_rate': 4.630577528514707e-06, 'epoch': 0.2} 20%|██ | 2463/12313 [1:50:32<7:21:11, 2.69s/it] 20%|██ | 2464/12313 [1:50:35<7:11:13, 2.63s/it] {'loss': 0.7147, 'grad_norm': 4.867014588828138, 'learning_rate': 4.6302334094701e-06, 'epoch': 0.2} 20%|██ | 2464/12313 [1:50:35<7:11:13, 2.63s/it] 20%|██ | 2465/12313 [1:50:38<7:21:06, 2.69s/it] {'loss': 0.4722, 'grad_norm': 3.4107828105330116, 'learning_rate': 4.629889143024615e-06, 'epoch': 0.2} 20%|██ | 2465/12313 [1:50:38<7:21:06, 2.69s/it] 20%|██ | 2466/12313 [1:50:40<7:11:39, 2.63s/it] {'loss': 0.5165, 'grad_norm': 6.2252845176976574, 'learning_rate': 4.6295447292020735e-06, 'epoch': 0.2} 20%|██ | 2466/12313 [1:50:40<7:11:39, 2.63s/it] 20%|██ | 2467/12313 [1:50:43<7:19:03, 2.68s/it] {'loss': 0.4561, 'grad_norm': 5.732379914152118, 'learning_rate': 
4.629200168026307e-06, 'epoch': 0.2} 20%|██ | 2467/12313 [1:50:43<7:19:03, 2.68s/it] 20%|██ | 2468/12313 [1:50:45<7:12:59, 2.64s/it] {'loss': 0.5683, 'grad_norm': 6.463671611218234, 'learning_rate': 4.6288554595211575e-06, 'epoch': 0.2} 20%|██ | 2468/12313 [1:50:45<7:12:59, 2.64s/it] 20%|██ | 2469/12313 [1:50:48<7:13:30, 2.64s/it] {'loss': 0.5942, 'grad_norm': 6.359868474828741, 'learning_rate': 4.628510603710478e-06, 'epoch': 0.2} 20%|██ | 2469/12313 [1:50:48<7:13:30, 2.64s/it] 20%|██ | 2470/12313 [1:50:51<7:09:19, 2.62s/it] {'loss': 0.5336, 'grad_norm': 5.920271897280425, 'learning_rate': 4.628165600618129e-06, 'epoch': 0.2} 20%|██ | 2470/12313 [1:50:51<7:09:19, 2.62s/it] 20%|██ | 2471/12313 [1:50:54<7:20:49, 2.69s/it] {'loss': 0.4962, 'grad_norm': 4.578491403510653, 'learning_rate': 4.627820450267984e-06, 'epoch': 0.2} 20%|██ | 2471/12313 [1:50:54<7:20:49, 2.69s/it] 20%|██ | 2472/12313 [1:50:57<7:37:50, 2.79s/it] {'loss': 0.554, 'grad_norm': 3.829851338031233, 'learning_rate': 4.627475152683924e-06, 'epoch': 0.2} 20%|██ | 2472/12313 [1:50:57<7:37:50, 2.79s/it] 20%|██ | 2473/12313 [1:50:59<7:40:02, 2.81s/it] {'loss': 0.7465, 'grad_norm': 5.016784542180406, 'learning_rate': 4.627129707889843e-06, 'epoch': 0.2} 20%|██ | 2473/12313 [1:50:59<7:40:02, 2.81s/it] 20%|██ | 2474/12313 [1:51:02<7:20:40, 2.69s/it] {'loss': 0.5234, 'grad_norm': 7.808176500440273, 'learning_rate': 4.626784115909645e-06, 'epoch': 0.2} 20%|██ | 2474/12313 [1:51:02<7:20:40, 2.69s/it] 20%|██ | 2475/12313 [1:51:04<7:16:50, 2.66s/it] {'loss': 0.7113, 'grad_norm': 3.777859086820638, 'learning_rate': 4.626438376767241e-06, 'epoch': 0.2} 20%|██ | 2475/12313 [1:51:04<7:16:50, 2.66s/it] 20%|██ | 2476/12313 [1:51:07<7:14:13, 2.65s/it] {'loss': 0.5625, 'grad_norm': 6.167573513223816, 'learning_rate': 4.626092490486557e-06, 'epoch': 0.2} 20%|██ | 2476/12313 [1:51:07<7:14:13, 2.65s/it] 20%|██ | 2477/12313 [1:51:10<7:12:17, 2.64s/it] {'loss': 0.39, 'grad_norm': 4.609400429566531, 'learning_rate': 
4.6257464570915235e-06, 'epoch': 0.2} 20%|██ | 2477/12313 [1:51:10<7:12:17, 2.64s/it] 20%|██ | 2478/12313 [1:51:12<7:15:52, 2.66s/it] {'loss': 0.4698, 'grad_norm': 4.207144293674372, 'learning_rate': 4.625400276606086e-06, 'epoch': 0.2} 20%|██ | 2478/12313 [1:51:12<7:15:52, 2.66s/it] 20%|██ | 2479/12313 [1:51:15<7:23:00, 2.70s/it] {'loss': 0.6785, 'grad_norm': 4.988407235613174, 'learning_rate': 4.625053949054198e-06, 'epoch': 0.2} 20%|██ | 2479/12313 [1:51:15<7:23:00, 2.70s/it] 20%|██ | 2480/12313 [1:51:18<7:19:25, 2.68s/it] {'loss': 0.7332, 'grad_norm': 3.9367620435534314, 'learning_rate': 4.6247074744598234e-06, 'epoch': 0.2} 20%|██ | 2480/12313 [1:51:18<7:19:25, 2.68s/it] 20%|██ | 2481/12313 [1:51:21<7:27:19, 2.73s/it] {'loss': 0.489, 'grad_norm': 3.9271827060994564, 'learning_rate': 4.6243608528469356e-06, 'epoch': 0.2} 20%|██ | 2481/12313 [1:51:21<7:27:19, 2.73s/it] 20%|██ | 2482/12313 [1:51:23<7:21:02, 2.69s/it] {'loss': 0.5228, 'grad_norm': 5.02867311978453, 'learning_rate': 4.6240140842395205e-06, 'epoch': 0.2} 20%|██ | 2482/12313 [1:51:23<7:21:02, 2.69s/it] 20%|██ | 2483/12313 [1:51:26<7:15:05, 2.66s/it] {'loss': 0.8265, 'grad_norm': 3.9679711814084406, 'learning_rate': 4.623667168661572e-06, 'epoch': 0.2} 20%|██ | 2483/12313 [1:51:26<7:15:05, 2.66s/it] 20%|██ | 2484/12313 [1:51:28<7:07:20, 2.61s/it] {'loss': 0.5777, 'grad_norm': 4.395768395942478, 'learning_rate': 4.623320106137095e-06, 'epoch': 0.2} 20%|██ | 2484/12313 [1:51:28<7:07:20, 2.61s/it] 20%|██ | 2485/12313 [1:51:31<7:03:06, 2.58s/it] {'loss': 0.7345, 'grad_norm': 4.203898879590677, 'learning_rate': 4.6229728966901036e-06, 'epoch': 0.2} 20%|██ | 2485/12313 [1:51:31<7:03:06, 2.58s/it] 20%|██ | 2486/12313 [1:51:34<7:10:11, 2.63s/it] {'loss': 0.6419, 'grad_norm': 3.5440873158516673, 'learning_rate': 4.622625540344623e-06, 'epoch': 0.2} 20%|██ | 2486/12313 [1:51:34<7:10:11, 2.63s/it] 20%|██ | 2487/12313 [1:51:36<7:08:20, 2.62s/it] {'loss': 0.6928, 'grad_norm': 2.545901550496782, 'learning_rate': 
4.62227803712469e-06, 'epoch': 0.2} 20%|██ | 2487/12313 [1:51:36<7:08:20, 2.62s/it] 20%|██ | 2488/12313 [1:51:39<7:11:25, 2.63s/it] {'loss': 0.5345, 'grad_norm': 4.89784077750463, 'learning_rate': 4.621930387054349e-06, 'epoch': 0.2} 20%|██ | 2488/12313 [1:51:39<7:11:25, 2.63s/it] 20%|██ | 2489/12313 [1:51:42<7:17:46, 2.67s/it] {'loss': 0.5168, 'grad_norm': 5.15304789976238, 'learning_rate': 4.621582590157654e-06, 'epoch': 0.2} 20%|██ | 2489/12313 [1:51:42<7:17:46, 2.67s/it] 20%|██ | 2490/12313 [1:51:44<7:18:30, 2.68s/it] {'loss': 0.6469, 'grad_norm': 3.6944859365300062, 'learning_rate': 4.621234646458673e-06, 'epoch': 0.2} 20%|██ | 2490/12313 [1:51:44<7:18:30, 2.68s/it] 20%|██ | 2491/12313 [1:51:47<7:24:19, 2.71s/it] {'loss': 0.6652, 'grad_norm': 4.253884701651634, 'learning_rate': 4.6208865559814805e-06, 'epoch': 0.2} 20%|██ | 2491/12313 [1:51:47<7:24:19, 2.71s/it] 20%|██ | 2492/12313 [1:51:50<7:21:55, 2.70s/it] {'loss': 0.5256, 'grad_norm': 5.40192379256261, 'learning_rate': 4.620538318750163e-06, 'epoch': 0.2} 20%|██ | 2492/12313 [1:51:50<7:21:55, 2.70s/it] 20%|██ | 2493/12313 [1:51:52<7:21:33, 2.70s/it] {'loss': 0.6897, 'grad_norm': 3.4100834853157704, 'learning_rate': 4.620189934788817e-06, 'epoch': 0.2} 20%|██ | 2493/12313 [1:51:52<7:21:33, 2.70s/it] 20%|██ | 2494/12313 [1:51:55<7:12:21, 2.64s/it] {'loss': 0.5001, 'grad_norm': 3.8199295718971102, 'learning_rate': 4.6198414041215484e-06, 'epoch': 0.2} 20%|██ | 2494/12313 [1:51:55<7:12:21, 2.64s/it] 20%|██ | 2495/12313 [1:51:57<7:02:51, 2.58s/it] {'loss': 0.6773, 'grad_norm': 3.145895750195355, 'learning_rate': 4.619492726772473e-06, 'epoch': 0.2} 20%|██ | 2495/12313 [1:51:57<7:02:51, 2.58s/it] 20%|██ | 2496/12313 [1:52:00<7:12:45, 2.64s/it] {'loss': 0.6648, 'grad_norm': 4.373203705255476, 'learning_rate': 4.619143902765719e-06, 'epoch': 0.2} 20%|██ | 2496/12313 [1:52:00<7:12:45, 2.64s/it] 20%|██ | 2497/12313 [1:52:03<7:11:37, 2.64s/it] {'loss': 0.4791, 'grad_norm': 6.177848983200393, 'learning_rate': 
4.618794932125422e-06, 'epoch': 0.2} 20%|██ | 2497/12313 [1:52:03<7:11:37, 2.64s/it] 20%|██ | 2498/12313 [1:52:05<7:12:40, 2.64s/it] {'loss': 0.8067, 'grad_norm': 3.952402300031322, 'learning_rate': 4.61844581487573e-06, 'epoch': 0.2} 20%|██ | 2498/12313 [1:52:05<7:12:40, 2.64s/it] 20%|██ | 2499/12313 [1:52:08<7:10:54, 2.63s/it] {'loss': 0.502, 'grad_norm': 3.670343152102567, 'learning_rate': 4.618096551040798e-06, 'epoch': 0.2} 20%|██ | 2499/12313 [1:52:08<7:10:54, 2.63s/it] 20%|██ | 2500/12313 [1:52:11<7:17:32, 2.68s/it] {'loss': 0.5255, 'grad_norm': 4.359414649768362, 'learning_rate': 4.617747140644796e-06, 'epoch': 0.2} 20%|██ | 2500/12313 [1:52:11<7:17:32, 2.68s/it] 20%|██ | 2501/12313 [1:52:14<7:31:09, 2.76s/it] {'loss': 0.6263, 'grad_norm': 6.653021158021161, 'learning_rate': 4.617397583711899e-06, 'epoch': 0.2} 20%|██ | 2501/12313 [1:52:14<7:31:09, 2.76s/it] 20%|██ | 2502/12313 [1:52:16<7:24:32, 2.72s/it] {'loss': 0.6699, 'grad_norm': 4.748016005088914, 'learning_rate': 4.617047880266295e-06, 'epoch': 0.2} 20%|██ | 2502/12313 [1:52:16<7:24:32, 2.72s/it] 20%|██ | 2503/12313 [1:52:19<7:26:34, 2.73s/it] {'loss': 0.4972, 'grad_norm': 3.35437246059798, 'learning_rate': 4.616698030332183e-06, 'epoch': 0.2} 20%|██ | 2503/12313 [1:52:19<7:26:34, 2.73s/it] 20%|██ | 2504/12313 [1:52:22<7:26:13, 2.73s/it] {'loss': 0.5515, 'grad_norm': 14.870459669223077, 'learning_rate': 4.616348033933769e-06, 'epoch': 0.2} 20%|██ | 2504/12313 [1:52:22<7:26:13, 2.73s/it] 20%|██ | 2505/12313 [1:52:25<7:32:35, 2.77s/it] {'loss': 0.6022, 'grad_norm': 5.088873364458127, 'learning_rate': 4.615997891095272e-06, 'epoch': 0.2} 20%|██ | 2505/12313 [1:52:25<7:32:35, 2.77s/it] 20%|██ | 2506/12313 [1:52:28<7:34:56, 2.78s/it] {'loss': 0.6326, 'grad_norm': 3.7990841422951447, 'learning_rate': 4.6156476018409204e-06, 'epoch': 0.2} 20%|██ | 2506/12313 [1:52:28<7:34:56, 2.78s/it] 20%|██ | 2507/12313 [1:52:30<7:40:08, 2.82s/it] {'loss': 0.6267, 'grad_norm': 4.333396061155222, 'learning_rate': 
4.61529716619495e-06, 'epoch': 0.2} 20%|██ | 2507/12313 [1:52:30<7:40:08, 2.82s/it] 20%|██ | 2508/12313 [1:52:33<7:22:30, 2.71s/it] {'loss': 0.6191, 'grad_norm': 4.685302979204798, 'learning_rate': 4.614946584181612e-06, 'epoch': 0.2} 20%|██ | 2508/12313 [1:52:33<7:22:30, 2.71s/it] 20%|██ | 2509/12313 [1:52:36<7:26:47, 2.73s/it] {'loss': 0.4998, 'grad_norm': 4.124779647332827, 'learning_rate': 4.614595855825164e-06, 'epoch': 0.2} 20%|██ | 2509/12313 [1:52:36<7:26:47, 2.73s/it] 20%|██ | 2510/12313 [1:52:38<7:19:25, 2.69s/it] {'loss': 0.5782, 'grad_norm': 4.501692024239236, 'learning_rate': 4.6142449811498725e-06, 'epoch': 0.2} 20%|██ | 2510/12313 [1:52:38<7:19:25, 2.69s/it] 20%|██ | 2511/12313 [1:52:41<7:14:47, 2.66s/it] {'loss': 0.5297, 'grad_norm': 4.707923879718902, 'learning_rate': 4.613893960180018e-06, 'epoch': 0.2} 20%|██ | 2511/12313 [1:52:41<7:14:47, 2.66s/it] 20%|██ | 2512/12313 [1:52:44<7:34:17, 2.78s/it] {'loss': 0.7369, 'grad_norm': 7.3215672264551035, 'learning_rate': 4.613542792939891e-06, 'epoch': 0.2} 20%|██ | 2512/12313 [1:52:44<7:34:17, 2.78s/it] 20%|██ | 2513/12313 [1:52:47<7:59:12, 2.93s/it] {'loss': 0.8081, 'grad_norm': 5.2423338527201615, 'learning_rate': 4.613191479453787e-06, 'epoch': 0.2} 20%|██ | 2513/12313 [1:52:47<7:59:12, 2.93s/it] 20%|██ | 2514/12313 [1:52:50<7:56:46, 2.92s/it] {'loss': 0.67, 'grad_norm': 5.115816586092132, 'learning_rate': 4.612840019746016e-06, 'epoch': 0.2} 20%|██ | 2514/12313 [1:52:50<7:56:46, 2.92s/it] 20%|██ | 2515/12313 [1:52:53<7:37:18, 2.80s/it] {'loss': 0.8183, 'grad_norm': 3.4887642989488348, 'learning_rate': 4.612488413840899e-06, 'epoch': 0.2} 20%|██ | 2515/12313 [1:52:53<7:37:18, 2.80s/it] 20%|██ | 2516/12313 [1:52:55<7:20:32, 2.70s/it] {'loss': 0.4956, 'grad_norm': 5.202669123420781, 'learning_rate': 4.6121366617627635e-06, 'epoch': 0.2} 20%|██ | 2516/12313 [1:52:55<7:20:32, 2.70s/it] 20%|██ | 2517/12313 [1:52:58<7:13:37, 2.66s/it] {'loss': 0.5427, 'grad_norm': 4.783952508433326, 'learning_rate': 
4.6117847635359494e-06, 'epoch': 0.2} 20%|██ | 2517/12313 [1:52:58<7:13:37, 2.66s/it] 20%|██ | 2518/12313 [1:53:00<7:02:47, 2.59s/it] {'loss': 0.7236, 'grad_norm': 3.8836746569798, 'learning_rate': 4.611432719184806e-06, 'epoch': 0.2} 20%|██ | 2518/12313 [1:53:00<7:02:47, 2.59s/it] 20%|██ | 2519/12313 [1:53:03<7:07:24, 2.62s/it] {'loss': 0.5379, 'grad_norm': 4.277705470777233, 'learning_rate': 4.611080528733693e-06, 'epoch': 0.2} 20%|██ | 2519/12313 [1:53:03<7:07:24, 2.62s/it] 20%|██ | 2520/12313 [1:53:06<7:26:04, 2.73s/it] {'loss': 0.6815, 'grad_norm': 4.597920469594989, 'learning_rate': 4.6107281922069805e-06, 'epoch': 0.2} 20%|██ | 2520/12313 [1:53:06<7:26:04, 2.73s/it] 20%|██ | 2521/12313 [1:53:09<7:28:53, 2.75s/it] {'loss': 0.5213, 'grad_norm': 5.510027910722754, 'learning_rate': 4.610375709629047e-06, 'epoch': 0.2} 20%|██ | 2521/12313 [1:53:09<7:28:53, 2.75s/it] 20%|██ | 2522/12313 [1:53:11<7:31:19, 2.77s/it] {'loss': 0.6494, 'grad_norm': 26.571846192694977, 'learning_rate': 4.610023081024284e-06, 'epoch': 0.2} 20%|██ | 2522/12313 [1:53:11<7:31:19, 2.77s/it] 20%|██ | 2523/12313 [1:53:14<7:26:04, 2.73s/it] {'loss': 0.5739, 'grad_norm': 7.215116215564211, 'learning_rate': 4.6096703064170915e-06, 'epoch': 0.2} 20%|██ | 2523/12313 [1:53:14<7:26:04, 2.73s/it] 20%|██ | 2524/12313 [1:53:17<7:20:41, 2.70s/it] {'loss': 0.7281, 'grad_norm': 5.568682872058116, 'learning_rate': 4.609317385831879e-06, 'epoch': 0.2} 20%|██ | 2524/12313 [1:53:17<7:20:41, 2.70s/it] 21%|██ | 2525/12313 [1:53:19<7:15:10, 2.67s/it] {'loss': 0.6219, 'grad_norm': 7.457927768406811, 'learning_rate': 4.608964319293066e-06, 'epoch': 0.21} 21%|██ | 2525/12313 [1:53:19<7:15:10, 2.67s/it] 21%|██ | 2526/12313 [1:53:22<7:04:26, 2.60s/it] {'loss': 0.5108, 'grad_norm': 4.3162604433484155, 'learning_rate': 4.6086111068250834e-06, 'epoch': 0.21} 21%|██ | 2526/12313 [1:53:22<7:04:26, 2.60s/it] 21%|██ | 2527/12313 [1:53:24<6:59:00, 2.57s/it] {'loss': 0.4834, 'grad_norm': 5.339053755707136, 'learning_rate': 
4.608257748452372e-06, 'epoch': 0.21} 21%|██ | 2527/12313 [1:53:24<6:59:00, 2.57s/it] 21%|██ | 2528/12313 [1:53:27<7:02:10, 2.59s/it] {'loss': 0.6459, 'grad_norm': 3.319564859771241, 'learning_rate': 4.607904244199384e-06, 'epoch': 0.21} 21%|██ | 2528/12313 [1:53:27<7:02:10, 2.59s/it] 21%|██ | 2529/12313 [1:53:30<7:04:59, 2.61s/it] {'loss': 0.8157, 'grad_norm': 3.3949377280505657, 'learning_rate': 4.6075505940905765e-06, 'epoch': 0.21} 21%|██ | 2529/12313 [1:53:30<7:04:59, 2.61s/it] 21%|██ | 2530/12313 [1:53:32<7:01:15, 2.58s/it] {'loss': 0.6862, 'grad_norm': 4.846601443639954, 'learning_rate': 4.607196798150423e-06, 'epoch': 0.21} 21%|██ | 2530/12313 [1:53:32<7:01:15, 2.58s/it] 21%|██ | 2531/12313 [1:53:35<6:59:02, 2.57s/it] {'loss': 0.4921, 'grad_norm': 3.634653145365433, 'learning_rate': 4.606842856403402e-06, 'epoch': 0.21} 21%|██ | 2531/12313 [1:53:35<6:59:02, 2.57s/it] 21%|██ | 2532/12313 [1:53:37<6:48:45, 2.51s/it] {'loss': 0.7135, 'grad_norm': 3.453187944742966, 'learning_rate': 4.6064887688740065e-06, 'epoch': 0.21} 21%|██ | 2532/12313 [1:53:37<6:48:45, 2.51s/it] 21%|██ | 2533/12313 [1:53:39<6:48:25, 2.51s/it] {'loss': 0.7577, 'grad_norm': 6.8028434368171, 'learning_rate': 4.606134535586737e-06, 'epoch': 0.21} 21%|██ | 2533/12313 [1:53:39<6:48:25, 2.51s/it] 21%|██ | 2534/12313 [1:53:42<6:53:57, 2.54s/it] {'loss': 0.6454, 'grad_norm': 5.81957789301719, 'learning_rate': 4.605780156566103e-06, 'epoch': 0.21} 21%|██ | 2534/12313 [1:53:42<6:53:57, 2.54s/it] 21%|██ | 2535/12313 [1:53:45<6:54:25, 2.54s/it] {'loss': 0.5258, 'grad_norm': 3.726486607180334, 'learning_rate': 4.6054256318366275e-06, 'epoch': 0.21} 21%|██ | 2535/12313 [1:53:45<6:54:25, 2.54s/it] 21%|██ | 2536/12313 [1:53:47<6:51:45, 2.53s/it] {'loss': 0.6497, 'grad_norm': 6.553307065329235, 'learning_rate': 4.6050709614228416e-06, 'epoch': 0.21} 21%|██ | 2536/12313 [1:53:47<6:51:45, 2.53s/it] 21%|██ | 2537/12313 [1:53:50<6:55:46, 2.55s/it] {'loss': 0.5451, 'grad_norm': 6.182500794690838, 
'learning_rate': 4.604716145349285e-06, 'epoch': 0.21} 21%|██ | 2537/12313 [1:53:50<6:55:46, 2.55s/it] 21%|██ | 2538/12313 [1:53:52<7:00:10, 2.58s/it] {'loss': 0.5455, 'grad_norm': 3.621812427485916, 'learning_rate': 4.604361183640511e-06, 'epoch': 0.21} 21%|██ | 2538/12313 [1:53:52<7:00:10, 2.58s/it] 21%|██ | 2539/12313 [1:53:55<7:04:12, 2.60s/it] {'loss': 0.5335, 'grad_norm': 6.106652776271362, 'learning_rate': 4.60400607632108e-06, 'epoch': 0.21} 21%|██ | 2539/12313 [1:53:55<7:04:12, 2.60s/it] 21%|██ | 2540/12313 [1:53:58<7:02:39, 2.59s/it] {'loss': 0.5764, 'grad_norm': 8.175917425138248, 'learning_rate': 4.603650823415563e-06, 'epoch': 0.21} 21%|██ | 2540/12313 [1:53:58<7:02:39, 2.59s/it] 21%|██ | 2541/12313 [1:54:00<7:11:06, 2.65s/it] {'loss': 0.514, 'grad_norm': 4.015447399797649, 'learning_rate': 4.603295424948544e-06, 'epoch': 0.21} 21%|██ | 2541/12313 [1:54:00<7:11:06, 2.65s/it] 21%|██ | 2542/12313 [1:54:03<7:13:53, 2.66s/it] {'loss': 0.5623, 'grad_norm': 4.883808409792754, 'learning_rate': 4.602939880944612e-06, 'epoch': 0.21} 21%|██ | 2542/12313 [1:54:03<7:13:53, 2.66s/it] 21%|██ | 2543/12313 [1:54:06<7:05:05, 2.61s/it] {'loss': 0.5176, 'grad_norm': 3.0577448372785745, 'learning_rate': 4.6025841914283705e-06, 'epoch': 0.21} 21%|██ | 2543/12313 [1:54:06<7:05:05, 2.61s/it] 21%|██ | 2544/12313 [1:54:08<7:19:34, 2.70s/it] {'loss': 0.5816, 'grad_norm': 3.73178717833002, 'learning_rate': 4.602228356424431e-06, 'epoch': 0.21} 21%|██ | 2544/12313 [1:54:08<7:19:34, 2.70s/it] 21%|██ | 2545/12313 [1:54:11<7:20:45, 2.71s/it] {'loss': 0.7313, 'grad_norm': 3.105000510085083, 'learning_rate': 4.601872375957414e-06, 'epoch': 0.21} 21%|██ | 2545/12313 [1:54:11<7:20:45, 2.71s/it] 21%|██ | 2546/12313 [1:54:14<7:20:16, 2.70s/it] {'loss': 0.4578, 'grad_norm': 5.053370814100823, 'learning_rate': 4.601516250051954e-06, 'epoch': 0.21} 21%|██ | 2546/12313 [1:54:14<7:20:16, 2.70s/it] 21%|██ | 2547/12313 [1:54:17<7:23:13, 2.72s/it] {'loss': 0.5377, 'grad_norm': 3.758806458666759, 
'learning_rate': 4.601159978732691e-06, 'epoch': 0.21} 21%|██ | 2547/12313 [1:54:17<7:23:13, 2.72s/it] 21%|██ | 2548/12313 [1:54:19<7:20:58, 2.71s/it] {'loss': 0.6031, 'grad_norm': 3.486408365033022, 'learning_rate': 4.600803562024277e-06, 'epoch': 0.21} 21%|██ | 2548/12313 [1:54:19<7:20:58, 2.71s/it] 21%|██ | 2549/12313 [1:54:22<7:17:35, 2.69s/it] {'loss': 0.6561, 'grad_norm': 4.0286968600378925, 'learning_rate': 4.6004469999513755e-06, 'epoch': 0.21} 21%|██ | 2549/12313 [1:54:22<7:17:35, 2.69s/it] 21%|██ | 2550/12313 [1:54:25<7:15:12, 2.67s/it] {'loss': 0.5098, 'grad_norm': 4.653795014118944, 'learning_rate': 4.600090292538658e-06, 'epoch': 0.21} 21%|██ | 2550/12313 [1:54:25<7:15:12, 2.67s/it] 21%|██ | 2551/12313 [1:54:28<7:30:29, 2.77s/it] {'loss': 0.596, 'grad_norm': 4.008738162793329, 'learning_rate': 4.599733439810807e-06, 'epoch': 0.21} 21%|██ | 2551/12313 [1:54:28<7:30:29, 2.77s/it] 21%|██ | 2552/12313 [1:54:30<7:27:07, 2.75s/it] {'loss': 0.4166, 'grad_norm': 6.379187521913423, 'learning_rate': 4.5993764417925145e-06, 'epoch': 0.21} 21%|██ | 2552/12313 [1:54:30<7:27:07, 2.75s/it] 21%|██ | 2553/12313 [1:54:33<7:15:19, 2.68s/it] {'loss': 0.6534, 'grad_norm': 5.162913747178339, 'learning_rate': 4.599019298508482e-06, 'epoch': 0.21} 21%|██ | 2553/12313 [1:54:33<7:15:19, 2.68s/it] 21%|██ | 2554/12313 [1:54:35<7:13:04, 2.66s/it] {'loss': 0.459, 'grad_norm': 4.626210052826402, 'learning_rate': 4.598662009983424e-06, 'epoch': 0.21} 21%|██ | 2554/12313 [1:54:35<7:13:04, 2.66s/it] 21%|██ | 2555/12313 [1:54:38<7:14:45, 2.67s/it] {'loss': 0.5086, 'grad_norm': 4.343058752050401, 'learning_rate': 4.598304576242063e-06, 'epoch': 0.21} 21%|██ | 2555/12313 [1:54:38<7:14:45, 2.67s/it] 21%|██ | 2556/12313 [1:54:41<7:16:26, 2.68s/it] {'loss': 0.5035, 'grad_norm': 7.300206881482361, 'learning_rate': 4.597946997309129e-06, 'epoch': 0.21} 21%|██ | 2556/12313 [1:54:41<7:16:26, 2.68s/it] 21%|██ | 2557/12313 [1:54:43<7:10:13, 2.65s/it] {'loss': 0.5187, 'grad_norm': 
3.0899015840363075, 'learning_rate': 4.597589273209366e-06, 'epoch': 0.21} 21%|██ | 2557/12313 [1:54:43<7:10:13, 2.65s/it] 21%|██ | 2558/12313 [1:54:46<7:03:46, 2.61s/it] {'loss': 0.5379, 'grad_norm': 6.052813956883026, 'learning_rate': 4.597231403967527e-06, 'epoch': 0.21} 21%|██ | 2558/12313 [1:54:46<7:03:46, 2.61s/it] 21%|██ | 2559/12313 [1:54:48<6:54:25, 2.55s/it] {'loss': 0.8438, 'grad_norm': 5.517294439794208, 'learning_rate': 4.5968733896083745e-06, 'epoch': 0.21} 21%|██ | 2559/12313 [1:54:48<6:54:25, 2.55s/it] 21%|██ | 2560/12313 [1:54:51<6:59:30, 2.58s/it] {'loss': 0.6151, 'grad_norm': 4.128375503613039, 'learning_rate': 4.59651523015668e-06, 'epoch': 0.21} 21%|██ | 2560/12313 [1:54:51<6:59:30, 2.58s/it] 21%|██ | 2561/12313 [1:54:54<7:02:12, 2.60s/it] {'loss': 0.5982, 'grad_norm': 6.118250286731413, 'learning_rate': 4.5961569256372285e-06, 'epoch': 0.21} 21%|██ | 2561/12313 [1:54:54<7:02:12, 2.60s/it] 21%|██ | 2562/12313 [1:54:56<7:10:28, 2.65s/it] {'loss': 0.4833, 'grad_norm': 6.553871079659764, 'learning_rate': 4.595798476074811e-06, 'epoch': 0.21} 21%|██ | 2562/12313 [1:54:56<7:10:28, 2.65s/it] 21%|██ | 2563/12313 [1:54:59<7:01:32, 2.59s/it] {'loss': 0.506, 'grad_norm': 5.380395025356601, 'learning_rate': 4.59543988149423e-06, 'epoch': 0.21} 21%|██ | 2563/12313 [1:54:59<7:01:32, 2.59s/it] 21%|██ | 2564/12313 [1:55:02<7:05:51, 2.62s/it] {'loss': 0.4393, 'grad_norm': 5.37381105799694, 'learning_rate': 4.595081141920301e-06, 'epoch': 0.21} 21%|██ | 2564/12313 [1:55:02<7:05:51, 2.62s/it] 21%|██ | 2565/12313 [1:55:04<7:00:14, 2.59s/it] {'loss': 0.6313, 'grad_norm': 4.617124660591345, 'learning_rate': 4.594722257377844e-06, 'epoch': 0.21} 21%|██ | 2565/12313 [1:55:04<7:00:14, 2.59s/it] 21%|██ | 2566/12313 [1:55:07<7:00:07, 2.59s/it] {'loss': 0.6085, 'grad_norm': 3.1785490526390974, 'learning_rate': 4.594363227891693e-06, 'epoch': 0.21} 21%|██ | 2566/12313 [1:55:07<7:00:07, 2.59s/it] 21%|██ | 2567/12313 [1:55:09<6:55:26, 2.56s/it] {'loss': 0.8759, 'grad_norm': 
4.209871969565939, 'learning_rate': 4.5940040534866905e-06, 'epoch': 0.21} 21%|██ | 2567/12313 [1:55:09<6:55:26, 2.56s/it] 21%|██ | 2568/12313 [1:55:12<6:56:48, 2.57s/it] {'loss': 0.4414, 'grad_norm': 5.084178440847334, 'learning_rate': 4.59364473418769e-06, 'epoch': 0.21} 21%|██ | 2568/12313 [1:55:12<6:56:48, 2.57s/it] 21%|██ | 2569/12313 [1:55:14<6:51:46, 2.54s/it] {'loss': 0.4927, 'grad_norm': 4.799592885282473, 'learning_rate': 4.593285270019555e-06, 'epoch': 0.21} 21%|██ | 2569/12313 [1:55:14<6:51:46, 2.54s/it] 21%|██ | 2570/12313 [1:55:17<6:48:12, 2.51s/it] {'loss': 0.7891, 'grad_norm': 3.748127121013163, 'learning_rate': 4.592925661007157e-06, 'epoch': 0.21} 21%|██ | 2570/12313 [1:55:17<6:48:12, 2.51s/it] 21%|██ | 2571/12313 [1:55:19<6:45:15, 2.50s/it] {'loss': 0.4914, 'grad_norm': 7.646211226477787, 'learning_rate': 4.592565907175381e-06, 'epoch': 0.21} 21%|██ | 2571/12313 [1:55:19<6:45:15, 2.50s/it] 21%|██ | 2572/12313 [1:55:22<6:42:01, 2.48s/it] {'loss': 0.5923, 'grad_norm': 4.10621409423765, 'learning_rate': 4.592206008549118e-06, 'epoch': 0.21} 21%|██ | 2572/12313 [1:55:22<6:42:01, 2.48s/it] 21%|██ | 2573/12313 [1:55:24<6:45:00, 2.49s/it] {'loss': 0.6719, 'grad_norm': 5.917514515415448, 'learning_rate': 4.591845965153272e-06, 'epoch': 0.21} 21%|██ | 2573/12313 [1:55:24<6:45:00, 2.49s/it] 21%|██ | 2574/12313 [1:55:27<6:46:24, 2.50s/it] {'loss': 0.636, 'grad_norm': 4.0238274192183985, 'learning_rate': 4.591485777012757e-06, 'epoch': 0.21} 21%|██ | 2574/12313 [1:55:27<6:46:24, 2.50s/it] 21%|██ | 2575/12313 [1:55:29<6:54:58, 2.56s/it] {'loss': 0.7144, 'grad_norm': 5.0280430135245116, 'learning_rate': 4.591125444152495e-06, 'epoch': 0.21} 21%|██ | 2575/12313 [1:55:29<6:54:58, 2.56s/it] 21%|██ | 2576/12313 [1:55:32<6:49:19, 2.52s/it] {'loss': 0.5855, 'grad_norm': 5.449721642811628, 'learning_rate': 4.590764966597419e-06, 'epoch': 0.21} 21%|██ | 2576/12313 [1:55:32<6:49:19, 2.52s/it] 21%|██ | 2577/12313 [1:55:34<6:59:24, 2.58s/it] {'loss': 0.5043, 'grad_norm': 
5.452281756916645, 'learning_rate': 4.590404344372472e-06, 'epoch': 0.21} 21%|██ | 2577/12313 [1:55:34<6:59:24, 2.58s/it] 21%|██ | 2578/12313 [1:55:37<7:13:49, 2.67s/it] {'loss': 0.5569, 'grad_norm': 7.9224346525989375, 'learning_rate': 4.590043577502609e-06, 'epoch': 0.21} 21%|██ | 2578/12313 [1:55:37<7:13:49, 2.67s/it] 21%|██ | 2579/12313 [1:55:40<7:12:27, 2.67s/it] {'loss': 0.5516, 'grad_norm': 5.570197940829358, 'learning_rate': 4.589682666012791e-06, 'epoch': 0.21} 21%|██ | 2579/12313 [1:55:40<7:12:27, 2.67s/it] 21%|██ | 2580/12313 [1:55:42<7:05:13, 2.62s/it] {'loss': 0.7413, 'grad_norm': 4.7228930255767425, 'learning_rate': 4.5893216099279925e-06, 'epoch': 0.21} 21%|██ | 2580/12313 [1:55:42<7:05:13, 2.62s/it] 21%|██ | 2581/12313 [1:55:45<7:04:57, 2.62s/it] {'loss': 0.6607, 'grad_norm': 4.327696809821123, 'learning_rate': 4.588960409273196e-06, 'epoch': 0.21} 21%|██ | 2581/12313 [1:55:45<7:04:57, 2.62s/it] 21%|██ | 2582/12313 [1:55:48<7:09:43, 2.65s/it] {'loss': 0.6295, 'grad_norm': 3.271070383423843, 'learning_rate': 4.588599064073395e-06, 'epoch': 0.21} 21%|██ | 2582/12313 [1:55:48<7:09:43, 2.65s/it] 21%|██ | 2583/12313 [1:55:50<7:09:56, 2.65s/it] {'loss': 0.4415, 'grad_norm': 4.928203939520188, 'learning_rate': 4.588237574353592e-06, 'epoch': 0.21} 21%|██ | 2583/12313 [1:55:50<7:09:56, 2.65s/it] 21%|██ | 2584/12313 [1:55:53<7:12:29, 2.67s/it] {'loss': 0.4493, 'grad_norm': 4.524773784130613, 'learning_rate': 4.587875940138801e-06, 'epoch': 0.21} 21%|██ | 2584/12313 [1:55:53<7:12:29, 2.67s/it] 21%|██ | 2585/12313 [1:55:56<7:09:20, 2.65s/it] {'loss': 0.6702, 'grad_norm': 5.4612869658664005, 'learning_rate': 4.587514161454045e-06, 'epoch': 0.21} 21%|██ | 2585/12313 [1:55:56<7:09:20, 2.65s/it] 21%|██ | 2586/12313 [1:55:58<7:09:21, 2.65s/it] {'loss': 0.5599, 'grad_norm': 5.877089399039422, 'learning_rate': 4.587152238324357e-06, 'epoch': 0.21} 21%|██ | 2586/12313 [1:55:58<7:09:21, 2.65s/it] 21%|██ | 2587/12313 [1:56:01<7:12:21, 2.67s/it] {'loss': 0.6475, 
'grad_norm': 5.615645343923605, 'learning_rate': 4.58679017077478e-06, 'epoch': 0.21} 21%|██ | 2587/12313 [1:56:01<7:12:21, 2.67s/it] 21%|██ | 2588/12313 [1:56:04<7:36:32, 2.82s/it] {'loss': 0.7045, 'grad_norm': 4.466358103769594, 'learning_rate': 4.586427958830367e-06, 'epoch': 0.21} 21%|██ | 2588/12313 [1:56:04<7:36:32, 2.82s/it] 21%|██ | 2589/12313 [1:56:07<7:40:33, 2.84s/it] {'loss': 0.5548, 'grad_norm': 3.905668647886352, 'learning_rate': 4.586065602516182e-06, 'epoch': 0.21} 21%|██ | 2589/12313 [1:56:07<7:40:33, 2.84s/it] 21%|██ | 2590/12313 [1:56:10<7:26:38, 2.76s/it] {'loss': 0.4948, 'grad_norm': 5.834395303055314, 'learning_rate': 4.585703101857298e-06, 'epoch': 0.21} 21%|██ | 2590/12313 [1:56:10<7:26:38, 2.76s/it] 21%|██ | 2591/12313 [1:56:12<7:13:13, 2.67s/it] {'loss': 0.4951, 'grad_norm': 5.414482998853449, 'learning_rate': 4.585340456878798e-06, 'epoch': 0.21} 21%|██ | 2591/12313 [1:56:12<7:13:13, 2.67s/it] 21%|██ | 2592/12313 [1:56:15<7:03:24, 2.61s/it] {'loss': 0.5363, 'grad_norm': 3.1815206113624535, 'learning_rate': 4.584977667605774e-06, 'epoch': 0.21} 21%|██ | 2592/12313 [1:56:15<7:03:24, 2.61s/it] 21%|██ | 2593/12313 [1:56:18<7:22:36, 2.73s/it] {'loss': 0.5075, 'grad_norm': 3.1419631204303915, 'learning_rate': 4.5846147340633305e-06, 'epoch': 0.21} 21%|██ | 2593/12313 [1:56:18<7:22:36, 2.73s/it] 21%|██ | 2594/12313 [1:56:20<7:22:40, 2.73s/it] {'loss': 0.543, 'grad_norm': 3.5141257327029205, 'learning_rate': 4.58425165627658e-06, 'epoch': 0.21} 21%|██ | 2594/12313 [1:56:20<7:22:40, 2.73s/it] 21%|██ | 2595/12313 [1:56:23<7:32:31, 2.79s/it] {'loss': 0.5039, 'grad_norm': 4.156860647624618, 'learning_rate': 4.583888434270645e-06, 'epoch': 0.21} 21%|██ | 2595/12313 [1:56:23<7:32:31, 2.79s/it] 21%|██ | 2596/12313 [1:56:26<7:27:42, 2.76s/it] {'loss': 0.5138, 'grad_norm': 4.750813630585856, 'learning_rate': 4.58352506807066e-06, 'epoch': 0.21} 21%|██ | 2596/12313 [1:56:26<7:27:42, 2.76s/it] 21%|██ | 2597/12313 [1:56:28<7:08:14, 2.64s/it] {'loss': 0.5763, 
'grad_norm': 9.312381581764441, 'learning_rate': 4.583161557701767e-06, 'epoch': 0.21} 21%|██ | 2597/12313 [1:56:28<7:08:14, 2.64s/it] 21%|██ | 2598/12313 [1:56:31<7:09:02, 2.65s/it] {'loss': 0.5809, 'grad_norm': 8.915059751168238, 'learning_rate': 4.582797903189119e-06, 'epoch': 0.21} 21%|██ | 2598/12313 [1:56:31<7:09:02, 2.65s/it] 21%|██ | 2599/12313 [1:56:34<7:12:41, 2.67s/it] {'loss': 0.482, 'grad_norm': 3.648070795784496, 'learning_rate': 4.582434104557879e-06, 'epoch': 0.21} 21%|██ | 2599/12313 [1:56:34<7:12:41, 2.67s/it] 21%|██ | 2600/12313 [1:56:37<7:23:52, 2.74s/it] {'loss': 0.5429, 'grad_norm': 4.2519549882216845, 'learning_rate': 4.582070161833221e-06, 'epoch': 0.21} 21%|██ | 2600/12313 [1:56:37<7:23:52, 2.74s/it] 21%|██ | 2601/12313 [1:56:39<7:17:49, 2.70s/it] {'loss': 0.5398, 'grad_norm': 4.546548576906945, 'learning_rate': 4.581706075040326e-06, 'epoch': 0.21} 21%|██ | 2601/12313 [1:56:39<7:17:49, 2.70s/it] 21%|██ | 2602/12313 [1:56:42<7:07:14, 2.64s/it] {'loss': 0.5318, 'grad_norm': 5.001987184534323, 'learning_rate': 4.5813418442043885e-06, 'epoch': 0.21} 21%|██ | 2602/12313 [1:56:42<7:07:14, 2.64s/it] 21%|██ | 2603/12313 [1:56:44<7:06:16, 2.63s/it] {'loss': 0.5363, 'grad_norm': 6.362483259653969, 'learning_rate': 4.58097746935061e-06, 'epoch': 0.21} 21%|██ | 2603/12313 [1:56:44<7:06:16, 2.63s/it] 21%|██ | 2604/12313 [1:56:47<7:09:14, 2.65s/it] {'loss': 0.5545, 'grad_norm': 4.0989340540279064, 'learning_rate': 4.580612950504204e-06, 'epoch': 0.21} 21%|██ | 2604/12313 [1:56:47<7:09:14, 2.65s/it] 21%|██ | 2605/12313 [1:56:50<7:07:08, 2.64s/it] {'loss': 0.4948, 'grad_norm': 5.132444357319185, 'learning_rate': 4.580248287690394e-06, 'epoch': 0.21} 21%|██ | 2605/12313 [1:56:50<7:07:08, 2.64s/it] 21%|██ | 2606/12313 [1:56:52<7:09:32, 2.66s/it] {'loss': 0.5281, 'grad_norm': 7.374853871968238, 'learning_rate': 4.579883480934413e-06, 'epoch': 0.21} 21%|██ | 2606/12313 [1:56:52<7:09:32, 2.66s/it] 21%|██ | 2607/12313 [1:56:55<7:11:04, 2.66s/it] {'loss': 
0.6517, 'grad_norm': 8.014579215103465, 'learning_rate': 4.579518530261501e-06, 'epoch': 0.21} 21%|██ | 2607/12313 [1:56:55<7:11:04, 2.66s/it] 21%|██ | 2608/12313 [1:56:58<7:06:19, 2.64s/it] {'loss': 0.5649, 'grad_norm': 4.7524183309310315, 'learning_rate': 4.579153435696913e-06, 'epoch': 0.21} 21%|██ | 2608/12313 [1:56:58<7:06:19, 2.64s/it] 21%|██ | 2609/12313 [1:57:00<7:01:11, 2.60s/it] {'loss': 0.4996, 'grad_norm': 3.2946505830547697, 'learning_rate': 4.578788197265911e-06, 'epoch': 0.21} 21%|██ | 2609/12313 [1:57:00<7:01:11, 2.60s/it] 21%|██ | 2610/12313 [1:57:03<7:02:21, 2.61s/it] {'loss': 0.5894, 'grad_norm': 4.2400296703901645, 'learning_rate': 4.578422814993768e-06, 'epoch': 0.21} 21%|██ | 2610/12313 [1:57:03<7:02:21, 2.61s/it] 21%|██ | 2611/12313 [1:57:05<6:58:30, 2.59s/it] {'loss': 0.6345, 'grad_norm': 5.74324372003261, 'learning_rate': 4.578057288905766e-06, 'epoch': 0.21} 21%|██ | 2611/12313 [1:57:05<6:58:30, 2.59s/it] 21%|██ | 2612/12313 [1:57:08<7:12:37, 2.68s/it] {'loss': 0.5196, 'grad_norm': 3.66763262733531, 'learning_rate': 4.577691619027197e-06, 'epoch': 0.21} 21%|██ | 2612/12313 [1:57:08<7:12:37, 2.68s/it] 21%|██ | 2613/12313 [1:57:11<7:09:26, 2.66s/it] {'loss': 0.4639, 'grad_norm': 7.157752603621458, 'learning_rate': 4.577325805383364e-06, 'epoch': 0.21} 21%|██ | 2613/12313 [1:57:11<7:09:26, 2.66s/it] 21%|██ | 2614/12313 [1:57:14<7:10:12, 2.66s/it] {'loss': 0.5971, 'grad_norm': 6.775920682174733, 'learning_rate': 4.57695984799958e-06, 'epoch': 0.21} 21%|██ | 2614/12313 [1:57:14<7:10:12, 2.66s/it] 21%|██ | 2615/12313 [1:57:16<7:14:59, 2.69s/it] {'loss': 0.6259, 'grad_norm': 5.5027350147320275, 'learning_rate': 4.576593746901166e-06, 'epoch': 0.21} 21%|██ | 2615/12313 [1:57:16<7:14:59, 2.69s/it] 21%|██ | 2616/12313 [1:57:19<7:17:17, 2.71s/it] {'loss': 0.5943, 'grad_norm': 4.893766528880051, 'learning_rate': 4.576227502113455e-06, 'epoch': 0.21} 21%|██ | 2616/12313 [1:57:19<7:17:17, 2.71s/it] 21%|██▏ | 2617/12313 [1:57:22<7:14:05, 2.69s/it] 
{'loss': 0.7361, 'grad_norm': 4.639805956669454, 'learning_rate': 4.575861113661791e-06, 'epoch': 0.21} 21%|██▏ | 2617/12313 [1:57:22<7:14:05, 2.69s/it] 21%|██▏ | 2618/12313 [1:57:24<7:17:49, 2.71s/it] {'loss': 0.3826, 'grad_norm': 4.20860766014754, 'learning_rate': 4.575494581571521e-06, 'epoch': 0.21} 21%|██▏ | 2618/12313 [1:57:24<7:17:49, 2.71s/it] 21%|██▏ | 2619/12313 [1:57:27<7:17:34, 2.71s/it] {'loss': 0.5891, 'grad_norm': 5.655888239003785, 'learning_rate': 4.575127905868013e-06, 'epoch': 0.21} 21%|██▏ | 2619/12313 [1:57:27<7:17:34, 2.71s/it] 21%|██▏ | 2620/12313 [1:57:30<7:12:12, 2.68s/it] {'loss': 0.4914, 'grad_norm': 5.032532015247516, 'learning_rate': 4.574761086576635e-06, 'epoch': 0.21} 21%|██▏ | 2620/12313 [1:57:30<7:12:12, 2.68s/it] 21%|██▏ | 2621/12313 [1:57:33<7:15:11, 2.69s/it] {'loss': 0.4658, 'grad_norm': 11.837098472717457, 'learning_rate': 4.57439412372277e-06, 'epoch': 0.21} 21%|██▏ | 2621/12313 [1:57:33<7:15:11, 2.69s/it] 21%|██▏ | 2622/12313 [1:57:35<7:08:56, 2.66s/it] {'loss': 0.5611, 'grad_norm': 14.561447419694415, 'learning_rate': 4.574027017331812e-06, 'epoch': 0.21} 21%|██▏ | 2622/12313 [1:57:35<7:08:56, 2.66s/it] 21%|██▏ | 2623/12313 [1:57:38<7:13:17, 2.68s/it] {'loss': 0.5933, 'grad_norm': 3.633636275262492, 'learning_rate': 4.57365976742916e-06, 'epoch': 0.21} 21%|██▏ | 2623/12313 [1:57:38<7:13:17, 2.68s/it] 21%|██▏ | 2624/12313 [1:57:40<7:10:32, 2.67s/it] {'loss': 0.5984, 'grad_norm': 6.065413078817048, 'learning_rate': 4.573292374040227e-06, 'epoch': 0.21} 21%|██▏ | 2624/12313 [1:57:40<7:10:32, 2.67s/it] 21%|██▏ | 2625/12313 [1:57:43<7:04:33, 2.63s/it] {'loss': 0.4037, 'grad_norm': 12.083369823179202, 'learning_rate': 4.572924837190434e-06, 'epoch': 0.21} 21%|██▏ | 2625/12313 [1:57:43<7:04:33, 2.63s/it] 21%|██▏ | 2626/12313 [1:57:46<7:07:25, 2.65s/it] {'loss': 0.6599, 'grad_norm': 4.512701432450698, 'learning_rate': 4.572557156905213e-06, 'epoch': 0.21} 21%|██▏ | 2626/12313 [1:57:46<7:07:25, 2.65s/it] 21%|██▏ | 2627/12313 
[1:57:48<7:08:39, 2.66s/it] {'loss': 0.6408, 'grad_norm': 4.466963402457013, 'learning_rate': 4.572189333210007e-06, 'epoch': 0.21} 21%|██▏ | 2627/12313 [1:57:48<7:08:39, 2.66s/it] 21%|██▏ | 2628/12313 [1:57:51<7:03:35, 2.62s/it] {'loss': 0.4764, 'grad_norm': 4.5137793929351115, 'learning_rate': 4.571821366130265e-06, 'epoch': 0.21} 21%|██▏ | 2628/12313 [1:57:51<7:03:35, 2.62s/it] 21%|██▏ | 2629/12313 [1:57:54<7:03:52, 2.63s/it] {'loss': 0.4507, 'grad_norm': 3.8656056596236184, 'learning_rate': 4.571453255691449e-06, 'epoch': 0.21} 21%|██▏ | 2629/12313 [1:57:54<7:03:52, 2.63s/it] 21%|██▏ | 2630/12313 [1:57:56<7:05:44, 2.64s/it] {'loss': 0.6934, 'grad_norm': 4.382341801906267, 'learning_rate': 4.571085001919031e-06, 'epoch': 0.21} 21%|██▏ | 2630/12313 [1:57:56<7:05:44, 2.64s/it] 21%|██▏ | 2631/12313 [1:57:59<7:09:50, 2.66s/it] {'loss': 0.6249, 'grad_norm': 8.589875710292333, 'learning_rate': 4.570716604838492e-06, 'epoch': 0.21} 21%|██▏ | 2631/12313 [1:57:59<7:09:50, 2.66s/it] 21%|██▏ | 2632/12313 [1:58:01<7:04:10, 2.63s/it] {'loss': 0.6744, 'grad_norm': 4.007630827099167, 'learning_rate': 4.570348064475323e-06, 'epoch': 0.21} 21%|██▏ | 2632/12313 [1:58:01<7:04:10, 2.63s/it] 21%|██▏ | 2633/12313 [1:58:04<7:03:16, 2.62s/it] {'loss': 0.5493, 'grad_norm': 5.484359947965143, 'learning_rate': 4.569979380855025e-06, 'epoch': 0.21} 21%|██▏ | 2633/12313 [1:58:04<7:03:16, 2.62s/it] 21%|██▏ | 2634/12313 [1:58:07<7:18:28, 2.72s/it] {'loss': 0.5519, 'grad_norm': 4.493010697403236, 'learning_rate': 4.56961055400311e-06, 'epoch': 0.21} 21%|██▏ | 2634/12313 [1:58:07<7:18:28, 2.72s/it] 21%|██▏ | 2635/12313 [1:58:10<7:18:34, 2.72s/it] {'loss': 0.59, 'grad_norm': 4.045159603215263, 'learning_rate': 4.5692415839450965e-06, 'epoch': 0.21} 21%|██▏ | 2635/12313 [1:58:10<7:18:34, 2.72s/it] 21%|██▏ | 2636/12313 [1:58:12<7:11:42, 2.68s/it] {'loss': 0.3977, 'grad_norm': 4.821023174942057, 'learning_rate': 4.568872470706518e-06, 'epoch': 0.21} 21%|██▏ | 2636/12313 [1:58:12<7:11:42, 2.68s/it] 
21%|██▏ | 2637/12313 [1:58:15<7:11:40, 2.68s/it] {'loss': 0.5558, 'grad_norm': 5.675194155783661, 'learning_rate': 4.568503214312913e-06, 'epoch': 0.21} 21%|██▏ | 2637/12313 [1:58:15<7:11:40, 2.68s/it] 21%|██▏ | 2638/12313 [1:58:18<7:18:32, 2.72s/it] {'loss': 0.5638, 'grad_norm': 4.7601824577948015, 'learning_rate': 4.568133814789833e-06, 'epoch': 0.21} 21%|██▏ | 2638/12313 [1:58:18<7:18:32, 2.72s/it] 21%|██▏ | 2639/12313 [1:58:20<7:11:49, 2.68s/it] {'loss': 0.5493, 'grad_norm': 5.107493849030825, 'learning_rate': 4.567764272162839e-06, 'epoch': 0.21} 21%|██▏ | 2639/12313 [1:58:20<7:11:49, 2.68s/it] 21%|██▏ | 2640/12313 [1:58:23<7:14:01, 2.69s/it] {'loss': 0.5961, 'grad_norm': 4.8431041110711766, 'learning_rate': 4.567394586457501e-06, 'epoch': 0.21} 21%|██▏ | 2640/12313 [1:58:23<7:14:01, 2.69s/it] 21%|██▏ | 2641/12313 [1:58:26<7:10:15, 2.67s/it] {'loss': 0.5455, 'grad_norm': 6.24655286207524, 'learning_rate': 4.567024757699399e-06, 'epoch': 0.21} 21%|██▏ | 2641/12313 [1:58:26<7:10:15, 2.67s/it] 21%|██▏ | 2642/12313 [1:58:29<7:21:19, 2.74s/it] {'loss': 0.5451, 'grad_norm': 4.819548308107245, 'learning_rate': 4.566654785914123e-06, 'epoch': 0.21} 21%|██▏ | 2642/12313 [1:58:29<7:21:19, 2.74s/it] 21%|██▏ | 2643/12313 [1:58:31<7:24:23, 2.76s/it] {'loss': 0.4268, 'grad_norm': 8.148594207313314, 'learning_rate': 4.566284671127273e-06, 'epoch': 0.21} 21%|██▏ | 2643/12313 [1:58:31<7:24:23, 2.76s/it] 21%|██▏ | 2644/12313 [1:58:34<7:12:52, 2.69s/it] {'loss': 0.5667, 'grad_norm': 5.200028510658096, 'learning_rate': 4.56591441336446e-06, 'epoch': 0.21} 21%|██▏ | 2644/12313 [1:58:34<7:12:52, 2.69s/it] 21%|██▏ | 2645/12313 [1:58:36<7:03:52, 2.63s/it] {'loss': 0.9163, 'grad_norm': 3.953005099419873, 'learning_rate': 4.565544012651304e-06, 'epoch': 0.21} 21%|██▏ | 2645/12313 [1:58:36<7:03:52, 2.63s/it] 21%|██▏ | 2646/12313 [1:58:39<6:54:27, 2.57s/it] {'loss': 0.4929, 'grad_norm': 6.166119347251547, 'learning_rate': 4.565173469013432e-06, 'epoch': 0.21} 21%|██▏ | 2646/12313 
[1:58:39<6:54:27, 2.57s/it] 21%|██▏ | 2647/12313 [1:58:42<7:07:16, 2.65s/it] {'loss': 0.5905, 'grad_norm': 17.196433928853956, 'learning_rate': 4.564802782476487e-06, 'epoch': 0.21} 21%|██▏ | 2647/12313 [1:58:42<7:07:16, 2.65s/it] 22%|██▏ | 2648/12313 [1:58:44<7:10:05, 2.67s/it] {'loss': 0.7147, 'grad_norm': 6.178896351260531, 'learning_rate': 4.564431953066118e-06, 'epoch': 0.22} 22%|██▏ | 2648/12313 [1:58:44<7:10:05, 2.67s/it] 22%|██▏ | 2649/12313 [1:58:47<7:09:21, 2.67s/it] {'loss': 0.7101, 'grad_norm': 6.440122516675341, 'learning_rate': 4.564060980807983e-06, 'epoch': 0.22} 22%|██▏ | 2649/12313 [1:58:47<7:09:21, 2.67s/it] 22%|██▏ | 2650/12313 [1:58:50<7:29:13, 2.79s/it] {'loss': 0.6096, 'grad_norm': 5.2000555006994125, 'learning_rate': 4.563689865727752e-06, 'epoch': 0.22} 22%|██▏ | 2650/12313 [1:58:50<7:29:13, 2.79s/it] 22%|██▏ | 2651/12313 [1:58:53<7:43:06, 2.88s/it] {'loss': 0.5928, 'grad_norm': 4.6419662478035155, 'learning_rate': 4.563318607851104e-06, 'epoch': 0.22} 22%|██▏ | 2651/12313 [1:58:53<7:43:06, 2.88s/it] 22%|██▏ | 2652/12313 [1:58:56<7:37:41, 2.84s/it] {'loss': 0.4954, 'grad_norm': 8.141047545711643, 'learning_rate': 4.562947207203728e-06, 'epoch': 0.22} 22%|██▏ | 2652/12313 [1:58:56<7:37:41, 2.84s/it] 22%|██▏ | 2653/12313 [1:58:59<7:25:16, 2.77s/it] {'loss': 0.6568, 'grad_norm': 3.9486721093318153, 'learning_rate': 4.562575663811324e-06, 'epoch': 0.22} 22%|██▏ | 2653/12313 [1:58:59<7:25:16, 2.77s/it] 22%|██▏ | 2654/12313 [1:59:01<7:21:25, 2.74s/it] {'loss': 0.5941, 'grad_norm': 5.149290757056804, 'learning_rate': 4.5622039776996006e-06, 'epoch': 0.22} 22%|██▏ | 2654/12313 [1:59:01<7:21:25, 2.74s/it] 22%|██▏ | 2655/12313 [1:59:04<7:31:25, 2.80s/it] {'loss': 0.4204, 'grad_norm': 6.391781782039198, 'learning_rate': 4.561832148894275e-06, 'epoch': 0.22} 22%|██▏ | 2655/12313 [1:59:04<7:31:25, 2.80s/it] 22%|██▏ | 2656/12313 [1:59:07<7:21:58, 2.75s/it] {'loss': 0.5599, 'grad_norm': 9.968941914044036, 'learning_rate': 4.561460177421078e-06, 'epoch': 
0.22} 22%|██▏ | 2656/12313 [1:59:07<7:21:58, 2.75s/it] 22%|██▏ | 2657/12313 [1:59:09<7:09:15, 2.67s/it] {'loss': 0.6469, 'grad_norm': 5.522046324373681, 'learning_rate': 4.561088063305745e-06, 'epoch': 0.22} 22%|██▏ | 2657/12313 [1:59:09<7:09:15, 2.67s/it] 22%|██▏ | 2658/12313 [1:59:12<7:22:04, 2.75s/it] {'loss': 0.6275, 'grad_norm': 3.8464087079545277, 'learning_rate': 4.560715806574028e-06, 'epoch': 0.22} 22%|██▏ | 2658/12313 [1:59:12<7:22:04, 2.75s/it] 22%|██▏ | 2659/12313 [1:59:15<7:15:22, 2.71s/it] {'loss': 0.5032, 'grad_norm': 5.078649015582283, 'learning_rate': 4.560343407251682e-06, 'epoch': 0.22} 22%|██▏ | 2659/12313 [1:59:15<7:15:22, 2.71s/it] 22%|██▏ | 2660/12313 [1:59:18<7:11:33, 2.68s/it] {'loss': 0.6853, 'grad_norm': 4.334473048243304, 'learning_rate': 4.559970865364477e-06, 'epoch': 0.22} 22%|██▏ | 2660/12313 [1:59:18<7:11:33, 2.68s/it] 22%|██▏ | 2661/12313 [1:59:20<7:00:02, 2.61s/it] {'loss': 0.5029, 'grad_norm': 6.419291069877329, 'learning_rate': 4.55959818093819e-06, 'epoch': 0.22} 22%|██▏ | 2661/12313 [1:59:20<7:00:02, 2.61s/it] 22%|██▏ | 2662/12313 [1:59:23<7:01:15, 2.62s/it] {'loss': 0.4812, 'grad_norm': 5.8025560914497385, 'learning_rate': 4.559225353998609e-06, 'epoch': 0.22} 22%|██▏ | 2662/12313 [1:59:23<7:01:15, 2.62s/it] 22%|██▏ | 2663/12313 [1:59:25<7:06:24, 2.65s/it] {'loss': 0.5198, 'grad_norm': 4.30038594406105, 'learning_rate': 4.558852384571533e-06, 'epoch': 0.22} 22%|██▏ | 2663/12313 [1:59:25<7:06:24, 2.65s/it] 22%|██▏ | 2664/12313 [1:59:28<7:10:47, 2.68s/it] {'loss': 0.6267, 'grad_norm': 4.101496826162754, 'learning_rate': 4.558479272682768e-06, 'epoch': 0.22} 22%|██▏ | 2664/12313 [1:59:28<7:10:47, 2.68s/it] 22%|██▏ | 2665/12313 [1:59:31<7:08:57, 2.67s/it] {'loss': 0.5187, 'grad_norm': 5.736035126104208, 'learning_rate': 4.558106018358131e-06, 'epoch': 0.22} 22%|██▏ | 2665/12313 [1:59:31<7:08:57, 2.67s/it] 22%|██▏ | 2666/12313 [1:59:34<7:26:49, 2.78s/it] {'loss': 0.5716, 'grad_norm': 5.345843996754261, 'learning_rate': 
4.557732621623449e-06, 'epoch': 0.22} 22%|██▏ | 2666/12313 [1:59:34<7:26:49, 2.78s/it] 22%|██▏ | 2667/12313 [1:59:36<7:18:13, 2.73s/it] {'loss': 0.4583, 'grad_norm': 4.853630853027857, 'learning_rate': 4.557359082504562e-06, 'epoch': 0.22} 22%|██▏ | 2667/12313 [1:59:36<7:18:13, 2.73s/it] 22%|██▏ | 2668/12313 [1:59:39<7:15:52, 2.71s/it] {'loss': 0.4872, 'grad_norm': 3.473558582520133, 'learning_rate': 4.556985401027314e-06, 'epoch': 0.22} 22%|██▏ | 2668/12313 [1:59:39<7:15:52, 2.71s/it] 22%|██▏ | 2669/12313 [1:59:42<7:16:21, 2.71s/it] {'loss': 0.5814, 'grad_norm': 6.011752197100141, 'learning_rate': 4.556611577217563e-06, 'epoch': 0.22} 22%|██▏ | 2669/12313 [1:59:42<7:16:21, 2.71s/it] 22%|██▏ | 2670/12313 [1:59:45<7:18:49, 2.73s/it] {'loss': 0.4709, 'grad_norm': 4.738766968848329, 'learning_rate': 4.5562376111011745e-06, 'epoch': 0.22} 22%|██▏ | 2670/12313 [1:59:45<7:18:49, 2.73s/it] 22%|██▏ | 2671/12313 [1:59:47<7:04:07, 2.64s/it] {'loss': 0.542, 'grad_norm': 9.10744576822217, 'learning_rate': 4.5558635027040265e-06, 'epoch': 0.22} 22%|██▏ | 2671/12313 [1:59:47<7:04:07, 2.64s/it] 22%|██▏ | 2672/12313 [1:59:50<7:03:52, 2.64s/it] {'loss': 0.6779, 'grad_norm': 3.811056086790161, 'learning_rate': 4.555489252052005e-06, 'epoch': 0.22} 22%|██▏ | 2672/12313 [1:59:50<7:03:52, 2.64s/it] 22%|██▏ | 2673/12313 [1:59:53<7:52:50, 2.94s/it] {'loss': 0.491, 'grad_norm': 5.148351495523538, 'learning_rate': 4.5551148591710045e-06, 'epoch': 0.22} 22%|██▏ | 2673/12313 [1:59:53<7:52:50, 2.94s/it] 22%|██▏ | 2674/12313 [1:59:56<7:34:46, 2.83s/it] {'loss': 0.6449, 'grad_norm': 5.581434436680359, 'learning_rate': 4.5547403240869335e-06, 'epoch': 0.22} 22%|██▏ | 2674/12313 [1:59:56<7:34:46, 2.83s/it] 22%|██▏ | 2675/12313 [1:59:58<7:20:06, 2.74s/it] {'loss': 0.7014, 'grad_norm': 3.742718625839809, 'learning_rate': 4.554365646825706e-06, 'epoch': 0.22} 22%|██▏ | 2675/12313 [1:59:58<7:20:06, 2.74s/it] 22%|██▏ | 2676/12313 [2:00:01<7:33:05, 2.82s/it] {'loss': 0.596, 'grad_norm': 
6.612469532237822, 'learning_rate': 4.5539908274132485e-06, 'epoch': 0.22} 22%|██▏ | 2676/12313 [2:00:01<7:33:05, 2.82s/it] 22%|██▏ | 2677/12313 [2:00:04<7:26:35, 2.78s/it] {'loss': 0.7381, 'grad_norm': 4.306520677998056, 'learning_rate': 4.553615865875496e-06, 'epoch': 0.22} 22%|██▏ | 2677/12313 [2:00:04<7:26:35, 2.78s/it] 22%|██▏ | 2678/12313 [2:00:07<7:17:49, 2.73s/it] {'loss': 0.6964, 'grad_norm': 4.614381185208202, 'learning_rate': 4.553240762238394e-06, 'epoch': 0.22} 22%|██▏ | 2678/12313 [2:00:07<7:17:49, 2.73s/it] 22%|██▏ | 2679/12313 [2:00:09<7:10:04, 2.68s/it] {'loss': 0.5004, 'grad_norm': 7.406343400623716, 'learning_rate': 4.552865516527899e-06, 'epoch': 0.22} 22%|██▏ | 2679/12313 [2:00:09<7:10:04, 2.68s/it] 22%|██▏ | 2680/12313 [2:00:12<7:04:59, 2.65s/it] {'loss': 0.6269, 'grad_norm': 3.9606433285981266, 'learning_rate': 4.552490128769975e-06, 'epoch': 0.22} 22%|██▏ | 2680/12313 [2:00:12<7:04:59, 2.65s/it] 22%|██▏ | 2681/12313 [2:00:14<7:04:39, 2.65s/it] {'loss': 0.5763, 'grad_norm': 4.11495963139638, 'learning_rate': 4.5521145989905955e-06, 'epoch': 0.22} 22%|██▏ | 2681/12313 [2:00:14<7:04:39, 2.65s/it] 22%|██▏ | 2682/12313 [2:00:17<6:49:44, 2.55s/it] {'loss': 0.4674, 'grad_norm': 4.588404789142679, 'learning_rate': 4.551738927215747e-06, 'epoch': 0.22} 22%|██▏ | 2682/12313 [2:00:17<6:49:44, 2.55s/it] 22%|██▏ | 2683/12313 [2:00:20<7:03:18, 2.64s/it] {'loss': 0.5801, 'grad_norm': 7.297594706002203, 'learning_rate': 4.5513631134714235e-06, 'epoch': 0.22} 22%|██▏ | 2683/12313 [2:00:20<7:03:18, 2.64s/it] 22%|██▏ | 2684/12313 [2:00:22<6:58:46, 2.61s/it] {'loss': 0.4813, 'grad_norm': 4.3047090539293045, 'learning_rate': 4.550987157783629e-06, 'epoch': 0.22} 22%|██▏ | 2684/12313 [2:00:22<6:58:46, 2.61s/it] 22%|██▏ | 2685/12313 [2:00:25<6:49:53, 2.55s/it] {'loss': 0.567, 'grad_norm': 5.768465577812249, 'learning_rate': 4.550611060178378e-06, 'epoch': 0.22} 22%|██▏ | 2685/12313 [2:00:25<6:49:53, 2.55s/it] 22%|██▏ | 2686/12313 [2:00:27<6:53:22, 2.58s/it] 
{'loss': 0.6873, 'grad_norm': 5.651861901253761, 'learning_rate': 4.550234820681695e-06, 'epoch': 0.22} 22%|██▏ | 2686/12313 [2:00:27<6:53:22, 2.58s/it] 22%|██▏ | 2687/12313 [2:00:30<6:46:54, 2.54s/it] {'loss': 0.6324, 'grad_norm': 9.183694716367905, 'learning_rate': 4.549858439319612e-06, 'epoch': 0.22} 22%|██▏ | 2687/12313 [2:00:30<6:46:54, 2.54s/it] 22%|██▏ | 2688/12313 [2:00:32<6:51:26, 2.56s/it] {'loss': 0.5873, 'grad_norm': 7.585915804173281, 'learning_rate': 4.549481916118174e-06, 'epoch': 0.22} 22%|██▏ | 2688/12313 [2:00:32<6:51:26, 2.56s/it] 22%|██▏ | 2689/12313 [2:00:35<7:06:31, 2.66s/it] {'loss': 0.6382, 'grad_norm': 3.760740144941256, 'learning_rate': 4.5491052511034345e-06, 'epoch': 0.22} 22%|██▏ | 2689/12313 [2:00:35<7:06:31, 2.66s/it] 22%|██▏ | 2690/12313 [2:00:38<6:58:45, 2.61s/it] {'loss': 0.5818, 'grad_norm': 4.945745347505512, 'learning_rate': 4.548728444301456e-06, 'epoch': 0.22} 22%|██▏ | 2690/12313 [2:00:38<6:58:45, 2.61s/it] 22%|██▏ | 2691/12313 [2:00:40<7:00:05, 2.62s/it] {'loss': 0.7123, 'grad_norm': 4.9769795576864215, 'learning_rate': 4.548351495738312e-06, 'epoch': 0.22} 22%|██▏ | 2691/12313 [2:00:40<7:00:05, 2.62s/it] 22%|██▏ | 2692/12313 [2:00:43<7:00:02, 2.62s/it] {'loss': 0.4976, 'grad_norm': 4.481119259930459, 'learning_rate': 4.547974405440085e-06, 'epoch': 0.22} 22%|██▏ | 2692/12313 [2:00:43<7:00:02, 2.62s/it] 22%|██▏ | 2693/12313 [2:00:46<7:08:39, 2.67s/it] {'loss': 0.5487, 'grad_norm': 5.7374511609884955, 'learning_rate': 4.547597173432869e-06, 'epoch': 0.22} 22%|██▏ | 2693/12313 [2:00:46<7:08:39, 2.67s/it] 22%|██▏ | 2694/12313 [2:00:48<7:11:15, 2.69s/it] {'loss': 0.7687, 'grad_norm': 5.502760303179846, 'learning_rate': 4.547219799742765e-06, 'epoch': 0.22} 22%|██▏ | 2694/12313 [2:00:48<7:11:15, 2.69s/it] 22%|██▏ | 2695/12313 [2:00:51<7:04:07, 2.65s/it] {'loss': 0.5052, 'grad_norm': 10.421957536535567, 'learning_rate': 4.5468422843958845e-06, 'epoch': 0.22} 22%|██▏ | 2695/12313 [2:00:51<7:04:07, 2.65s/it] 22%|██▏ | 2696/12313 
[2:00:54<7:06:09, 2.66s/it] {'loss': 0.5888, 'grad_norm': 3.6707221757502917, 'learning_rate': 4.546464627418351e-06, 'epoch': 0.22} 22%|██▏ | 2696/12313 [2:00:54<7:06:09, 2.66s/it] 22%|██▏ | 2697/12313 [2:00:56<7:04:39, 2.65s/it] {'loss': 0.6277, 'grad_norm': 3.304249792857001, 'learning_rate': 4.546086828836297e-06, 'epoch': 0.22} 22%|██▏ | 2697/12313 [2:00:56<7:04:39, 2.65s/it] 22%|██▏ | 2698/12313 [2:00:59<7:05:14, 2.65s/it] {'loss': 0.6074, 'grad_norm': 4.76062394973188, 'learning_rate': 4.545708888675862e-06, 'epoch': 0.22} 22%|██▏ | 2698/12313 [2:00:59<7:05:14, 2.65s/it] 22%|██▏ | 2699/12313 [2:01:02<7:10:06, 2.68s/it] {'loss': 0.5367, 'grad_norm': 4.462680720514244, 'learning_rate': 4.5453308069632e-06, 'epoch': 0.22} 22%|██▏ | 2699/12313 [2:01:02<7:10:06, 2.68s/it] 22%|██▏ | 2700/12313 [2:01:04<7:09:17, 2.68s/it] {'loss': 0.5074, 'grad_norm': 7.372740705732592, 'learning_rate': 4.54495258372447e-06, 'epoch': 0.22} 22%|██▏ | 2700/12313 [2:01:04<7:09:17, 2.68s/it] 22%|██▏ | 2701/12313 [2:01:07<7:04:08, 2.65s/it] {'loss': 0.4761, 'grad_norm': 7.860054087415267, 'learning_rate': 4.544574218985845e-06, 'epoch': 0.22} 22%|██▏ | 2701/12313 [2:01:07<7:04:08, 2.65s/it] 22%|██▏ | 2702/12313 [2:01:10<7:07:28, 2.67s/it] {'loss': 0.5659, 'grad_norm': 8.081036919499534, 'learning_rate': 4.544195712773504e-06, 'epoch': 0.22} 22%|██▏ | 2702/12313 [2:01:10<7:07:28, 2.67s/it] 22%|██▏ | 2703/12313 [2:01:12<7:03:47, 2.65s/it] {'loss': 0.462, 'grad_norm': 5.8269057668228985, 'learning_rate': 4.543817065113638e-06, 'epoch': 0.22} 22%|██▏ | 2703/12313 [2:01:12<7:03:47, 2.65s/it] 22%|██▏ | 2704/12313 [2:01:15<7:08:54, 2.68s/it] {'loss': 0.4101, 'grad_norm': 4.2668146590202305, 'learning_rate': 4.543438276032448e-06, 'epoch': 0.22} 22%|██▏ | 2704/12313 [2:01:15<7:08:54, 2.68s/it] 22%|██▏ | 2705/12313 [2:01:18<7:02:30, 2.64s/it] {'loss': 0.48, 'grad_norm': 5.140738961702395, 'learning_rate': 4.543059345556145e-06, 'epoch': 0.22} 22%|██▏ | 2705/12313 [2:01:18<7:02:30, 2.64s/it] 
22%|██▏ | 2706/12313 [2:01:20<6:59:24, 2.62s/it] {'loss': 0.5453, 'grad_norm': 5.170374218467899, 'learning_rate': 4.542680273710947e-06, 'epoch': 0.22} 22%|██▏ | 2706/12313 [2:01:20<6:59:24, 2.62s/it] 22%|██▏ | 2707/12313 [2:01:23<7:03:53, 2.65s/it] {'loss': 0.5962, 'grad_norm': 5.753753477369303, 'learning_rate': 4.542301060523086e-06, 'epoch': 0.22} 22%|██▏ | 2707/12313 [2:01:23<7:03:53, 2.65s/it] 22%|██▏ | 2708/12313 [2:01:25<6:58:51, 2.62s/it] {'loss': 0.561, 'grad_norm': 4.532213136821538, 'learning_rate': 4.541921706018799e-06, 'epoch': 0.22} 22%|██▏ | 2708/12313 [2:01:25<6:58:51, 2.62s/it] 22%|██▏ | 2709/12313 [2:01:28<7:00:46, 2.63s/it] {'loss': 0.6049, 'grad_norm': 5.5635492887810125, 'learning_rate': 4.541542210224337e-06, 'epoch': 0.22} 22%|██▏ | 2709/12313 [2:01:28<7:00:46, 2.63s/it] 22%|██▏ | 2710/12313 [2:01:31<7:01:24, 2.63s/it] {'loss': 0.5815, 'grad_norm': 4.789661183080903, 'learning_rate': 4.5411625731659595e-06, 'epoch': 0.22} 22%|██▏ | 2710/12313 [2:01:31<7:01:24, 2.63s/it] 22%|██▏ | 2711/12313 [2:01:33<7:01:28, 2.63s/it] {'loss': 0.5184, 'grad_norm': 3.231674262508777, 'learning_rate': 4.540782794869933e-06, 'epoch': 0.22} 22%|██▏ | 2711/12313 [2:01:33<7:01:28, 2.63s/it] 22%|██▏ | 2712/12313 [2:01:36<6:58:58, 2.62s/it] {'loss': 0.4839, 'grad_norm': 6.341088144720809, 'learning_rate': 4.5404028753625396e-06, 'epoch': 0.22} 22%|██▏ | 2712/12313 [2:01:36<6:58:58, 2.62s/it] 22%|██▏ | 2713/12313 [2:01:39<7:00:47, 2.63s/it] {'loss': 0.6509, 'grad_norm': 4.816690291787395, 'learning_rate': 4.5400228146700654e-06, 'epoch': 0.22} 22%|██▏ | 2713/12313 [2:01:39<7:00:47, 2.63s/it] 22%|██▏ | 2714/12313 [2:01:41<7:05:28, 2.66s/it] {'loss': 0.627, 'grad_norm': 3.6214196801416474, 'learning_rate': 4.539642612818809e-06, 'epoch': 0.22} 22%|██▏ | 2714/12313 [2:01:41<7:05:28, 2.66s/it] 22%|██▏ | 2715/12313 [2:01:44<6:59:03, 2.62s/it] {'loss': 0.4846, 'grad_norm': 7.585870445781305, 'learning_rate': 4.539262269835078e-06, 'epoch': 0.22} 22%|██▏ | 2715/12313 
[2:01:44<6:59:03, 2.62s/it] 22%|██▏ | 2716/12313 [2:01:47<7:06:57, 2.67s/it] {'loss': 0.5169, 'grad_norm': 5.185206149837472, 'learning_rate': 4.538881785745191e-06, 'epoch': 0.22} 22%|██▏ | 2716/12313 [2:01:47<7:06:57, 2.67s/it] 22%|██▏ | 2717/12313 [2:01:49<6:59:49, 2.62s/it] {'loss': 0.5466, 'grad_norm': 9.120252331770077, 'learning_rate': 4.538501160575475e-06, 'epoch': 0.22} 22%|██▏ | 2717/12313 [2:01:49<6:59:49, 2.62s/it] 22%|██▏ | 2718/12313 [2:01:52<7:17:57, 2.74s/it] {'loss': 0.5654, 'grad_norm': 4.271633224077144, 'learning_rate': 4.538120394352267e-06, 'epoch': 0.22} 22%|██▏ | 2718/12313 [2:01:52<7:17:57, 2.74s/it] 22%|██▏ | 2719/12313 [2:01:55<7:11:43, 2.70s/it] {'loss': 0.5984, 'grad_norm': 6.425265204493728, 'learning_rate': 4.5377394871019145e-06, 'epoch': 0.22} 22%|██▏ | 2719/12313 [2:01:55<7:11:43, 2.70s/it] 22%|██▏ | 2720/12313 [2:01:57<7:09:34, 2.69s/it] {'loss': 0.5098, 'grad_norm': 5.662199724686198, 'learning_rate': 4.5373584388507745e-06, 'epoch': 0.22} 22%|██▏ | 2720/12313 [2:01:57<7:09:34, 2.69s/it] 22%|██▏ | 2721/12313 [2:02:00<7:09:22, 2.69s/it] {'loss': 0.49, 'grad_norm': 7.194165344057847, 'learning_rate': 4.536977249625213e-06, 'epoch': 0.22} 22%|██▏ | 2721/12313 [2:02:00<7:09:22, 2.69s/it] 22%|██▏ | 2722/12313 [2:02:03<7:42:03, 2.89s/it] {'loss': 0.6383, 'grad_norm': 6.981685701050804, 'learning_rate': 4.536595919451606e-06, 'epoch': 0.22} 22%|██▏ | 2722/12313 [2:02:03<7:42:03, 2.89s/it] 22%|██▏ | 2723/12313 [2:02:06<7:25:23, 2.79s/it] {'loss': 0.5659, 'grad_norm': 4.116671598914044, 'learning_rate': 4.53621444835634e-06, 'epoch': 0.22} 22%|██▏ | 2723/12313 [2:02:06<7:25:23, 2.79s/it] 22%|██▏ | 2724/12313 [2:02:09<7:13:38, 2.71s/it] {'loss': 0.4805, 'grad_norm': 5.134961603692183, 'learning_rate': 4.535832836365811e-06, 'epoch': 0.22} 22%|██▏ | 2724/12313 [2:02:09<7:13:38, 2.71s/it] 22%|██▏ | 2725/12313 [2:02:11<7:12:53, 2.71s/it] {'loss': 0.7364, 'grad_norm': 4.576284219113115, 'learning_rate': 4.535451083506424e-06, 'epoch': 0.22} 
22%|██▏ | 2725/12313 [2:02:11<7:12:53, 2.71s/it] 22%|██▏ | 2726/12313 [2:02:14<7:07:33, 2.68s/it] {'loss': 0.5578, 'grad_norm': 6.0501880740602365, 'learning_rate': 4.535069189804594e-06, 'epoch': 0.22} 22%|██▏ | 2726/12313 [2:02:14<7:07:33, 2.68s/it] 22%|██▏ | 2727/12313 [2:02:16<7:02:08, 2.64s/it] {'loss': 0.5017, 'grad_norm': 4.658417802807258, 'learning_rate': 4.534687155286747e-06, 'epoch': 0.22} 22%|██▏ | 2727/12313 [2:02:16<7:02:08, 2.64s/it] 22%|██▏ | 2728/12313 [2:02:19<7:06:10, 2.67s/it] {'loss': 0.5166, 'grad_norm': 3.5188270873838117, 'learning_rate': 4.534304979979317e-06, 'epoch': 0.22} 22%|██▏ | 2728/12313 [2:02:19<7:06:10, 2.67s/it] 22%|██▏ | 2729/12313 [2:02:22<7:02:05, 2.64s/it] {'loss': 0.717, 'grad_norm': 3.6771178180337865, 'learning_rate': 4.53392266390875e-06, 'epoch': 0.22} 22%|██▏ | 2729/12313 [2:02:22<7:02:05, 2.64s/it] 22%|██▏ | 2730/12313 [2:02:25<7:49:58, 2.94s/it] {'loss': 0.676, 'grad_norm': 4.874076485337167, 'learning_rate': 4.533540207101498e-06, 'epoch': 0.22} 22%|██▏ | 2730/12313 [2:02:25<7:49:58, 2.94s/it] 22%|██▏ | 2731/12313 [2:02:28<7:36:58, 2.86s/it] {'loss': 0.7047, 'grad_norm': 3.3053339197225307, 'learning_rate': 4.533157609584026e-06, 'epoch': 0.22} 22%|██▏ | 2731/12313 [2:02:28<7:36:58, 2.86s/it] 22%|██▏ | 2732/12313 [2:02:31<7:28:11, 2.81s/it] {'loss': 0.6683, 'grad_norm': 4.0334080892398845, 'learning_rate': 4.532774871382807e-06, 'epoch': 0.22} 22%|██▏ | 2732/12313 [2:02:31<7:28:11, 2.81s/it] 22%|██▏ | 2733/12313 [2:02:33<7:18:11, 2.74s/it] {'loss': 0.5633, 'grad_norm': 7.9679246058681885, 'learning_rate': 4.532391992524327e-06, 'epoch': 0.22} 22%|██▏ | 2733/12313 [2:02:33<7:18:11, 2.74s/it] 22%|██▏ | 2734/12313 [2:02:36<7:08:38, 2.68s/it] {'loss': 0.6868, 'grad_norm': 5.020060727741971, 'learning_rate': 4.532008973035076e-06, 'epoch': 0.22} 22%|██▏ | 2734/12313 [2:02:36<7:08:38, 2.68s/it] 22%|██▏ | 2735/12313 [2:02:38<7:04:32, 2.66s/it] {'loss': 0.5032, 'grad_norm': 3.77106126429378, 'learning_rate': 
4.531625812941559e-06, 'epoch': 0.22} 22%|██▏ | 2735/12313 [2:02:38<7:04:32, 2.66s/it] 22%|██▏ | 2736/12313 [2:02:41<7:10:41, 2.70s/it] {'loss': 0.7004, 'grad_norm': 3.682569403476453, 'learning_rate': 4.531242512270287e-06, 'epoch': 0.22} 22%|██▏ | 2736/12313 [2:02:41<7:10:41, 2.70s/it] 22%|██▏ | 2737/12313 [2:02:44<7:11:40, 2.70s/it] {'loss': 0.5239, 'grad_norm': 4.487348592439573, 'learning_rate': 4.530859071047785e-06, 'epoch': 0.22} 22%|██▏ | 2737/12313 [2:02:44<7:11:40, 2.70s/it] 22%|██▏ | 2738/12313 [2:02:46<7:02:30, 2.65s/it] {'loss': 0.4732, 'grad_norm': 4.188000337446031, 'learning_rate': 4.530475489300583e-06, 'epoch': 0.22} 22%|██▏ | 2738/12313 [2:02:46<7:02:30, 2.65s/it] 22%|██▏ | 2739/12313 [2:02:49<7:01:11, 2.64s/it] {'loss': 0.4986, 'grad_norm': 8.498349813305607, 'learning_rate': 4.530091767055223e-06, 'epoch': 0.22} 22%|██▏ | 2739/12313 [2:02:49<7:01:11, 2.64s/it] 22%|██▏ | 2740/12313 [2:02:52<6:55:03, 2.60s/it] {'loss': 0.6785, 'grad_norm': 5.52528756591483, 'learning_rate': 4.5297079043382566e-06, 'epoch': 0.22} 22%|██▏ | 2740/12313 [2:02:52<6:55:03, 2.60s/it] 22%|██▏ | 2741/12313 [2:02:54<6:52:25, 2.59s/it] {'loss': 0.4531, 'grad_norm': 5.752497762609521, 'learning_rate': 4.529323901176245e-06, 'epoch': 0.22} 22%|██▏ | 2741/12313 [2:02:54<6:52:25, 2.59s/it] 22%|██▏ | 2742/12313 [2:02:57<6:59:35, 2.63s/it] {'loss': 0.7052, 'grad_norm': 3.092790657666851, 'learning_rate': 4.52893975759576e-06, 'epoch': 0.22} 22%|██▏ | 2742/12313 [2:02:57<6:59:35, 2.63s/it] 22%|██▏ | 2743/12313 [2:02:59<6:54:20, 2.60s/it] {'loss': 0.5464, 'grad_norm': 4.559669547606855, 'learning_rate': 4.528555473623381e-06, 'epoch': 0.22} 22%|██▏ | 2743/12313 [2:02:59<6:54:20, 2.60s/it] 22%|██▏ | 2744/12313 [2:03:02<6:59:06, 2.63s/it] {'loss': 0.6876, 'grad_norm': 3.5783978557136127, 'learning_rate': 4.5281710492857e-06, 'epoch': 0.22} 22%|██▏ | 2744/12313 [2:03:02<6:59:06, 2.63s/it] 22%|██▏ | 2745/12313 [2:03:05<7:03:37, 2.66s/it] {'loss': 0.564, 'grad_norm': 4.328813594256833, 
'learning_rate': 4.527786484609316e-06, 'epoch': 0.22} 22%|██▏ | 2745/12313 [2:03:05<7:03:37, 2.66s/it] 22%|██▏ | 2746/12313 [2:03:08<7:08:33, 2.69s/it] {'loss': 0.5783, 'grad_norm': 3.952478916018802, 'learning_rate': 4.52740177962084e-06, 'epoch': 0.22} 22%|██▏ | 2746/12313 [2:03:08<7:08:33, 2.69s/it] 22%|██▏ | 2747/12313 [2:03:10<7:05:03, 2.67s/it] {'loss': 0.7729, 'grad_norm': 4.924032799834654, 'learning_rate': 4.52701693434689e-06, 'epoch': 0.22} 22%|██▏ | 2747/12313 [2:03:10<7:05:03, 2.67s/it] 22%|██▏ | 2748/12313 [2:03:13<7:00:08, 2.64s/it] {'loss': 0.5408, 'grad_norm': 5.139762166088582, 'learning_rate': 4.526631948814096e-06, 'epoch': 0.22} 22%|██▏ | 2748/12313 [2:03:13<7:00:08, 2.64s/it] 22%|██▏ | 2749/12313 [2:03:15<6:59:02, 2.63s/it] {'loss': 0.6876, 'grad_norm': 4.468079919633165, 'learning_rate': 4.5262468230490975e-06, 'epoch': 0.22} 22%|██▏ | 2749/12313 [2:03:15<6:59:02, 2.63s/it] 22%|██▏ | 2750/12313 [2:03:18<7:00:48, 2.64s/it] {'loss': 0.7465, 'grad_norm': 4.463738118530915, 'learning_rate': 4.525861557078542e-06, 'epoch': 0.22} 22%|██▏ | 2750/12313 [2:03:18<7:00:48, 2.64s/it] 22%|██▏ | 2751/12313 [2:03:21<7:13:33, 2.72s/it] {'loss': 0.5134, 'grad_norm': 7.852415705301734, 'learning_rate': 4.525476150929089e-06, 'epoch': 0.22} 22%|██▏ | 2751/12313 [2:03:21<7:13:33, 2.72s/it] 22%|██▏ | 2752/12313 [2:03:23<7:04:04, 2.66s/it] {'loss': 0.5476, 'grad_norm': 4.963966987188751, 'learning_rate': 4.525090604627406e-06, 'epoch': 0.22} 22%|██▏ | 2752/12313 [2:03:23<7:04:04, 2.66s/it] 22%|██▏ | 2753/12313 [2:03:26<6:52:20, 2.59s/it] {'loss': 0.4523, 'grad_norm': 6.07590271752492, 'learning_rate': 4.52470491820017e-06, 'epoch': 0.22} 22%|██▏ | 2753/12313 [2:03:26<6:52:20, 2.59s/it] 22%|██▏ | 2754/12313 [2:03:29<6:58:00, 2.62s/it] {'loss': 0.6684, 'grad_norm': 4.302538032685536, 'learning_rate': 4.52431909167407e-06, 'epoch': 0.22} 22%|██▏ | 2754/12313 [2:03:29<6:58:00, 2.62s/it] 22%|██▏ | 2755/12313 [2:03:31<6:57:22, 2.62s/it] {'loss': 0.5804, 'grad_norm': 
4.256570317991106, 'learning_rate': 4.5239331250758025e-06, 'epoch': 0.22} 22%|██▏ | 2755/12313 [2:03:31<6:57:22, 2.62s/it] 22%|██▏ | 2756/12313 [2:03:34<6:52:49, 2.59s/it] {'loss': 0.5361, 'grad_norm': 3.987772037631392, 'learning_rate': 4.523547018432074e-06, 'epoch': 0.22} 22%|██▏ | 2756/12313 [2:03:34<6:52:49, 2.59s/it] 22%|██▏ | 2757/12313 [2:03:36<6:48:14, 2.56s/it] {'loss': 0.5403, 'grad_norm': 4.25893742048742, 'learning_rate': 4.523160771769602e-06, 'epoch': 0.22} 22%|██▏ | 2757/12313 [2:03:36<6:48:14, 2.56s/it] 22%|██▏ | 2758/12313 [2:03:39<7:11:15, 2.71s/it] {'loss': 0.6913, 'grad_norm': 4.754148515001833, 'learning_rate': 4.52277438511511e-06, 'epoch': 0.22} 22%|██▏ | 2758/12313 [2:03:39<7:11:15, 2.71s/it] 22%|██▏ | 2759/12313 [2:03:42<7:08:39, 2.69s/it] {'loss': 0.4877, 'grad_norm': 11.158110474612203, 'learning_rate': 4.522387858495337e-06, 'epoch': 0.22} 22%|██▏ | 2759/12313 [2:03:42<7:08:39, 2.69s/it] 22%|██▏ | 2760/12313 [2:03:45<7:16:05, 2.74s/it] {'loss': 0.4932, 'grad_norm': 5.122504334003654, 'learning_rate': 4.522001191937028e-06, 'epoch': 0.22} 22%|██▏ | 2760/12313 [2:03:45<7:16:05, 2.74s/it] 22%|██▏ | 2761/12313 [2:03:48<7:17:50, 2.75s/it] {'loss': 0.5527, 'grad_norm': 4.861511116519465, 'learning_rate': 4.521614385466938e-06, 'epoch': 0.22} 22%|██▏ | 2761/12313 [2:03:48<7:17:50, 2.75s/it] 22%|██▏ | 2762/12313 [2:03:50<7:23:15, 2.78s/it] {'loss': 0.7121, 'grad_norm': 3.6596485107845003, 'learning_rate': 4.521227439111831e-06, 'epoch': 0.22} 22%|██▏ | 2762/12313 [2:03:50<7:23:15, 2.78s/it] 22%|██▏ | 2763/12313 [2:03:53<7:15:38, 2.74s/it] {'loss': 0.5672, 'grad_norm': 4.522445266045689, 'learning_rate': 4.520840352898483e-06, 'epoch': 0.22} 22%|██▏ | 2763/12313 [2:03:53<7:15:38, 2.74s/it] 22%|██▏ | 2764/12313 [2:03:56<7:16:18, 2.74s/it] {'loss': 0.5862, 'grad_norm': 7.737848493782743, 'learning_rate': 4.520453126853677e-06, 'epoch': 0.22} 22%|██▏ | 2764/12313 [2:03:56<7:16:18, 2.74s/it] 22%|██▏ | 2765/12313 [2:03:58<7:00:41, 2.64s/it] {'loss': 
0.5703, 'grad_norm': 6.141302741029408, 'learning_rate': 4.520065761004209e-06, 'epoch': 0.22} 22%|██▏ | 2765/12313 [2:03:58<7:00:41, 2.64s/it] 22%|██▏ | 2766/12313 [2:04:01<6:56:04, 2.61s/it] {'loss': 0.5038, 'grad_norm': 6.569966061095918, 'learning_rate': 4.51967825537688e-06, 'epoch': 0.22} 22%|██▏ | 2766/12313 [2:04:01<6:56:04, 2.61s/it] 22%|██▏ | 2767/12313 [2:04:03<6:56:34, 2.62s/it] {'loss': 0.5216, 'grad_norm': 5.8858674844088235, 'learning_rate': 4.5192906099985055e-06, 'epoch': 0.22} 22%|██▏ | 2767/12313 [2:04:03<6:56:34, 2.62s/it] 22%|██▏ | 2768/12313 [2:04:06<7:09:40, 2.70s/it] {'loss': 0.4604, 'grad_norm': 4.116267752952551, 'learning_rate': 4.518902824895908e-06, 'epoch': 0.22} 22%|██▏ | 2768/12313 [2:04:06<7:09:40, 2.70s/it] 22%|██▏ | 2769/12313 [2:04:09<7:03:15, 2.66s/it] {'loss': 0.466, 'grad_norm': 6.293436608006783, 'learning_rate': 4.518514900095919e-06, 'epoch': 0.22} 22%|██▏ | 2769/12313 [2:04:09<7:03:15, 2.66s/it] 22%|██▏ | 2770/12313 [2:04:12<7:04:17, 2.67s/it] {'loss': 0.5357, 'grad_norm': 8.339236282987327, 'learning_rate': 4.518126835625382e-06, 'epoch': 0.22} 22%|██▏ | 2770/12313 [2:04:12<7:04:17, 2.67s/it] 23%|██▎ | 2771/12313 [2:04:14<7:10:47, 2.71s/it] {'loss': 0.6255, 'grad_norm': 4.726236800961924, 'learning_rate': 4.51773863151115e-06, 'epoch': 0.23} 23%|██▎ | 2771/12313 [2:04:14<7:10:47, 2.71s/it] 23%|██▎ | 2772/12313 [2:04:17<7:08:07, 2.69s/it] {'loss': 0.553, 'grad_norm': 4.1928620515740365, 'learning_rate': 4.517350287780081e-06, 'epoch': 0.23} 23%|██▎ | 2772/12313 [2:04:17<7:08:07, 2.69s/it] 23%|██▎ | 2773/12313 [2:04:20<7:04:55, 2.67s/it] {'loss': 0.4875, 'grad_norm': 6.445972691759003, 'learning_rate': 4.51696180445905e-06, 'epoch': 0.23} 23%|██▎ | 2773/12313 [2:04:20<7:04:55, 2.67s/it] 23%|██▎ | 2774/12313 [2:04:23<7:23:40, 2.79s/it] {'loss': 0.5604, 'grad_norm': 14.794833956873413, 'learning_rate': 4.516573181574937e-06, 'epoch': 0.23} 23%|██▎ | 2774/12313 [2:04:23<7:23:40, 2.79s/it] 23%|██▎ | 2775/12313 [2:04:25<7:17:48, 
2.75s/it] {'loss': 0.5572, 'grad_norm': 5.562154003766993, 'learning_rate': 4.516184419154633e-06, 'epoch': 0.23} 23%|██▎ | 2775/12313 [2:04:25<7:17:48, 2.75s/it] 23%|██▎ | 2776/12313 [2:04:28<7:11:06, 2.71s/it] {'loss': 0.617, 'grad_norm': 3.647681045658577, 'learning_rate': 4.515795517225037e-06, 'epoch': 0.23} 23%|██▎ | 2776/12313 [2:04:28<7:11:06, 2.71s/it] 23%|██▎ | 2777/12313 [2:04:31<7:06:58, 2.69s/it] {'loss': 0.412, 'grad_norm': 5.695037511735684, 'learning_rate': 4.51540647581306e-06, 'epoch': 0.23} 23%|██▎ | 2777/12313 [2:04:31<7:06:58, 2.69s/it] 23%|██▎ | 2778/12313 [2:04:33<6:55:17, 2.61s/it] {'loss': 0.6703, 'grad_norm': 4.475228081800305, 'learning_rate': 4.51501729494562e-06, 'epoch': 0.23} 23%|██▎ | 2778/12313 [2:04:33<6:55:17, 2.61s/it] 23%|██▎ | 2779/12313 [2:04:36<7:01:03, 2.65s/it] {'loss': 0.6964, 'grad_norm': 4.611656081080196, 'learning_rate': 4.514627974649649e-06, 'epoch': 0.23} 23%|██▎ | 2779/12313 [2:04:36<7:01:03, 2.65s/it] 23%|██▎ | 2780/12313 [2:04:38<7:01:35, 2.65s/it] {'loss': 0.486, 'grad_norm': 3.968385112319059, 'learning_rate': 4.514238514952084e-06, 'epoch': 0.23} 23%|██▎ | 2780/12313 [2:04:38<7:01:35, 2.65s/it] 23%|██▎ | 2781/12313 [2:04:41<6:59:53, 2.64s/it] {'loss': 0.501, 'grad_norm': 4.069304366874197, 'learning_rate': 4.513848915879874e-06, 'epoch': 0.23} 23%|██▎ | 2781/12313 [2:04:41<6:59:53, 2.64s/it] 23%|██▎ | 2782/12313 [2:04:44<7:04:34, 2.67s/it] {'loss': 0.6377, 'grad_norm': 5.496815740257597, 'learning_rate': 4.513459177459977e-06, 'epoch': 0.23} 23%|██▎ | 2782/12313 [2:04:44<7:04:34, 2.67s/it] 23%|██▎ | 2783/12313 [2:04:46<7:00:13, 2.65s/it] {'loss': 0.5332, 'grad_norm': 6.857579000068127, 'learning_rate': 4.513069299719361e-06, 'epoch': 0.23} 23%|██▎ | 2783/12313 [2:04:46<7:00:13, 2.65s/it] 23%|██▎ | 2784/12313 [2:04:49<7:04:14, 2.67s/it] {'loss': 0.7389, 'grad_norm': 3.9422493886680354, 'learning_rate': 4.512679282685003e-06, 'epoch': 0.23} 23%|██▎ | 2784/12313 [2:04:49<7:04:14, 2.67s/it] 23%|██▎ | 2785/12313 
[2:04:52<7:16:34, 2.75s/it] {'loss': 0.4416, 'grad_norm': 3.4503956109068135, 'learning_rate': 4.512289126383892e-06, 'epoch': 0.23} 23%|██▎ | 2785/12313 [2:04:52<7:16:34, 2.75s/it] 23%|██▎ | 2786/12313 [2:04:55<7:12:38, 2.72s/it] {'loss': 0.5942, 'grad_norm': 4.6592622473563665, 'learning_rate': 4.511898830843022e-06, 'epoch': 0.23} 23%|██▎ | 2786/12313 [2:04:55<7:12:38, 2.72s/it] 23%|██▎ | 2787/12313 [2:04:57<7:09:59, 2.71s/it] {'loss': 0.5971, 'grad_norm': 5.905176155853469, 'learning_rate': 4.511508396089401e-06, 'epoch': 0.23} 23%|██▎ | 2787/12313 [2:04:57<7:09:59, 2.71s/it] 23%|██▎ | 2788/12313 [2:05:00<7:07:29, 2.69s/it] {'loss': 0.5056, 'grad_norm': 3.3680130138338744, 'learning_rate': 4.5111178221500455e-06, 'epoch': 0.23} 23%|██▎ | 2788/12313 [2:05:00<7:07:29, 2.69s/it] 23%|██▎ | 2789/12313 [2:05:03<7:02:25, 2.66s/it] {'loss': 0.623, 'grad_norm': 3.6474870201796006, 'learning_rate': 4.51072710905198e-06, 'epoch': 0.23} 23%|██▎ | 2789/12313 [2:05:03<7:02:25, 2.66s/it] 23%|██▎ | 2790/12313 [2:05:05<7:06:00, 2.68s/it] {'loss': 0.5094, 'grad_norm': 5.380202578661765, 'learning_rate': 4.5103362568222395e-06, 'epoch': 0.23} 23%|██▎ | 2790/12313 [2:05:05<7:06:00, 2.68s/it] 23%|██▎ | 2791/12313 [2:05:08<7:04:37, 2.68s/it] {'loss': 0.5929, 'grad_norm': 3.51083847592959, 'learning_rate': 4.509945265487871e-06, 'epoch': 0.23} 23%|██▎ | 2791/12313 [2:05:08<7:04:37, 2.68s/it] 23%|██▎ | 2792/12313 [2:05:11<7:06:49, 2.69s/it] {'loss': 0.5545, 'grad_norm': 4.91331163188237, 'learning_rate': 4.5095541350759265e-06, 'epoch': 0.23} 23%|██▎ | 2792/12313 [2:05:11<7:06:49, 2.69s/it] 23%|██▎ | 2793/12313 [2:05:13<7:06:16, 2.69s/it] {'loss': 0.5104, 'grad_norm': 3.869643139237871, 'learning_rate': 4.5091628656134715e-06, 'epoch': 0.23} 23%|██▎ | 2793/12313 [2:05:13<7:06:16, 2.69s/it] 23%|██▎ | 2794/12313 [2:05:16<7:17:24, 2.76s/it] {'loss': 0.4783, 'grad_norm': 11.289546788075414, 'learning_rate': 4.508771457127579e-06, 'epoch': 0.23} 23%|██▎ | 2794/12313 [2:05:16<7:17:24, 
2.76s/it] 23%|██▎ | 2795/12313 [2:05:19<7:05:52, 2.68s/it] {'loss': 0.6242, 'grad_norm': 4.701834247380548, 'learning_rate': 4.508379909645334e-06, 'epoch': 0.23} 23%|██▎ | 2795/12313 [2:05:19<7:05:52, 2.68s/it] 23%|██▎ | 2796/12313 [2:05:22<7:06:11, 2.69s/it] {'loss': 0.6682, 'grad_norm': 3.8081500149721608, 'learning_rate': 4.5079882231938274e-06, 'epoch': 0.23} 23%|██▎ | 2796/12313 [2:05:22<7:06:11, 2.69s/it] 23%|██▎ | 2797/12313 [2:05:24<7:00:42, 2.65s/it] {'loss': 0.5618, 'grad_norm': 8.196015374000737, 'learning_rate': 4.5075963978001634e-06, 'epoch': 0.23} 23%|██▎ | 2797/12313 [2:05:24<7:00:42, 2.65s/it] 23%|██▎ | 2798/12313 [2:05:27<7:03:03, 2.67s/it] {'loss': 0.4528, 'grad_norm': 3.4819053096272743, 'learning_rate': 4.5072044334914546e-06, 'epoch': 0.23} 23%|██▎ | 2798/12313 [2:05:27<7:03:03, 2.67s/it] 23%|██▎ | 2799/12313 [2:05:30<7:05:25, 2.68s/it] {'loss': 0.5095, 'grad_norm': 5.156427328355886, 'learning_rate': 4.506812330294821e-06, 'epoch': 0.23} 23%|██▎ | 2799/12313 [2:05:30<7:05:25, 2.68s/it] 23%|██▎ | 2800/12313 [2:05:32<7:03:49, 2.67s/it] {'loss': 0.6707, 'grad_norm': 3.7634245699967477, 'learning_rate': 4.506420088237395e-06, 'epoch': 0.23} 23%|██▎ | 2800/12313 [2:05:32<7:03:49, 2.67s/it] 23%|██▎ | 2801/12313 [2:05:35<7:26:51, 2.82s/it] {'loss': 0.566, 'grad_norm': 3.431587308735926, 'learning_rate': 4.5060277073463174e-06, 'epoch': 0.23} 23%|██▎ | 2801/12313 [2:05:35<7:26:51, 2.82s/it] 23%|██▎ | 2802/12313 [2:05:38<7:24:18, 2.80s/it] {'loss': 0.7123, 'grad_norm': 4.437013694098261, 'learning_rate': 4.50563518764874e-06, 'epoch': 0.23} 23%|██▎ | 2802/12313 [2:05:38<7:24:18, 2.80s/it] 23%|██▎ | 2803/12313 [2:05:41<7:31:56, 2.85s/it] {'loss': 0.5152, 'grad_norm': 4.567183665178771, 'learning_rate': 4.505242529171822e-06, 'epoch': 0.23} 23%|██▎ | 2803/12313 [2:05:41<7:31:56, 2.85s/it] 23%|██▎ | 2804/12313 [2:05:44<7:17:50, 2.76s/it] {'loss': 0.5201, 'grad_norm': 3.9017580026940664, 'learning_rate': 4.504849731942734e-06, 'epoch': 0.23} 23%|██▎ | 
2804/12313 [2:05:44<7:17:50, 2.76s/it] 23%|██▎ | 2805/12313 [2:05:46<7:14:53, 2.74s/it] {'loss': 0.6886, 'grad_norm': 4.536121689617816, 'learning_rate': 4.504456795988654e-06, 'epoch': 0.23} 23%|██▎ | 2805/12313 [2:05:46<7:14:53, 2.74s/it] 23%|██▎ | 2806/12313 [2:05:49<7:13:02, 2.73s/it] {'loss': 0.5154, 'grad_norm': 10.84869509304053, 'learning_rate': 4.504063721336773e-06, 'epoch': 0.23} 23%|██▎ | 2806/12313 [2:05:49<7:13:02, 2.73s/it] 23%|██▎ | 2807/12313 [2:05:51<6:58:43, 2.64s/it] {'loss': 0.5609, 'grad_norm': 5.404245994389873, 'learning_rate': 4.503670508014289e-06, 'epoch': 0.23} 23%|██▎ | 2807/12313 [2:05:51<6:58:43, 2.64s/it] 23%|██▎ | 2808/12313 [2:05:54<7:01:09, 2.66s/it] {'loss': 0.523, 'grad_norm': 6.579796252814421, 'learning_rate': 4.50327715604841e-06, 'epoch': 0.23} 23%|██▎ | 2808/12313 [2:05:54<7:01:09, 2.66s/it] 23%|██▎ | 2809/12313 [2:05:57<6:54:09, 2.61s/it] {'loss': 0.5637, 'grad_norm': 3.973362927458198, 'learning_rate': 4.5028836654663535e-06, 'epoch': 0.23} 23%|██▎ | 2809/12313 [2:05:57<6:54:09, 2.61s/it] 23%|██▎ | 2810/12313 [2:05:59<6:44:56, 2.56s/it] {'loss': 0.5813, 'grad_norm': 5.793419725047325, 'learning_rate': 4.502490036295348e-06, 'epoch': 0.23} 23%|██▎ | 2810/12313 [2:05:59<6:44:56, 2.56s/it] 23%|██▎ | 2811/12313 [2:06:02<6:43:27, 2.55s/it] {'loss': 0.3738, 'grad_norm': 4.201946994996218, 'learning_rate': 4.50209626856263e-06, 'epoch': 0.23} 23%|██▎ | 2811/12313 [2:06:02<6:43:27, 2.55s/it] 23%|██▎ | 2812/12313 [2:06:04<6:46:47, 2.57s/it] {'loss': 0.5465, 'grad_norm': 4.182121020616893, 'learning_rate': 4.501702362295446e-06, 'epoch': 0.23} 23%|██▎ | 2812/12313 [2:06:04<6:46:47, 2.57s/it] 23%|██▎ | 2813/12313 [2:06:07<7:03:04, 2.67s/it] {'loss': 0.5189, 'grad_norm': 3.5727363113753645, 'learning_rate': 4.501308317521052e-06, 'epoch': 0.23} 23%|██▎ | 2813/12313 [2:06:07<7:03:04, 2.67s/it] 23%|██▎ | 2814/12313 [2:06:10<7:05:39, 2.69s/it] {'loss': 0.8021, 'grad_norm': 5.356568404800836, 'learning_rate': 4.500914134266715e-06, 
'epoch': 0.23} 23%|██▎ | 2814/12313 [2:06:10<7:05:39, 2.69s/it] 23%|██▎ | 2815/12313 [2:06:12<7:01:07, 2.66s/it] {'loss': 0.583, 'grad_norm': 4.550631173863838, 'learning_rate': 4.500519812559709e-06, 'epoch': 0.23} 23%|██▎ | 2815/12313 [2:06:12<7:01:07, 2.66s/it] 23%|██▎ | 2816/12313 [2:06:15<7:13:30, 2.74s/it] {'loss': 0.4888, 'grad_norm': 4.328142108387039, 'learning_rate': 4.50012535242732e-06, 'epoch': 0.23} 23%|██▎ | 2816/12313 [2:06:15<7:13:30, 2.74s/it] 23%|██▎ | 2817/12313 [2:06:18<7:02:33, 2.67s/it] {'loss': 0.4892, 'grad_norm': 4.774451426228126, 'learning_rate': 4.499730753896841e-06, 'epoch': 0.23} 23%|██▎ | 2817/12313 [2:06:18<7:02:33, 2.67s/it] 23%|██▎ | 2818/12313 [2:06:21<7:11:13, 2.72s/it] {'loss': 0.4906, 'grad_norm': 4.5207191018836355, 'learning_rate': 4.4993360169955784e-06, 'epoch': 0.23} 23%|██▎ | 2818/12313 [2:06:21<7:11:13, 2.72s/it] 23%|██▎ | 2819/12313 [2:06:24<7:24:21, 2.81s/it] {'loss': 0.6733, 'grad_norm': 3.12349199747274, 'learning_rate': 4.498941141750845e-06, 'epoch': 0.23} 23%|██▎ | 2819/12313 [2:06:24<7:24:21, 2.81s/it] 23%|██▎ | 2820/12313 [2:06:26<7:15:14, 2.75s/it] {'loss': 0.5263, 'grad_norm': 3.911794379996745, 'learning_rate': 4.498546128189963e-06, 'epoch': 0.23} 23%|██▎ | 2820/12313 [2:06:26<7:15:14, 2.75s/it] 23%|██▎ | 2821/12313 [2:06:29<7:07:38, 2.70s/it] {'loss': 0.5766, 'grad_norm': 5.139451559263501, 'learning_rate': 4.498150976340266e-06, 'epoch': 0.23} 23%|██▎ | 2821/12313 [2:06:29<7:07:38, 2.70s/it] 23%|██▎ | 2822/12313 [2:06:32<7:18:50, 2.77s/it] {'loss': 0.529, 'grad_norm': 4.27868628115041, 'learning_rate': 4.497755686229097e-06, 'epoch': 0.23} 23%|██▎ | 2822/12313 [2:06:32<7:18:50, 2.77s/it] 23%|██▎ | 2823/12313 [2:06:35<7:11:28, 2.73s/it] {'loss': 0.5311, 'grad_norm': 3.9641320918354372, 'learning_rate': 4.497360257883808e-06, 'epoch': 0.23} 23%|██▎ | 2823/12313 [2:06:35<7:11:28, 2.73s/it] 23%|██▎ | 2824/12313 [2:06:37<6:59:38, 2.65s/it] {'loss': 0.5227, 'grad_norm': 3.5108973371563055, 'learning_rate': 
4.496964691331759e-06, 'epoch': 0.23} 23%|██▎ | 2824/12313 [2:06:37<6:59:38, 2.65s/it] 23%|██▎ | 2825/12313 [2:06:40<6:59:40, 2.65s/it] {'loss': 0.9042, 'grad_norm': 5.403815218671819, 'learning_rate': 4.496568986600323e-06, 'epoch': 0.23} 23%|██▎ | 2825/12313 [2:06:40<6:59:40, 2.65s/it] 23%|██▎ | 2826/12313 [2:06:42<6:53:05, 2.61s/it] {'loss': 0.7359, 'grad_norm': 4.12742663870129, 'learning_rate': 4.4961731437168795e-06, 'epoch': 0.23} 23%|██▎ | 2826/12313 [2:06:42<6:53:05, 2.61s/it] 23%|██▎ | 2827/12313 [2:06:46<7:40:11, 2.91s/it] {'loss': 0.6484, 'grad_norm': 3.568114706661449, 'learning_rate': 4.4957771627088185e-06, 'epoch': 0.23} 23%|██▎ | 2827/12313 [2:06:46<7:40:11, 2.91s/it] 23%|██▎ | 2828/12313 [2:06:49<7:31:52, 2.86s/it] {'loss': 0.6022, 'grad_norm': 5.842995081486275, 'learning_rate': 4.495381043603541e-06, 'epoch': 0.23} 23%|██▎ | 2828/12313 [2:06:49<7:31:52, 2.86s/it] 23%|██▎ | 2829/12313 [2:06:52<7:56:23, 3.01s/it] {'loss': 0.6084, 'grad_norm': 5.602774830593073, 'learning_rate': 4.494984786428455e-06, 'epoch': 0.23} 23%|██▎ | 2829/12313 [2:06:52<7:56:23, 3.01s/it] 23%|██▎ | 2830/12313 [2:06:55<7:39:20, 2.91s/it] {'loss': 0.5428, 'grad_norm': 5.779320891496907, 'learning_rate': 4.494588391210981e-06, 'epoch': 0.23} 23%|██▎ | 2830/12313 [2:06:55<7:39:20, 2.91s/it] 23%|██▎ | 2831/12313 [2:06:57<7:31:39, 2.86s/it] {'loss': 0.5494, 'grad_norm': 5.734619447546111, 'learning_rate': 4.494191857978546e-06, 'epoch': 0.23} 23%|██▎ | 2831/12313 [2:06:57<7:31:39, 2.86s/it] 23%|██▎ | 2832/12313 [2:07:00<7:20:23, 2.79s/it] {'loss': 0.6195, 'grad_norm': 3.540278577269194, 'learning_rate': 4.493795186758589e-06, 'epoch': 0.23} 23%|██▎ | 2832/12313 [2:07:00<7:20:23, 2.79s/it] 23%|██▎ | 2833/12313 [2:07:02<7:04:12, 2.68s/it] {'loss': 0.5911, 'grad_norm': 5.458025507421001, 'learning_rate': 4.493398377578557e-06, 'epoch': 0.23} 23%|██▎ | 2833/12313 [2:07:02<7:04:12, 2.68s/it] 23%|██▎ | 2834/12313 [2:07:05<7:10:38, 2.73s/it] {'loss': 0.6099, 'grad_norm': 
3.0958255282858596, 'learning_rate': 4.4930014304659066e-06, 'epoch': 0.23} 23%|██▎ | 2834/12313 [2:07:05<7:10:38, 2.73s/it] 23%|██▎ | 2835/12313 [2:07:08<7:16:04, 2.76s/it] {'loss': 0.5688, 'grad_norm': 4.075827299092992, 'learning_rate': 4.492604345448106e-06, 'epoch': 0.23} 23%|██▎ | 2835/12313 [2:07:08<7:16:04, 2.76s/it] 23%|██▎ | 2836/12313 [2:07:11<7:07:12, 2.70s/it] {'loss': 0.6, 'grad_norm': 5.7956480372939065, 'learning_rate': 4.492207122552629e-06, 'epoch': 0.23} 23%|██▎ | 2836/12313 [2:07:11<7:07:12, 2.70s/it] 23%|██▎ | 2837/12313 [2:07:13<7:01:03, 2.67s/it] {'loss': 0.5496, 'grad_norm': 5.13610708333408, 'learning_rate': 4.491809761806964e-06, 'epoch': 0.23} 23%|██▎ | 2837/12313 [2:07:13<7:01:03, 2.67s/it] 23%|██▎ | 2838/12313 [2:07:16<7:00:35, 2.66s/it] {'loss': 0.4669, 'grad_norm': 5.910698565673017, 'learning_rate': 4.491412263238605e-06, 'epoch': 0.23} 23%|██▎ | 2838/12313 [2:07:16<7:00:35, 2.66s/it] 23%|██▎ | 2839/12313 [2:07:19<7:11:03, 2.73s/it] {'loss': 0.5895, 'grad_norm': 3.4340630962930305, 'learning_rate': 4.4910146268750555e-06, 'epoch': 0.23} 23%|██▎ | 2839/12313 [2:07:19<7:11:03, 2.73s/it] 23%|██▎ | 2840/12313 [2:07:22<7:15:07, 2.76s/it] {'loss': 0.5887, 'grad_norm': 4.029802742225141, 'learning_rate': 4.490616852743832e-06, 'epoch': 0.23} 23%|██▎ | 2840/12313 [2:07:22<7:15:07, 2.76s/it] 23%|██▎ | 2841/12313 [2:07:24<7:18:29, 2.78s/it] {'loss': 0.6715, 'grad_norm': 5.37046614141328, 'learning_rate': 4.490218940872457e-06, 'epoch': 0.23} 23%|██▎ | 2841/12313 [2:07:24<7:18:29, 2.78s/it] 23%|██▎ | 2842/12313 [2:07:27<7:11:35, 2.73s/it] {'loss': 0.6401, 'grad_norm': 6.574976668750347, 'learning_rate': 4.489820891288466e-06, 'epoch': 0.23} 23%|██▎ | 2842/12313 [2:07:27<7:11:35, 2.73s/it] 23%|██▎ | 2843/12313 [2:07:30<7:11:27, 2.73s/it] {'loss': 0.6287, 'grad_norm': 4.12936501875235, 'learning_rate': 4.489422704019399e-06, 'epoch': 0.23} 23%|██▎ | 2843/12313 [2:07:30<7:11:27, 2.73s/it] 23%|██▎ | 2844/12313 [2:07:33<7:12:17, 2.74s/it] {'loss': 
0.5193, 'grad_norm': 4.078499752253989, 'learning_rate': 4.489024379092809e-06, 'epoch': 0.23} 23%|██▎ | 2844/12313 [2:07:33<7:12:17, 2.74s/it] 23%|██▎ | 2845/12313 [2:07:35<7:04:48, 2.69s/it] {'loss': 0.415, 'grad_norm': 8.139115119863513, 'learning_rate': 4.48862591653626e-06, 'epoch': 0.23} 23%|██▎ | 2845/12313 [2:07:35<7:04:48, 2.69s/it] 23%|██▎ | 2846/12313 [2:07:38<6:58:57, 2.66s/it] {'loss': 0.5214, 'grad_norm': 6.159373760750005, 'learning_rate': 4.488227316377322e-06, 'epoch': 0.23} 23%|██▎ | 2846/12313 [2:07:38<6:58:57, 2.66s/it] 23%|██▎ | 2847/12313 [2:07:40<6:54:06, 2.62s/it] {'loss': 0.541, 'grad_norm': 3.4411255569736956, 'learning_rate': 4.487828578643576e-06, 'epoch': 0.23} 23%|██▎ | 2847/12313 [2:07:40<6:54:06, 2.62s/it] 23%|██▎ | 2848/12313 [2:07:43<7:22:48, 2.81s/it] {'loss': 0.6158, 'grad_norm': 4.121631308793716, 'learning_rate': 4.4874297033626126e-06, 'epoch': 0.23} 23%|██▎ | 2848/12313 [2:07:43<7:22:48, 2.81s/it] 23%|██▎ | 2849/12313 [2:07:46<7:17:18, 2.77s/it] {'loss': 0.5957, 'grad_norm': 3.8683262343429647, 'learning_rate': 4.487030690562032e-06, 'epoch': 0.23} 23%|██▎ | 2849/12313 [2:07:46<7:17:18, 2.77s/it] 23%|██▎ | 2850/12313 [2:07:49<7:15:02, 2.76s/it] {'loss': 0.5908, 'grad_norm': 4.36066324441673, 'learning_rate': 4.486631540269445e-06, 'epoch': 0.23} 23%|██▎ | 2850/12313 [2:07:49<7:15:02, 2.76s/it] 23%|██▎ | 2851/12313 [2:07:51<7:07:32, 2.71s/it] {'loss': 0.6196, 'grad_norm': 23.856582803005605, 'learning_rate': 4.486232252512468e-06, 'epoch': 0.23} 23%|██▎ | 2851/12313 [2:07:51<7:07:32, 2.71s/it] 23%|██▎ | 2852/12313 [2:07:54<7:05:50, 2.70s/it] {'loss': 0.517, 'grad_norm': 4.565500746372573, 'learning_rate': 4.485832827318733e-06, 'epoch': 0.23} 23%|██▎ | 2852/12313 [2:07:54<7:05:50, 2.70s/it] 23%|██▎ | 2853/12313 [2:07:57<7:05:32, 2.70s/it] {'loss': 0.7045, 'grad_norm': 5.415515197041984, 'learning_rate': 4.485433264715874e-06, 'epoch': 0.23} 23%|██▎ | 2853/12313 [2:07:57<7:05:32, 2.70s/it] 23%|██▎ | 2854/12313 [2:07:59<6:53:31, 
2.62s/it] {'loss': 0.632, 'grad_norm': 7.611491735007953, 'learning_rate': 4.485033564731542e-06, 'epoch': 0.23} 23%|██▎ | 2854/12313 [2:07:59<6:53:31, 2.62s/it] 23%|██▎ | 2855/12313 [2:08:02<6:58:00, 2.65s/it] {'loss': 0.5768, 'grad_norm': 5.094950046339584, 'learning_rate': 4.484633727393393e-06, 'epoch': 0.23} 23%|██▎ | 2855/12313 [2:08:02<6:58:00, 2.65s/it] 23%|██▎ | 2856/12313 [2:08:04<6:47:57, 2.59s/it] {'loss': 0.5038, 'grad_norm': 46.65401837853409, 'learning_rate': 4.484233752729093e-06, 'epoch': 0.23} 23%|██▎ | 2856/12313 [2:08:04<6:47:57, 2.59s/it] 23%|██▎ | 2857/12313 [2:08:07<7:07:30, 2.71s/it] {'loss': 0.6252, 'grad_norm': 25.235913155436222, 'learning_rate': 4.483833640766319e-06, 'epoch': 0.23} 23%|██▎ | 2857/12313 [2:08:07<7:07:30, 2.71s/it] 23%|██▎ | 2858/12313 [2:08:10<7:08:04, 2.72s/it] {'loss': 0.6505, 'grad_norm': 7.36939234328555, 'learning_rate': 4.4834333915327564e-06, 'epoch': 0.23} 23%|██▎ | 2858/12313 [2:08:10<7:08:04, 2.72s/it] 23%|██▎ | 2859/12313 [2:08:13<7:11:53, 2.74s/it] {'loss': 0.5895, 'grad_norm': 4.830643690141342, 'learning_rate': 4.483033005056101e-06, 'epoch': 0.23} 23%|██▎ | 2859/12313 [2:08:13<7:11:53, 2.74s/it] 23%|██▎ | 2860/12313 [2:08:16<7:10:34, 2.73s/it] {'loss': 0.4614, 'grad_norm': 9.183888328065596, 'learning_rate': 4.482632481364055e-06, 'epoch': 0.23} 23%|██▎ | 2860/12313 [2:08:16<7:10:34, 2.73s/it] 23%|██▎ | 2861/12313 [2:08:18<7:12:06, 2.74s/it] {'loss': 0.6206, 'grad_norm': 4.613246918041221, 'learning_rate': 4.482231820484336e-06, 'epoch': 0.23} 23%|██▎ | 2861/12313 [2:08:18<7:12:06, 2.74s/it] 23%|██▎ | 2862/12313 [2:08:21<7:07:56, 2.72s/it] {'loss': 0.4812, 'grad_norm': 7.324346436954002, 'learning_rate': 4.4818310224446645e-06, 'epoch': 0.23} 23%|██▎ | 2862/12313 [2:08:21<7:07:56, 2.72s/it] 23%|██▎ | 2863/12313 [2:08:24<6:58:41, 2.66s/it] {'loss': 0.6606, 'grad_norm': 6.5981856748393755, 'learning_rate': 4.481430087272776e-06, 'epoch': 0.23} 23%|██▎ | 2863/12313 [2:08:24<6:58:41, 2.66s/it] 23%|██▎ | 
2864/12313 [2:08:27<7:15:02, 2.76s/it] {'loss': 0.4754, 'grad_norm': 4.619941208113225, 'learning_rate': 4.481029014996412e-06, 'epoch': 0.23} 23%|██▎ | 2864/12313 [2:08:27<7:15:02, 2.76s/it] 23%|██▎ | 2865/12313 [2:08:29<7:05:23, 2.70s/it] {'loss': 0.5482, 'grad_norm': 5.7207154634398085, 'learning_rate': 4.480627805643324e-06, 'epoch': 0.23} 23%|██▎ | 2865/12313 [2:08:29<7:05:23, 2.70s/it] 23%|██▎ | 2866/12313 [2:08:32<7:04:24, 2.70s/it] {'loss': 0.6597, 'grad_norm': 7.298594910122886, 'learning_rate': 4.480226459241275e-06, 'epoch': 0.23} 23%|██▎ | 2866/12313 [2:08:32<7:04:24, 2.70s/it] 23%|██▎ | 2867/12313 [2:08:35<7:01:46, 2.68s/it] {'loss': 0.6121, 'grad_norm': 9.363398730309251, 'learning_rate': 4.479824975818034e-06, 'epoch': 0.23} 23%|██▎ | 2867/12313 [2:08:35<7:01:46, 2.68s/it] 23%|██▎ | 2868/12313 [2:08:37<7:00:00, 2.67s/it] {'loss': 0.5151, 'grad_norm': 4.781902048228461, 'learning_rate': 4.4794233554013835e-06, 'epoch': 0.23} 23%|██▎ | 2868/12313 [2:08:37<7:00:00, 2.67s/it] 23%|██▎ | 2869/12313 [2:08:40<6:59:38, 2.67s/it] {'loss': 0.5063, 'grad_norm': 2.949065575080919, 'learning_rate': 4.479021598019113e-06, 'epoch': 0.23} 23%|██▎ | 2869/12313 [2:08:40<6:59:38, 2.67s/it] 23%|██▎ | 2870/12313 [2:08:42<6:55:40, 2.64s/it] {'loss': 0.5932, 'grad_norm': 4.021888781896201, 'learning_rate': 4.4786197036990205e-06, 'epoch': 0.23} 23%|██▎ | 2870/12313 [2:08:42<6:55:40, 2.64s/it] 23%|██▎ | 2871/12313 [2:08:45<6:58:22, 2.66s/it] {'loss': 0.6553, 'grad_norm': 5.096647466230962, 'learning_rate': 4.478217672468918e-06, 'epoch': 0.23} 23%|██▎ | 2871/12313 [2:08:45<6:58:22, 2.66s/it] 23%|██▎ | 2872/12313 [2:08:48<6:59:36, 2.67s/it] {'loss': 0.7021, 'grad_norm': 3.3544437426357434, 'learning_rate': 4.47781550435662e-06, 'epoch': 0.23} 23%|██▎ | 2872/12313 [2:08:48<6:59:36, 2.67s/it] 23%|██▎ | 2873/12313 [2:08:51<7:06:08, 2.71s/it] {'loss': 0.5843, 'grad_norm': 3.955029448497719, 'learning_rate': 4.4774131993899585e-06, 'epoch': 0.23} 23%|██▎ | 2873/12313 
[2:08:51<7:06:08, 2.71s/it] 23%|██▎ | 2874/12313 [2:08:53<7:00:54, 2.68s/it] {'loss': 0.5778, 'grad_norm': 5.718278060740205, 'learning_rate': 4.477010757596768e-06, 'epoch': 0.23} 23%|██▎ | 2874/12313 [2:08:53<7:00:54, 2.68s/it] 23%|██▎ | 2875/12313 [2:08:56<7:02:39, 2.69s/it] {'loss': 0.4537, 'grad_norm': 3.558212336992993, 'learning_rate': 4.4766081790048965e-06, 'epoch': 0.23} 23%|██▎ | 2875/12313 [2:08:56<7:02:39, 2.69s/it] 23%|██▎ | 2876/12313 [2:08:59<6:58:57, 2.66s/it] {'loss': 0.5913, 'grad_norm': 3.821349848295602, 'learning_rate': 4.4762054636422005e-06, 'epoch': 0.23} 23%|██▎ | 2876/12313 [2:08:59<6:58:57, 2.66s/it] 23%|██▎ | 2877/12313 [2:09:01<6:54:38, 2.64s/it] {'loss': 0.5516, 'grad_norm': 5.178508133946879, 'learning_rate': 4.475802611536545e-06, 'epoch': 0.23} 23%|██▎ | 2877/12313 [2:09:01<6:54:38, 2.64s/it] 23%|██▎ | 2878/12313 [2:09:04<7:08:42, 2.73s/it] {'loss': 0.5186, 'grad_norm': 4.64155733262824, 'learning_rate': 4.475399622715805e-06, 'epoch': 0.23} 23%|██▎ | 2878/12313 [2:09:04<7:08:42, 2.73s/it] 23%|██▎ | 2879/12313 [2:09:07<7:03:25, 2.69s/it] {'loss': 0.6616, 'grad_norm': 5.700083589065183, 'learning_rate': 4.474996497207866e-06, 'epoch': 0.23} 23%|██▎ | 2879/12313 [2:09:07<7:03:25, 2.69s/it] 23%|██▎ | 2880/12313 [2:09:09<6:58:04, 2.66s/it] {'loss': 0.6395, 'grad_norm': 5.357424859826312, 'learning_rate': 4.4745932350406225e-06, 'epoch': 0.23} 23%|██▎ | 2880/12313 [2:09:09<6:58:04, 2.66s/it] 23%|██▎ | 2881/12313 [2:09:12<7:13:07, 2.76s/it] {'loss': 0.5116, 'grad_norm': 5.301441690670492, 'learning_rate': 4.474189836241976e-06, 'epoch': 0.23} 23%|██▎ | 2881/12313 [2:09:12<7:13:07, 2.76s/it] 23%|██▎ | 2882/12313 [2:09:15<7:04:02, 2.70s/it] {'loss': 0.537, 'grad_norm': 3.7935528708576784, 'learning_rate': 4.473786300839843e-06, 'epoch': 0.23} 23%|██▎ | 2882/12313 [2:09:15<7:04:02, 2.70s/it] 23%|██▎ | 2883/12313 [2:09:17<6:56:27, 2.65s/it] {'loss': 0.4027, 'grad_norm': 4.6354063500549945, 'learning_rate': 4.4733826288621435e-06, 'epoch': 
0.23} 23%|██▎ | 2883/12313 [2:09:17<6:56:27, 2.65s/it] 23%|██▎ | 2884/12313 [2:09:20<6:47:10, 2.59s/it] {'loss': 0.5345, 'grad_norm': 4.258268978553181, 'learning_rate': 4.47297882033681e-06, 'epoch': 0.23} 23%|██▎ | 2884/12313 [2:09:20<6:47:10, 2.59s/it] 23%|██▎ | 2885/12313 [2:09:22<6:47:24, 2.59s/it] {'loss': 0.3998, 'grad_norm': 4.645513866526441, 'learning_rate': 4.472574875291784e-06, 'epoch': 0.23} 23%|██▎ | 2885/12313 [2:09:22<6:47:24, 2.59s/it] 23%|██▎ | 2886/12313 [2:09:25<7:12:26, 2.75s/it] {'loss': 0.5404, 'grad_norm': 5.592577886479116, 'learning_rate': 4.472170793755016e-06, 'epoch': 0.23} 23%|██▎ | 2886/12313 [2:09:25<7:12:26, 2.75s/it] 23%|██▎ | 2887/12313 [2:09:28<7:04:15, 2.70s/it] {'loss': 0.4769, 'grad_norm': 4.864175858056895, 'learning_rate': 4.471766575754467e-06, 'epoch': 0.23} 23%|██▎ | 2887/12313 [2:09:28<7:04:15, 2.70s/it] 23%|██▎ | 2888/12313 [2:09:31<7:11:27, 2.75s/it] {'loss': 0.5522, 'grad_norm': 5.3559791822991984, 'learning_rate': 4.471362221318106e-06, 'epoch': 0.23} 23%|██▎ | 2888/12313 [2:09:31<7:11:27, 2.75s/it] 23%|██▎ | 2889/12313 [2:09:33<7:02:56, 2.69s/it] {'loss': 0.5913, 'grad_norm': 3.985841294995745, 'learning_rate': 4.470957730473913e-06, 'epoch': 0.23} 23%|██▎ | 2889/12313 [2:09:33<7:02:56, 2.69s/it] 23%|██▎ | 2890/12313 [2:09:36<6:58:08, 2.66s/it] {'loss': 0.5101, 'grad_norm': 7.9982149419628765, 'learning_rate': 4.470553103249876e-06, 'epoch': 0.23} 23%|██▎ | 2890/12313 [2:09:36<6:58:08, 2.66s/it] 23%|██▎ | 2891/12313 [2:09:39<6:58:49, 2.67s/it] {'loss': 0.5213, 'grad_norm': 3.748824385107628, 'learning_rate': 4.470148339673993e-06, 'epoch': 0.23} 23%|██▎ | 2891/12313 [2:09:39<6:58:49, 2.67s/it] 23%|██▎ | 2892/12313 [2:09:41<6:52:46, 2.63s/it] {'loss': 0.5209, 'grad_norm': 4.816022996866285, 'learning_rate': 4.469743439774272e-06, 'epoch': 0.23} 23%|██▎ | 2892/12313 [2:09:41<6:52:46, 2.63s/it] 23%|██▎ | 2893/12313 [2:09:44<6:54:23, 2.64s/it] {'loss': 0.5256, 'grad_norm': 5.863963017632638, 'learning_rate': 
4.46933840357873e-06, 'epoch': 0.23} 23%|██▎ | 2893/12313 [2:09:44<6:54:23, 2.64s/it] 24%|██▎ | 2894/12313 [2:09:47<7:08:34, 2.73s/it] {'loss': 0.53, 'grad_norm': 4.876008648396625, 'learning_rate': 4.468933231115393e-06, 'epoch': 0.24} 24%|██▎ | 2894/12313 [2:09:47<7:08:34, 2.73s/it] 24%|██▎ | 2895/12313 [2:09:50<7:03:52, 2.70s/it] {'loss': 0.5812, 'grad_norm': 6.008481144166018, 'learning_rate': 4.468527922412297e-06, 'epoch': 0.24} 24%|██▎ | 2895/12313 [2:09:50<7:03:52, 2.70s/it] 24%|██▎ | 2896/12313 [2:09:52<6:59:15, 2.67s/it] {'loss': 0.5318, 'grad_norm': 6.418455057806858, 'learning_rate': 4.468122477497486e-06, 'epoch': 0.24} 24%|██▎ | 2896/12313 [2:09:52<6:59:15, 2.67s/it] 24%|██▎ | 2897/12313 [2:09:55<7:17:01, 2.78s/it] {'loss': 0.5948, 'grad_norm': 2.5240378044740415, 'learning_rate': 4.467716896399017e-06, 'epoch': 0.24} 24%|██▎ | 2897/12313 [2:09:55<7:17:01, 2.78s/it] 24%|██▎ | 2898/12313 [2:09:58<7:17:02, 2.79s/it] {'loss': 0.5233, 'grad_norm': 4.361122855517343, 'learning_rate': 4.4673111791449515e-06, 'epoch': 0.24} 24%|██▎ | 2898/12313 [2:09:58<7:17:02, 2.79s/it] 24%|██▎ | 2899/12313 [2:10:01<7:06:28, 2.72s/it] {'loss': 0.6362, 'grad_norm': 4.528523763290337, 'learning_rate': 4.466905325763365e-06, 'epoch': 0.24} 24%|██▎ | 2899/12313 [2:10:01<7:06:28, 2.72s/it] 24%|██▎ | 2900/12313 [2:10:04<7:22:13, 2.82s/it] {'loss': 0.5745, 'grad_norm': 6.538860668916424, 'learning_rate': 4.4664993362823394e-06, 'epoch': 0.24} 24%|██▎ | 2900/12313 [2:10:04<7:22:13, 2.82s/it] 24%|██▎ | 2901/12313 [2:10:06<7:20:16, 2.81s/it] {'loss': 0.6584, 'grad_norm': 5.847755183433358, 'learning_rate': 4.466093210729967e-06, 'epoch': 0.24} 24%|██▎ | 2901/12313 [2:10:06<7:20:16, 2.81s/it] 24%|██▎ | 2902/12313 [2:10:09<7:11:23, 2.75s/it] {'loss': 0.6396, 'grad_norm': 4.179474365670219, 'learning_rate': 4.465686949134351e-06, 'epoch': 0.24} 24%|██▎ | 2902/12313 [2:10:09<7:11:23, 2.75s/it] 24%|██▎ | 2903/12313 [2:10:12<7:03:15, 2.70s/it] {'loss': 0.5861, 'grad_norm': 
5.774986352129162, 'learning_rate': 4.465280551523601e-06, 'epoch': 0.24} 24%|██▎ | 2903/12313 [2:10:12<7:03:15, 2.70s/it] 24%|██▎ | 2904/12313 [2:10:14<7:14:28, 2.77s/it] {'loss': 0.6337, 'grad_norm': 3.2478970012240325, 'learning_rate': 4.464874017925837e-06, 'epoch': 0.24} 24%|██▎ | 2904/12313 [2:10:14<7:14:28, 2.77s/it] 24%|██▎ | 2905/12313 [2:10:17<7:10:59, 2.75s/it] {'loss': 0.6407, 'grad_norm': 6.068452434980089, 'learning_rate': 4.46446734836919e-06, 'epoch': 0.24} 24%|██▎ | 2905/12313 [2:10:17<7:10:59, 2.75s/it] 24%|██▎ | 2906/12313 [2:10:20<6:56:19, 2.66s/it] {'loss': 0.6955, 'grad_norm': 3.2686213630532026, 'learning_rate': 4.4640605428818e-06, 'epoch': 0.24} 24%|██▎ | 2906/12313 [2:10:20<6:56:19, 2.66s/it] 24%|██▎ | 2907/12313 [2:10:22<6:49:25, 2.61s/it] {'loss': 0.4519, 'grad_norm': 4.07905119935594, 'learning_rate': 4.463653601491815e-06, 'epoch': 0.24} 24%|██▎ | 2907/12313 [2:10:22<6:49:25, 2.61s/it] 24%|██▎ | 2908/12313 [2:10:25<6:41:22, 2.56s/it] {'loss': 0.523, 'grad_norm': 6.169592515743247, 'learning_rate': 4.463246524227393e-06, 'epoch': 0.24} 24%|██▎ | 2908/12313 [2:10:25<6:41:22, 2.56s/it] 24%|██▎ | 2909/12313 [2:10:27<6:47:05, 2.60s/it] {'loss': 0.522, 'grad_norm': 4.321690950129896, 'learning_rate': 4.462839311116702e-06, 'epoch': 0.24} 24%|██▎ | 2909/12313 [2:10:27<6:47:05, 2.60s/it] 24%|██▎ | 2910/12313 [2:10:30<6:50:38, 2.62s/it] {'loss': 0.747, 'grad_norm': 7.237413918755258, 'learning_rate': 4.462431962187919e-06, 'epoch': 0.24} 24%|██▎ | 2910/12313 [2:10:30<6:50:38, 2.62s/it] 24%|██▎ | 2911/12313 [2:10:33<7:12:17, 2.76s/it] {'loss': 0.6299, 'grad_norm': 3.734173317333828, 'learning_rate': 4.46202447746923e-06, 'epoch': 0.24} 24%|██▎ | 2911/12313 [2:10:33<7:12:17, 2.76s/it] 24%|██▎ | 2912/12313 [2:10:35<6:58:10, 2.67s/it] {'loss': 0.6253, 'grad_norm': 5.880101867809025, 'learning_rate': 4.461616856988831e-06, 'epoch': 0.24} 24%|██▎ | 2912/12313 [2:10:35<6:58:10, 2.67s/it] 24%|██▎ | 2913/12313 [2:10:38<7:01:04, 2.69s/it] {'loss': 
0.5456, 'grad_norm': 4.370253406800247, 'learning_rate': 4.461209100774928e-06, 'epoch': 0.24} 24%|██▎ | 2913/12313 [2:10:38<7:01:04, 2.69s/it] 24%|██▎ | 2914/12313 [2:10:41<6:56:20, 2.66s/it] {'loss': 0.427, 'grad_norm': 4.39926590709237, 'learning_rate': 4.460801208855734e-06, 'epoch': 0.24} 24%|██▎ | 2914/12313 [2:10:41<6:56:20, 2.66s/it] 24%|██▎ | 2915/12313 [2:10:43<6:57:50, 2.67s/it] {'loss': 0.7197, 'grad_norm': 6.280728294974258, 'learning_rate': 4.4603931812594735e-06, 'epoch': 0.24} 24%|██▎ | 2915/12313 [2:10:43<6:57:50, 2.67s/it] 24%|██▎ | 2916/12313 [2:10:46<6:59:08, 2.68s/it] {'loss': 0.4599, 'grad_norm': 6.982111134101197, 'learning_rate': 4.45998501801438e-06, 'epoch': 0.24} 24%|██▎ | 2916/12313 [2:10:46<6:59:08, 2.68s/it] 24%|██▎ | 2917/12313 [2:10:49<6:56:50, 2.66s/it] {'loss': 0.4797, 'grad_norm': 7.601832967298451, 'learning_rate': 4.459576719148697e-06, 'epoch': 0.24} 24%|██▎ | 2917/12313 [2:10:49<6:56:50, 2.66s/it] 24%|██▎ | 2918/12313 [2:10:51<6:54:53, 2.65s/it] {'loss': 0.5447, 'grad_norm': 5.8714956548009924, 'learning_rate': 4.459168284690676e-06, 'epoch': 0.24} 24%|██▎ | 2918/12313 [2:10:51<6:54:53, 2.65s/it] 24%|██▎ | 2919/12313 [2:10:54<6:51:54, 2.63s/it] {'loss': 0.6044, 'grad_norm': 7.1059975498172685, 'learning_rate': 4.458759714668578e-06, 'epoch': 0.24} 24%|██▎ | 2919/12313 [2:10:54<6:51:54, 2.63s/it] 24%|██▎ | 2920/12313 [2:10:57<7:01:12, 2.69s/it] {'loss': 0.5054, 'grad_norm': 4.612443635513808, 'learning_rate': 4.458351009110675e-06, 'epoch': 0.24} 24%|██▎ | 2920/12313 [2:10:57<7:01:12, 2.69s/it] 24%|██▎ | 2921/12313 [2:10:59<6:59:07, 2.68s/it] {'loss': 0.6243, 'grad_norm': 4.117158983616684, 'learning_rate': 4.457942168045246e-06, 'epoch': 0.24} 24%|██▎ | 2921/12313 [2:10:59<6:59:07, 2.68s/it] 24%|██▎ | 2922/12313 [2:11:02<6:58:54, 2.68s/it] {'loss': 0.6199, 'grad_norm': 4.243517625792392, 'learning_rate': 4.457533191500581e-06, 'epoch': 0.24} 24%|██▎ | 2922/12313 [2:11:02<6:58:54, 2.68s/it] 24%|██▎ | 2923/12313 
[2:11:05<6:54:00, 2.65s/it] {'loss': 0.4745, 'grad_norm': 6.196728526337707, 'learning_rate': 4.45712407950498e-06, 'epoch': 0.24} 24%|██▎ | 2923/12313 [2:11:05<6:54:00, 2.65s/it] 24%|██▎ | 2924/12313 [2:11:07<6:57:34, 2.67s/it] {'loss': 0.5762, 'grad_norm': 4.089855406887254, 'learning_rate': 4.45671483208675e-06, 'epoch': 0.24} 24%|██▎ | 2924/12313 [2:11:07<6:57:34, 2.67s/it] 24%|██▍ | 2925/12313 [2:11:10<6:46:25, 2.60s/it] {'loss': 0.6403, 'grad_norm': 8.680736194237051, 'learning_rate': 4.45630544927421e-06, 'epoch': 0.24} 24%|██▍ | 2925/12313 [2:11:10<6:46:25, 2.60s/it] 24%|██▍ | 2926/12313 [2:11:12<6:42:40, 2.57s/it] {'loss': 0.5683, 'grad_norm': 3.4364871224606452, 'learning_rate': 4.4558959310956865e-06, 'epoch': 0.24} 24%|██▍ | 2926/12313 [2:11:12<6:42:40, 2.57s/it] 24%|██▍ | 2927/12313 [2:11:15<6:48:05, 2.61s/it] {'loss': 0.4727, 'grad_norm': 4.94476918130175, 'learning_rate': 4.4554862775795146e-06, 'epoch': 0.24} 24%|██▍ | 2927/12313 [2:11:15<6:48:05, 2.61s/it] 24%|██▍ | 2928/12313 [2:11:18<6:45:42, 2.59s/it] {'loss': 0.5719, 'grad_norm': 3.68514712087398, 'learning_rate': 4.455076488754043e-06, 'epoch': 0.24} 24%|██▍ | 2928/12313 [2:11:18<6:45:42, 2.59s/it] 24%|██▍ | 2929/12313 [2:11:20<6:42:29, 2.57s/it] {'loss': 0.575, 'grad_norm': 4.764481615544404, 'learning_rate': 4.4546665646476254e-06, 'epoch': 0.24} 24%|██▍ | 2929/12313 [2:11:20<6:42:29, 2.57s/it] 24%|██▍ | 2930/12313 [2:11:23<6:54:11, 2.65s/it] {'loss': 0.6751, 'grad_norm': 4.8777757605766405, 'learning_rate': 4.4542565052886256e-06, 'epoch': 0.24} 24%|██▍ | 2930/12313 [2:11:23<6:54:11, 2.65s/it] 24%|██▍ | 2931/12313 [2:11:26<7:12:04, 2.76s/it] {'loss': 0.6563, 'grad_norm': 4.182540781833306, 'learning_rate': 4.45384631070542e-06, 'epoch': 0.24} 24%|██▍ | 2931/12313 [2:11:26<7:12:04, 2.76s/it] 24%|██▍ | 2932/12313 [2:11:30<7:45:09, 2.98s/it] {'loss': 0.5779, 'grad_norm': 3.5345505370213264, 'learning_rate': 4.453435980926388e-06, 'epoch': 0.24} 24%|██▍ | 2932/12313 [2:11:30<7:45:09, 2.98s/it] 
24%|██▍ | 2933/12313 [2:11:32<7:24:12, 2.84s/it] {'loss': 0.5615, 'grad_norm': 8.12041102050761, 'learning_rate': 4.453025515979926e-06, 'epoch': 0.24} 24%|██▍ | 2933/12313 [2:11:32<7:24:12, 2.84s/it] 24%|██▍ | 2934/12313 [2:11:35<7:15:49, 2.79s/it] {'loss': 0.4717, 'grad_norm': 6.278197290858452, 'learning_rate': 4.452614915894434e-06, 'epoch': 0.24} 24%|██▍ | 2934/12313 [2:11:35<7:15:49, 2.79s/it] 24%|██▍ | 2935/12313 [2:11:38<7:21:36, 2.83s/it] {'loss': 0.5222, 'grad_norm': 3.470804618193489, 'learning_rate': 4.452204180698325e-06, 'epoch': 0.24} 24%|██▍ | 2935/12313 [2:11:38<7:21:36, 2.83s/it] 24%|██▍ | 2936/12313 [2:11:40<7:12:17, 2.77s/it] {'loss': 0.6273, 'grad_norm': 4.736573933269891, 'learning_rate': 4.451793310420017e-06, 'epoch': 0.24} 24%|██▍ | 2936/12313 [2:11:40<7:12:17, 2.77s/it] 24%|██▍ | 2937/12313 [2:11:43<7:07:26, 2.74s/it] {'loss': 0.6002, 'grad_norm': 6.791656553293532, 'learning_rate': 4.451382305087943e-06, 'epoch': 0.24} 24%|██▍ | 2937/12313 [2:11:43<7:07:26, 2.74s/it] 24%|██▍ | 2938/12313 [2:11:46<7:02:50, 2.71s/it] {'loss': 0.4814, 'grad_norm': 4.478548259706515, 'learning_rate': 4.450971164730541e-06, 'epoch': 0.24} 24%|██▍ | 2938/12313 [2:11:46<7:02:50, 2.71s/it] 24%|██▍ | 2939/12313 [2:11:48<7:07:14, 2.73s/it] {'loss': 0.5895, 'grad_norm': 3.681202303319773, 'learning_rate': 4.4505598893762595e-06, 'epoch': 0.24} 24%|██▍ | 2939/12313 [2:11:48<7:07:14, 2.73s/it] 24%|██▍ | 2940/12313 [2:11:51<7:04:25, 2.72s/it] {'loss': 0.6095, 'grad_norm': 4.1452045492879055, 'learning_rate': 4.4501484790535555e-06, 'epoch': 0.24} 24%|██▍ | 2940/12313 [2:11:51<7:04:25, 2.72s/it] 24%|██▍ | 2941/12313 [2:11:54<7:00:48, 2.69s/it] {'loss': 0.6445, 'grad_norm': 6.265763596141227, 'learning_rate': 4.449736933790899e-06, 'epoch': 0.24} 24%|██▍ | 2941/12313 [2:11:54<7:00:48, 2.69s/it] 24%|██▍ | 2942/12313 [2:11:56<6:58:18, 2.68s/it] {'loss': 0.6796, 'grad_norm': 3.195443239329471, 'learning_rate': 4.449325253616765e-06, 'epoch': 0.24} 24%|██▍ | 2942/12313 
[2:11:56<6:58:18, 2.68s/it] 24%|██▍ | 2943/12313 [2:11:59<7:20:07, 2.82s/it] {'loss': 0.5013, 'grad_norm': 4.681861481447409, 'learning_rate': 4.448913438559641e-06, 'epoch': 0.24} 24%|██▍ | 2943/12313 [2:11:59<7:20:07, 2.82s/it] 24%|██▍ | 2944/12313 [2:12:02<7:12:55, 2.77s/it] {'loss': 0.4841, 'grad_norm': 3.9893399361235873, 'learning_rate': 4.448501488648021e-06, 'epoch': 0.24} 24%|██▍ | 2944/12313 [2:12:02<7:12:55, 2.77s/it] 24%|██▍ | 2945/12313 [2:12:05<6:58:01, 2.68s/it] {'loss': 0.65, 'grad_norm': 8.26472685534627, 'learning_rate': 4.448089403910411e-06, 'epoch': 0.24} 24%|██▍ | 2945/12313 [2:12:05<6:58:01, 2.68s/it] 24%|██▍ | 2946/12313 [2:12:07<7:02:11, 2.70s/it] {'loss': 0.6863, 'grad_norm': 7.279481691054026, 'learning_rate': 4.447677184375323e-06, 'epoch': 0.24} 24%|██▍ | 2946/12313 [2:12:07<7:02:11, 2.70s/it] 24%|██▍ | 2947/12313 [2:12:10<6:59:34, 2.69s/it] {'loss': 0.4314, 'grad_norm': 5.107617453615278, 'learning_rate': 4.447264830071282e-06, 'epoch': 0.24} 24%|██▍ | 2947/12313 [2:12:10<6:59:34, 2.69s/it] 24%|██▍ | 2948/12313 [2:12:13<6:59:08, 2.69s/it] {'loss': 0.5368, 'grad_norm': 10.45135842568149, 'learning_rate': 4.446852341026822e-06, 'epoch': 0.24} 24%|██▍ | 2948/12313 [2:12:13<6:59:08, 2.69s/it] 24%|██▍ | 2949/12313 [2:12:15<6:49:42, 2.63s/it] {'loss': 0.5992, 'grad_norm': 3.384814108308006, 'learning_rate': 4.4464397172704825e-06, 'epoch': 0.24} 24%|██▍ | 2949/12313 [2:12:15<6:49:42, 2.63s/it] 24%|██▍ | 2950/12313 [2:12:18<7:01:22, 2.70s/it] {'loss': 0.6577, 'grad_norm': 3.6038417485455723, 'learning_rate': 4.446026958830816e-06, 'epoch': 0.24} 24%|██▍ | 2950/12313 [2:12:18<7:01:22, 2.70s/it] 24%|██▍ | 2951/12313 [2:12:21<7:04:02, 2.72s/it] {'loss': 0.5815, 'grad_norm': 5.105585351046193, 'learning_rate': 4.4456140657363824e-06, 'epoch': 0.24} 24%|██▍ | 2951/12313 [2:12:21<7:04:02, 2.72s/it] 24%|██▍ | 2952/12313 [2:12:23<7:02:06, 2.71s/it] {'loss': 0.6109, 'grad_norm': 6.436128437554316, 'learning_rate': 4.445201038015753e-06, 'epoch': 0.24} 
24%|██▍ | 2952/12313 [2:12:23<7:02:06, 2.71s/it] 24%|██▍ | 2953/12313 [2:12:26<6:52:27, 2.64s/it] {'loss': 0.5737, 'grad_norm': 6.0286114195287, 'learning_rate': 4.4447878756975074e-06, 'epoch': 0.24} 24%|██▍ | 2953/12313 [2:12:26<6:52:27, 2.64s/it] 24%|██▍ | 2954/12313 [2:12:28<6:42:31, 2.58s/it] {'loss': 0.6333, 'grad_norm': 6.340347602254372, 'learning_rate': 4.444374578810233e-06, 'epoch': 0.24} 24%|██▍ | 2954/12313 [2:12:28<6:42:31, 2.58s/it] 24%|██▍ | 2955/12313 [2:12:31<6:44:43, 2.59s/it] {'loss': 0.5584, 'grad_norm': 3.611889348535864, 'learning_rate': 4.443961147382528e-06, 'epoch': 0.24} 24%|██▍ | 2955/12313 [2:12:31<6:44:43, 2.59s/it] 24%|██▍ | 2956/12313 [2:12:34<6:54:39, 2.66s/it] {'loss': 0.5662, 'grad_norm': 4.508170768419845, 'learning_rate': 4.4435475814429995e-06, 'epoch': 0.24} 24%|██▍ | 2956/12313 [2:12:34<6:54:39, 2.66s/it] 24%|██▍ | 2957/12313 [2:12:36<6:51:55, 2.64s/it] {'loss': 0.5413, 'grad_norm': 3.857621418576937, 'learning_rate': 4.4431338810202655e-06, 'epoch': 0.24} 24%|██▍ | 2957/12313 [2:12:36<6:51:55, 2.64s/it] 24%|██▍ | 2958/12313 [2:12:39<6:50:28, 2.63s/it] {'loss': 0.5279, 'grad_norm': 5.167525653044904, 'learning_rate': 4.4427200461429494e-06, 'epoch': 0.24} 24%|██▍ | 2958/12313 [2:12:39<6:50:28, 2.63s/it] 24%|██▍ | 2959/12313 [2:12:42<6:50:17, 2.63s/it] {'loss': 0.4993, 'grad_norm': 9.873321863622351, 'learning_rate': 4.442306076839689e-06, 'epoch': 0.24} 24%|██▍ | 2959/12313 [2:12:42<6:50:17, 2.63s/it] 24%|██▍ | 2960/12313 [2:12:45<7:14:54, 2.79s/it] {'loss': 0.5868, 'grad_norm': 9.73450941091562, 'learning_rate': 4.441891973139127e-06, 'epoch': 0.24} 24%|██▍ | 2960/12313 [2:12:45<7:14:54, 2.79s/it] 24%|██▍ | 2961/12313 [2:12:48<7:45:55, 2.99s/it] {'loss': 0.6352, 'grad_norm': 3.7053040122979297, 'learning_rate': 4.441477735069918e-06, 'epoch': 0.24} 24%|██▍ | 2961/12313 [2:12:48<7:45:55, 2.99s/it] 24%|██▍ | 2962/12313 [2:12:51<7:54:53, 3.05s/it] {'loss': 0.6923, 'grad_norm': 5.920625186279193, 'learning_rate': 
4.441063362660726e-06, 'epoch': 0.24} 24%|██▍ | 2962/12313 [2:12:51<7:54:53, 3.05s/it] 24%|██▍ | 2963/12313 [2:12:54<7:35:27, 2.92s/it] {'loss': 0.5829, 'grad_norm': 6.875957645683971, 'learning_rate': 4.44064885594022e-06, 'epoch': 0.24} 24%|██▍ | 2963/12313 [2:12:54<7:35:27, 2.92s/it] 24%|██▍ | 2964/12313 [2:12:57<7:29:14, 2.88s/it] {'loss': 0.4949, 'grad_norm': 6.3088089208901215, 'learning_rate': 4.440234214937086e-06, 'epoch': 0.24} 24%|██▍ | 2964/12313 [2:12:57<7:29:14, 2.88s/it] 24%|██▍ | 2965/12313 [2:12:59<7:13:20, 2.78s/it] {'loss': 0.5373, 'grad_norm': 6.283915304694119, 'learning_rate': 4.439819439680012e-06, 'epoch': 0.24} 24%|██▍ | 2965/12313 [2:12:59<7:13:20, 2.78s/it] 24%|██▍ | 2966/12313 [2:13:02<7:05:50, 2.73s/it] {'loss': 0.5615, 'grad_norm': 5.544344295999567, 'learning_rate': 4.439404530197699e-06, 'epoch': 0.24} 24%|██▍ | 2966/12313 [2:13:02<7:05:50, 2.73s/it] 24%|██▍ | 2967/12313 [2:13:05<6:57:20, 2.68s/it] {'loss': 0.496, 'grad_norm': 5.522735082552348, 'learning_rate': 4.438989486518856e-06, 'epoch': 0.24} 24%|██▍ | 2967/12313 [2:13:05<6:57:20, 2.68s/it] 24%|██▍ | 2968/12313 [2:13:07<6:46:13, 2.61s/it] {'loss': 0.3989, 'grad_norm': 6.134276115094066, 'learning_rate': 4.438574308672203e-06, 'epoch': 0.24} 24%|██▍ | 2968/12313 [2:13:07<6:46:13, 2.61s/it] 24%|██▍ | 2969/12313 [2:13:10<6:51:48, 2.64s/it] {'loss': 0.4992, 'grad_norm': 18.34187318276363, 'learning_rate': 4.438158996686468e-06, 'epoch': 0.24} 24%|██▍ | 2969/12313 [2:13:10<6:51:48, 2.64s/it] 24%|██▍ | 2970/12313 [2:13:12<6:50:00, 2.63s/it] {'loss': 0.7617, 'grad_norm': 5.7407597574290605, 'learning_rate': 4.4377435505903876e-06, 'epoch': 0.24} 24%|██▍ | 2970/12313 [2:13:12<6:50:00, 2.63s/it] 24%|██▍ | 2971/12313 [2:13:15<6:55:18, 2.67s/it] {'loss': 0.6338, 'grad_norm': 5.129242694583421, 'learning_rate': 4.4373279704127095e-06, 'epoch': 0.24} 24%|██▍ | 2971/12313 [2:13:15<6:55:18, 2.67s/it] 24%|██▍ | 2972/12313 [2:13:18<6:59:06, 2.69s/it] {'loss': 0.4831, 'grad_norm': 
5.956882401490225, 'learning_rate': 4.4369122561821885e-06, 'epoch': 0.24} 24%|██▍ | 2972/12313 [2:13:18<6:59:06, 2.69s/it] 24%|██▍ | 2973/12313 [2:13:20<6:54:43, 2.66s/it] {'loss': 0.5962, 'grad_norm': 4.390822342695221, 'learning_rate': 4.436496407927591e-06, 'epoch': 0.24} 24%|██▍ | 2973/12313 [2:13:20<6:54:43, 2.66s/it] 24%|██▍ | 2974/12313 [2:13:23<7:08:03, 2.75s/it] {'loss': 0.5129, 'grad_norm': 8.138342846156773, 'learning_rate': 4.436080425677689e-06, 'epoch': 0.24} 24%|██▍ | 2974/12313 [2:13:23<7:08:03, 2.75s/it] 24%|██▍ | 2975/12313 [2:13:26<7:17:28, 2.81s/it] {'loss': 0.6631, 'grad_norm': 3.692323205845064, 'learning_rate': 4.43566430946127e-06, 'epoch': 0.24} 24%|██▍ | 2975/12313 [2:13:26<7:17:28, 2.81s/it] 24%|██▍ | 2976/12313 [2:13:29<7:14:53, 2.79s/it] {'loss': 0.5203, 'grad_norm': 4.598524784324876, 'learning_rate': 4.435248059307124e-06, 'epoch': 0.24} 24%|██▍ | 2976/12313 [2:13:29<7:14:53, 2.79s/it] 24%|██▍ | 2977/12313 [2:13:32<7:08:58, 2.76s/it] {'loss': 0.5976, 'grad_norm': 36.50302303476803, 'learning_rate': 4.434831675244056e-06, 'epoch': 0.24} 24%|██▍ | 2977/12313 [2:13:32<7:08:58, 2.76s/it] 24%|██▍ | 2978/12313 [2:13:34<7:03:05, 2.72s/it] {'loss': 0.7187, 'grad_norm': 4.225579732329154, 'learning_rate': 4.434415157300875e-06, 'epoch': 0.24} 24%|██▍ | 2978/12313 [2:13:34<7:03:05, 2.72s/it] 24%|██▍ | 2979/12313 [2:13:37<7:08:35, 2.76s/it] {'loss': 0.5036, 'grad_norm': 6.144364914404981, 'learning_rate': 4.433998505506402e-06, 'epoch': 0.24} 24%|██▍ | 2979/12313 [2:13:37<7:08:35, 2.76s/it] 24%|██▍ | 2980/12313 [2:13:40<7:06:54, 2.74s/it] {'loss': 0.5174, 'grad_norm': 5.890869286549283, 'learning_rate': 4.433581719889469e-06, 'epoch': 0.24} 24%|██▍ | 2980/12313 [2:13:40<7:06:54, 2.74s/it] 24%|██▍ | 2981/12313 [2:13:43<7:06:31, 2.74s/it] {'loss': 0.5758, 'grad_norm': 4.666862090401287, 'learning_rate': 4.433164800478914e-06, 'epoch': 0.24} 24%|██▍ | 2981/12313 [2:13:43<7:06:31, 2.74s/it] 24%|██▍ | 2982/12313 [2:13:46<7:17:30, 2.81s/it] {'loss': 
0.5223, 'grad_norm': 4.488080886068684, 'learning_rate': 4.432747747303586e-06, 'epoch': 0.24} 24%|██▍ | 2982/12313 [2:13:46<7:17:30, 2.81s/it] 24%|██▍ | 2983/12313 [2:13:49<7:18:58, 2.82s/it] {'loss': 0.5231, 'grad_norm': 4.722733122087884, 'learning_rate': 4.432330560392343e-06, 'epoch': 0.24} 24%|██▍ | 2983/12313 [2:13:49<7:18:58, 2.82s/it] 24%|██▍ | 2984/12313 [2:13:51<7:04:20, 2.73s/it] {'loss': 0.6269, 'grad_norm': 10.302897259600046, 'learning_rate': 4.431913239774052e-06, 'epoch': 0.24} 24%|██▍ | 2984/12313 [2:13:51<7:04:20, 2.73s/it] 24%|██▍ | 2985/12313 [2:13:54<7:03:35, 2.72s/it] {'loss': 0.4772, 'grad_norm': 4.986694907260506, 'learning_rate': 4.4314957854775895e-06, 'epoch': 0.24} 24%|██▍ | 2985/12313 [2:13:54<7:03:35, 2.72s/it] 24%|██▍ | 2986/12313 [2:13:56<6:54:50, 2.67s/it] {'loss': 0.4776, 'grad_norm': 28.644633867544297, 'learning_rate': 4.43107819753184e-06, 'epoch': 0.24} 24%|██▍ | 2986/12313 [2:13:56<6:54:50, 2.67s/it] 24%|██▍ | 2987/12313 [2:13:59<6:53:57, 2.66s/it] {'loss': 0.6489, 'grad_norm': 4.650389954451338, 'learning_rate': 4.4306604759657e-06, 'epoch': 0.24} 24%|██▍ | 2987/12313 [2:13:59<6:53:57, 2.66s/it] 24%|██▍ | 2988/12313 [2:14:02<6:53:56, 2.66s/it] {'loss': 0.4797, 'grad_norm': 8.110031601217097, 'learning_rate': 4.430242620808073e-06, 'epoch': 0.24} 24%|██▍ | 2988/12313 [2:14:02<6:53:56, 2.66s/it] 24%|██▍ | 2989/12313 [2:14:04<7:02:22, 2.72s/it] {'loss': 0.5772, 'grad_norm': 4.435012545474071, 'learning_rate': 4.429824632087873e-06, 'epoch': 0.24} 24%|██▍ | 2989/12313 [2:14:04<7:02:22, 2.72s/it] 24%|██▍ | 2990/12313 [2:14:07<7:14:36, 2.80s/it] {'loss': 0.6512, 'grad_norm': 6.6461768049947505, 'learning_rate': 4.42940650983402e-06, 'epoch': 0.24} 24%|██▍ | 2990/12313 [2:14:07<7:14:36, 2.80s/it] 24%|██▍ | 2991/12313 [2:14:10<6:58:35, 2.69s/it] {'loss': 0.4932, 'grad_norm': 4.2488570760327935, 'learning_rate': 4.428988254075449e-06, 'epoch': 0.24} 24%|██▍ | 2991/12313 [2:14:10<6:58:35, 2.69s/it] 24%|██▍ | 2992/12313 
[2:14:13<7:12:40, 2.79s/it] {'loss': 0.6507, 'grad_norm': 3.4583725498967657, 'learning_rate': 4.4285698648411005e-06, 'epoch': 0.24} 24%|██▍ | 2992/12313 [2:14:13<7:12:40, 2.79s/it] 24%|██▍ | 2993/12313 [2:14:15<7:02:53, 2.72s/it] {'loss': 0.7045, 'grad_norm': 8.442412186418991, 'learning_rate': 4.428151342159923e-06, 'epoch': 0.24} 24%|██▍ | 2993/12313 [2:14:15<7:02:53, 2.72s/it] 24%|██▍ | 2994/12313 [2:14:18<7:00:39, 2.71s/it] {'loss': 0.9022, 'grad_norm': 4.275956058237837, 'learning_rate': 4.427732686060877e-06, 'epoch': 0.24} 24%|██▍ | 2994/12313 [2:14:18<7:00:39, 2.71s/it] 24%|██▍ | 2995/12313 [2:14:21<6:55:48, 2.68s/it] {'loss': 0.5063, 'grad_norm': 3.5996383051852328, 'learning_rate': 4.427313896572933e-06, 'epoch': 0.24} 24%|██▍ | 2995/12313 [2:14:21<6:55:48, 2.68s/it] 24%|██▍ | 2996/12313 [2:14:23<6:45:35, 2.61s/it] {'loss': 0.5054, 'grad_norm': 8.226615556870824, 'learning_rate': 4.426894973725066e-06, 'epoch': 0.24} 24%|██▍ | 2996/12313 [2:14:23<6:45:35, 2.61s/it] 24%|██▍ | 2997/12313 [2:14:26<6:36:15, 2.55s/it] {'loss': 0.5189, 'grad_norm': 4.428674996775757, 'learning_rate': 4.426475917546266e-06, 'epoch': 0.24} 24%|██▍ | 2997/12313 [2:14:26<6:36:15, 2.55s/it] 24%|██▍ | 2998/12313 [2:14:28<6:48:42, 2.63s/it] {'loss': 0.5175, 'grad_norm': 3.292995498057493, 'learning_rate': 4.426056728065527e-06, 'epoch': 0.24} 24%|██▍ | 2998/12313 [2:14:28<6:48:42, 2.63s/it] 24%|██▍ | 2999/12313 [2:14:31<6:56:26, 2.68s/it] {'loss': 0.5239, 'grad_norm': 5.396362723485166, 'learning_rate': 4.425637405311857e-06, 'epoch': 0.24} 24%|██▍ | 2999/12313 [2:14:31<6:56:26, 2.68s/it] 24%|██▍ | 3000/12313 [2:14:34<6:55:38, 2.68s/it] {'loss': 0.5758, 'grad_norm': 4.409324480057902, 'learning_rate': 4.425217949314269e-06, 'epoch': 0.24} 24%|██▍ | 3000/12313 [2:14:34<6:55:38, 2.68s/it] 24%|██▍ | 3001/12313 [2:14:37<7:00:52, 2.71s/it] {'loss': 0.5178, 'grad_norm': 4.772191402586126, 'learning_rate': 4.424798360101788e-06, 'epoch': 0.24} 24%|██▍ | 3001/12313 [2:14:37<7:00:52, 
2.71s/it] 24%|██▍ | 3002/12313 [2:14:40<7:15:51, 2.81s/it] {'loss': 0.5634, 'grad_norm': 4.70270174443938, 'learning_rate': 4.424378637703448e-06, 'epoch': 0.24} 24%|██▍ | 3002/12313 [2:14:40<7:15:51, 2.81s/it] 24%|██▍ | 3003/12313 [2:14:43<7:18:24, 2.83s/it] {'loss': 0.6138, 'grad_norm': 4.384411856516953, 'learning_rate': 4.423958782148291e-06, 'epoch': 0.24} 24%|██▍ | 3003/12313 [2:14:43<7:18:24, 2.83s/it] 24%|██▍ | 3004/12313 [2:14:45<7:09:55, 2.77s/it] {'loss': 0.5399, 'grad_norm': 4.849333757605469, 'learning_rate': 4.423538793465368e-06, 'epoch': 0.24} 24%|██▍ | 3004/12313 [2:14:45<7:09:55, 2.77s/it] 24%|██▍ | 3005/12313 [2:14:48<6:58:44, 2.70s/it] {'loss': 0.5882, 'grad_norm': 7.683916564358679, 'learning_rate': 4.423118671683741e-06, 'epoch': 0.24} 24%|██▍ | 3005/12313 [2:14:48<6:58:44, 2.70s/it] 24%|██▍ | 3006/12313 [2:14:50<6:47:35, 2.63s/it] {'loss': 0.501, 'grad_norm': 6.414015664183622, 'learning_rate': 4.42269841683248e-06, 'epoch': 0.24} 24%|██▍ | 3006/12313 [2:14:50<6:47:35, 2.63s/it] 24%|██▍ | 3007/12313 [2:14:53<6:50:29, 2.65s/it] {'loss': 0.7989, 'grad_norm': 3.445748481334447, 'learning_rate': 4.422278028940664e-06, 'epoch': 0.24} 24%|██▍ | 3007/12313 [2:14:53<6:50:29, 2.65s/it] 24%|██▍ | 3008/12313 [2:14:56<6:58:08, 2.70s/it] {'loss': 0.6564, 'grad_norm': 3.446395382611435, 'learning_rate': 4.4218575080373825e-06, 'epoch': 0.24} 24%|██▍ | 3008/12313 [2:14:56<6:58:08, 2.70s/it] 24%|██▍ | 3009/12313 [2:14:58<6:43:02, 2.60s/it] {'loss': 0.5491, 'grad_norm': 7.075485495123882, 'learning_rate': 4.421436854151731e-06, 'epoch': 0.24} 24%|██▍ | 3009/12313 [2:14:58<6:43:02, 2.60s/it] 24%|██▍ | 3010/12313 [2:15:01<6:49:36, 2.64s/it] {'loss': 0.5192, 'grad_norm': 13.379409500342488, 'learning_rate': 4.421016067312821e-06, 'epoch': 0.24} 24%|██▍ | 3010/12313 [2:15:01<6:49:36, 2.64s/it] 24%|██▍ | 3011/12313 [2:15:04<6:48:51, 2.64s/it] {'loss': 0.5197, 'grad_norm': 6.268741236760656, 'learning_rate': 4.420595147549764e-06, 'epoch': 0.24} 24%|██▍ | 
3011/12313 [2:15:04<6:48:51, 2.64s/it] 24%|██▍ | 3012/12313 [2:15:06<6:50:20, 2.65s/it] {'loss': 0.6056, 'grad_norm': 4.324676469837308, 'learning_rate': 4.420174094891688e-06, 'epoch': 0.24} 24%|██▍ | 3012/12313 [2:15:06<6:50:20, 2.65s/it] 24%|██▍ | 3013/12313 [2:15:09<6:51:23, 2.65s/it] {'loss': 0.4472, 'grad_norm': 4.740074003918016, 'learning_rate': 4.419752909367727e-06, 'epoch': 0.24} 24%|██▍ | 3013/12313 [2:15:09<6:51:23, 2.65s/it] 24%|██▍ | 3014/12313 [2:15:11<6:50:17, 2.65s/it] {'loss': 0.662, 'grad_norm': 4.345165270453679, 'learning_rate': 4.419331591007025e-06, 'epoch': 0.24} 24%|██▍ | 3014/12313 [2:15:11<6:50:17, 2.65s/it] 24%|██▍ | 3015/12313 [2:15:14<6:56:39, 2.69s/it] {'loss': 0.6347, 'grad_norm': 3.2912342359960913, 'learning_rate': 4.418910139838734e-06, 'epoch': 0.24} 24%|██▍ | 3015/12313 [2:15:14<6:56:39, 2.69s/it] 24%|██▍ | 3016/12313 [2:15:17<6:59:02, 2.70s/it] {'loss': 0.709, 'grad_norm': 4.197613659900217, 'learning_rate': 4.418488555892018e-06, 'epoch': 0.24} 24%|██▍ | 3016/12313 [2:15:17<6:59:02, 2.70s/it] 25%|██▍ | 3017/12313 [2:15:20<6:51:43, 2.66s/it] {'loss': 0.5097, 'grad_norm': 6.165218832788323, 'learning_rate': 4.418066839196047e-06, 'epoch': 0.25} 25%|██▍ | 3017/12313 [2:15:20<6:51:43, 2.66s/it] 25%|██▍ | 3018/12313 [2:15:22<6:52:41, 2.66s/it] {'loss': 0.4602, 'grad_norm': 5.754235502940853, 'learning_rate': 4.4176449897800025e-06, 'epoch': 0.25} 25%|██▍ | 3018/12313 [2:15:22<6:52:41, 2.66s/it] 25%|██▍ | 3019/12313 [2:15:25<6:56:56, 2.69s/it] {'loss': 0.6116, 'grad_norm': 6.807261983784002, 'learning_rate': 4.417223007673073e-06, 'epoch': 0.25} 25%|██▍ | 3019/12313 [2:15:25<6:56:56, 2.69s/it] 25%|██▍ | 3020/12313 [2:15:28<6:52:14, 2.66s/it] {'loss': 0.4809, 'grad_norm': 5.874117891239969, 'learning_rate': 4.4168008929044585e-06, 'epoch': 0.25} 25%|██▍ | 3020/12313 [2:15:28<6:52:14, 2.66s/it] 25%|██▍ | 3021/12313 [2:15:30<6:55:42, 2.68s/it] {'loss': 0.5298, 'grad_norm': 4.771234322125993, 'learning_rate': 4.416378645503366e-06, 
'epoch': 0.25} 25%|██▍ | 3021/12313 [2:15:30<6:55:42, 2.68s/it] 25%|██▍ | 3022/12313 [2:15:33<6:53:19, 2.67s/it] {'loss': 0.5011, 'grad_norm': 5.033152836788516, 'learning_rate': 4.415956265499014e-06, 'epoch': 0.25} 25%|██▍ | 3022/12313 [2:15:33<6:53:19, 2.67s/it] 25%|██▍ | 3023/12313 [2:15:35<6:46:34, 2.63s/it] {'loss': 0.5847, 'grad_norm': 4.892045552702636, 'learning_rate': 4.415533752920629e-06, 'epoch': 0.25} 25%|██▍ | 3023/12313 [2:15:35<6:46:34, 2.63s/it] 25%|██▍ | 3024/12313 [2:15:38<6:53:01, 2.67s/it] {'loss': 0.5626, 'grad_norm': 3.158494131947643, 'learning_rate': 4.415111107797445e-06, 'epoch': 0.25} 25%|██▍ | 3024/12313 [2:15:38<6:53:01, 2.67s/it] 25%|██▍ | 3025/12313 [2:15:41<6:46:24, 2.63s/it] {'loss': 0.6228, 'grad_norm': 19.734315350148133, 'learning_rate': 4.414688330158709e-06, 'epoch': 0.25} 25%|██▍ | 3025/12313 [2:15:41<6:46:24, 2.63s/it] 25%|██▍ | 3026/12313 [2:15:43<6:37:55, 2.57s/it] {'loss': 0.6962, 'grad_norm': 3.975333547273237, 'learning_rate': 4.4142654200336735e-06, 'epoch': 0.25} 25%|██▍ | 3026/12313 [2:15:43<6:37:55, 2.57s/it] 25%|██▍ | 3027/12313 [2:15:46<6:44:47, 2.62s/it] {'loss': 0.541, 'grad_norm': 5.615922654297775, 'learning_rate': 4.413842377451602e-06, 'epoch': 0.25} 25%|██▍ | 3027/12313 [2:15:46<6:44:47, 2.62s/it] 25%|██▍ | 3028/12313 [2:15:49<6:47:33, 2.63s/it] {'loss': 0.4546, 'grad_norm': 9.003336691465453, 'learning_rate': 4.4134192024417674e-06, 'epoch': 0.25} 25%|██▍ | 3028/12313 [2:15:49<6:47:33, 2.63s/it] 25%|██▍ | 3029/12313 [2:15:51<6:42:23, 2.60s/it] {'loss': 0.4849, 'grad_norm': 3.891965066900615, 'learning_rate': 4.412995895033449e-06, 'epoch': 0.25} 25%|██▍ | 3029/12313 [2:15:51<6:42:23, 2.60s/it] 25%|██▍ | 3030/12313 [2:15:54<6:57:52, 2.70s/it] {'loss': 0.615, 'grad_norm': 4.878722949580337, 'learning_rate': 4.412572455255942e-06, 'epoch': 0.25} 25%|██▍ | 3030/12313 [2:15:54<6:57:52, 2.70s/it] 25%|██▍ | 3031/12313 [2:15:57<6:47:54, 2.64s/it] {'loss': 0.5744, 'grad_norm': 7.419504517267929, 'learning_rate': 
4.412148883138541e-06, 'epoch': 0.25} 25%|██▍ | 3031/12313 [2:15:57<6:47:54, 2.64s/it] 25%|██▍ | 3032/12313 [2:15:59<6:47:18, 2.63s/it] {'loss': 0.4246, 'grad_norm': 15.433110249433737, 'learning_rate': 4.4117251787105566e-06, 'epoch': 0.25} 25%|██▍ | 3032/12313 [2:15:59<6:47:18, 2.63s/it] 25%|██▍ | 3033/12313 [2:16:02<6:44:31, 2.62s/it] {'loss': 0.4779, 'grad_norm': 4.626381274322953, 'learning_rate': 4.411301342001309e-06, 'epoch': 0.25} 25%|██▍ | 3033/12313 [2:16:02<6:44:31, 2.62s/it] 25%|██▍ | 3034/12313 [2:16:05<6:52:21, 2.67s/it] {'loss': 0.5733, 'grad_norm': 3.140055018404333, 'learning_rate': 4.4108773730401235e-06, 'epoch': 0.25} 25%|██▍ | 3034/12313 [2:16:05<6:52:21, 2.67s/it] 25%|██▍ | 3035/12313 [2:16:07<6:47:43, 2.64s/it] {'loss': 0.5525, 'grad_norm': 5.288841429027363, 'learning_rate': 4.410453271856337e-06, 'epoch': 0.25} 25%|██▍ | 3035/12313 [2:16:07<6:47:43, 2.64s/it] 25%|██▍ | 3036/12313 [2:16:10<6:45:41, 2.62s/it] {'loss': 0.5659, 'grad_norm': 5.257436077679411, 'learning_rate': 4.410029038479295e-06, 'epoch': 0.25} 25%|██▍ | 3036/12313 [2:16:10<6:45:41, 2.62s/it] 25%|██▍ | 3037/12313 [2:16:12<6:36:49, 2.57s/it] {'loss': 0.5136, 'grad_norm': 5.634845248429157, 'learning_rate': 4.409604672938352e-06, 'epoch': 0.25} 25%|██▍ | 3037/12313 [2:16:12<6:36:49, 2.57s/it] 25%|██▍ | 3038/12313 [2:16:15<6:47:31, 2.64s/it] {'loss': 0.5139, 'grad_norm': 6.3457504654963195, 'learning_rate': 4.409180175262872e-06, 'epoch': 0.25} 25%|██▍ | 3038/12313 [2:16:15<6:47:31, 2.64s/it] 25%|██▍ | 3039/12313 [2:16:18<6:48:18, 2.64s/it] {'loss': 0.6184, 'grad_norm': 6.705941236376485, 'learning_rate': 4.408755545482229e-06, 'epoch': 0.25} 25%|██▍ | 3039/12313 [2:16:18<6:48:18, 2.64s/it] 25%|██▍ | 3040/12313 [2:16:20<6:42:39, 2.61s/it] {'loss': 0.6296, 'grad_norm': 4.672344302149794, 'learning_rate': 4.408330783625803e-06, 'epoch': 0.25} 25%|██▍ | 3040/12313 [2:16:20<6:42:39, 2.61s/it] 25%|██▍ | 3041/12313 [2:16:23<6:44:15, 2.62s/it] {'loss': 0.3766, 'grad_norm': 
7.4047101228444685, 'learning_rate': 4.407905889722987e-06, 'epoch': 0.25} 25%|██▍ | 3041/12313 [2:16:23<6:44:15, 2.62s/it] 25%|██▍ | 3042/12313 [2:16:25<6:45:04, 2.62s/it] {'loss': 0.5838, 'grad_norm': 4.2484417683054545, 'learning_rate': 4.407480863803181e-06, 'epoch': 0.25} 25%|██▍ | 3042/12313 [2:16:25<6:45:04, 2.62s/it] 25%|██▍ | 3043/12313 [2:16:28<6:45:18, 2.62s/it] {'loss': 0.6218, 'grad_norm': 5.209047094212918, 'learning_rate': 4.407055705895794e-06, 'epoch': 0.25} 25%|██▍ | 3043/12313 [2:16:28<6:45:18, 2.62s/it] 25%|██▍ | 3044/12313 [2:16:31<6:45:49, 2.63s/it] {'loss': 0.5566, 'grad_norm': 4.0255565337154025, 'learning_rate': 4.4066304160302455e-06, 'epoch': 0.25} 25%|██▍ | 3044/12313 [2:16:31<6:45:49, 2.63s/it] 25%|██▍ | 3045/12313 [2:16:33<6:49:22, 2.65s/it] {'loss': 0.5295, 'grad_norm': 4.333677201999764, 'learning_rate': 4.4062049942359634e-06, 'epoch': 0.25} 25%|██▍ | 3045/12313 [2:16:33<6:49:22, 2.65s/it] 25%|██▍ | 3046/12313 [2:16:36<6:53:43, 2.68s/it] {'loss': 0.7309, 'grad_norm': 4.229077150168856, 'learning_rate': 4.405779440542383e-06, 'epoch': 0.25} 25%|██▍ | 3046/12313 [2:16:36<6:53:43, 2.68s/it] 25%|██▍ | 3047/12313 [2:16:39<6:41:30, 2.60s/it] {'loss': 0.5675, 'grad_norm': 7.678488339873706, 'learning_rate': 4.405353754978952e-06, 'epoch': 0.25} 25%|██▍ | 3047/12313 [2:16:39<6:41:30, 2.60s/it] 25%|██▍ | 3048/12313 [2:16:41<6:44:28, 2.62s/it] {'loss': 0.6678, 'grad_norm': 5.135214318233568, 'learning_rate': 4.404927937575125e-06, 'epoch': 0.25} 25%|██▍ | 3048/12313 [2:16:41<6:44:28, 2.62s/it] 25%|██▍ | 3049/12313 [2:16:44<6:37:05, 2.57s/it] {'loss': 0.544, 'grad_norm': 5.25362634341942, 'learning_rate': 4.4045019883603676e-06, 'epoch': 0.25} 25%|██▍ | 3049/12313 [2:16:44<6:37:05, 2.57s/it] 25%|██▍ | 3050/12313 [2:16:46<6:32:03, 2.54s/it] {'loss': 0.6509, 'grad_norm': 5.0585319314475194, 'learning_rate': 4.40407590736415e-06, 'epoch': 0.25} 25%|██▍ | 3050/12313 [2:16:46<6:32:03, 2.54s/it] 25%|██▍ | 3051/12313 [2:16:49<6:37:26, 2.57s/it] 
{'loss': 0.5899, 'grad_norm': 6.248691395394384, 'learning_rate': 4.403649694615959e-06, 'epoch': 0.25} 25%|██▍ | 3051/12313 [2:16:49<6:37:26, 2.57s/it] 25%|██▍ | 3052/12313 [2:16:51<6:30:18, 2.53s/it] {'loss': 0.4696, 'grad_norm': 10.336963003211816, 'learning_rate': 4.403223350145283e-06, 'epoch': 0.25} 25%|██▍ | 3052/12313 [2:16:51<6:30:18, 2.53s/it] 25%|██▍ | 3053/12313 [2:16:54<6:30:20, 2.53s/it] {'loss': 0.4006, 'grad_norm': 4.541662859187703, 'learning_rate': 4.402796873981623e-06, 'epoch': 0.25} 25%|██▍ | 3053/12313 [2:16:54<6:30:20, 2.53s/it] 25%|██▍ | 3054/12313 [2:16:56<6:31:06, 2.53s/it] {'loss': 0.5062, 'grad_norm': 6.296840627991974, 'learning_rate': 4.402370266154491e-06, 'epoch': 0.25} 25%|██▍ | 3054/12313 [2:16:56<6:31:06, 2.53s/it] 25%|██▍ | 3055/12313 [2:16:59<6:36:05, 2.57s/it] {'loss': 0.7281, 'grad_norm': 5.894042406789113, 'learning_rate': 4.401943526693404e-06, 'epoch': 0.25} 25%|██▍ | 3055/12313 [2:16:59<6:36:05, 2.57s/it] 25%|██▍ | 3056/12313 [2:17:02<6:39:37, 2.59s/it] {'loss': 0.6206, 'grad_norm': 5.656446562333912, 'learning_rate': 4.401516655627891e-06, 'epoch': 0.25} 25%|██▍ | 3056/12313 [2:17:02<6:39:37, 2.59s/it] 25%|██▍ | 3057/12313 [2:17:04<6:42:44, 2.61s/it] {'loss': 0.4256, 'grad_norm': 5.686281684309384, 'learning_rate': 4.401089652987489e-06, 'epoch': 0.25} 25%|██▍ | 3057/12313 [2:17:04<6:42:44, 2.61s/it] 25%|██▍ | 3058/12313 [2:17:07<6:45:14, 2.63s/it] {'loss': 0.502, 'grad_norm': 6.0150133514660284, 'learning_rate': 4.4006625188017445e-06, 'epoch': 0.25} 25%|██▍ | 3058/12313 [2:17:07<6:45:14, 2.63s/it] 25%|██▍ | 3059/12313 [2:17:09<6:43:11, 2.61s/it] {'loss': 0.5524, 'grad_norm': 5.249632475937014, 'learning_rate': 4.400235253100214e-06, 'epoch': 0.25} 25%|██▍ | 3059/12313 [2:17:09<6:43:11, 2.61s/it] 25%|██▍ | 3060/12313 [2:17:12<6:50:47, 2.66s/it] {'loss': 0.6269, 'grad_norm': 4.835925595719573, 'learning_rate': 4.399807855912459e-06, 'epoch': 0.25} 25%|██▍ | 3060/12313 [2:17:12<6:50:47, 2.66s/it] 25%|██▍ | 3061/12313 
[2:17:15<6:43:29, 2.62s/it] {'loss': 0.6161, 'grad_norm': 4.606327993826141, 'learning_rate': 4.3993803272680555e-06, 'epoch': 0.25} 25%|██▍ | 3061/12313 [2:17:15<6:43:29, 2.62s/it] 25%|██▍ | 3062/12313 [2:17:17<6:49:10, 2.65s/it] {'loss': 0.479, 'grad_norm': 7.236306200132777, 'learning_rate': 4.398952667196585e-06, 'epoch': 0.25} 25%|██▍ | 3062/12313 [2:17:17<6:49:10, 2.65s/it] 25%|██▍ | 3063/12313 [2:17:20<6:47:09, 2.64s/it] {'loss': 0.568, 'grad_norm': 5.858172893023612, 'learning_rate': 4.398524875727641e-06, 'epoch': 0.25} 25%|██▍ | 3063/12313 [2:17:20<6:47:09, 2.64s/it] 25%|██▍ | 3064/12313 [2:17:22<6:35:06, 2.56s/it] {'loss': 0.631, 'grad_norm': 4.056227046494207, 'learning_rate': 4.398096952890823e-06, 'epoch': 0.25} 25%|██▍ | 3064/12313 [2:17:22<6:35:06, 2.56s/it] 25%|██▍ | 3065/12313 [2:17:25<6:32:51, 2.55s/it] {'loss': 0.5397, 'grad_norm': 8.0717170292666, 'learning_rate': 4.397668898715743e-06, 'epoch': 0.25} 25%|██▍ | 3065/12313 [2:17:25<6:32:51, 2.55s/it] 25%|██▍ | 3066/12313 [2:17:27<6:26:04, 2.51s/it] {'loss': 0.5774, 'grad_norm': 6.347095752035782, 'learning_rate': 4.397240713232016e-06, 'epoch': 0.25} 25%|██▍ | 3066/12313 [2:17:27<6:26:04, 2.51s/it] 25%|██▍ | 3067/12313 [2:17:30<6:35:21, 2.57s/it] {'loss': 0.4825, 'grad_norm': 4.659148068120233, 'learning_rate': 4.3968123964692745e-06, 'epoch': 0.25} 25%|██▍ | 3067/12313 [2:17:30<6:35:21, 2.57s/it] 25%|██▍ | 3068/12313 [2:17:33<6:42:31, 2.61s/it] {'loss': 0.6587, 'grad_norm': 5.987640448722061, 'learning_rate': 4.396383948457153e-06, 'epoch': 0.25} 25%|██▍ | 3068/12313 [2:17:33<6:42:31, 2.61s/it] 25%|██▍ | 3069/12313 [2:17:35<6:36:32, 2.57s/it] {'loss': 0.8384, 'grad_norm': 3.9258521372891573, 'learning_rate': 4.395955369225299e-06, 'epoch': 0.25} 25%|██▍ | 3069/12313 [2:17:35<6:36:32, 2.57s/it] 25%|██▍ | 3070/12313 [2:17:38<6:30:37, 2.54s/it] {'loss': 0.5995, 'grad_norm': 5.296577196209221, 'learning_rate': 4.395526658803367e-06, 'epoch': 0.25} 25%|██▍ | 3070/12313 [2:17:38<6:30:37, 2.54s/it] 
25%|██▍ | 3071/12313 [2:17:40<6:27:27, 2.52s/it] {'loss': 0.5141, 'grad_norm': 6.077098107367331, 'learning_rate': 4.395097817221023e-06, 'epoch': 0.25} 25%|██▍ | 3071/12313 [2:17:40<6:27:27, 2.52s/it] 25%|██▍ | 3072/12313 [2:17:43<6:23:49, 2.49s/it] {'loss': 0.5545, 'grad_norm': 9.074255129254627, 'learning_rate': 4.39466884450794e-06, 'epoch': 0.25} 25%|██▍ | 3072/12313 [2:17:43<6:23:49, 2.49s/it] 25%|██▍ | 3073/12313 [2:17:45<6:21:22, 2.48s/it] {'loss': 0.7089, 'grad_norm': 3.837360404731743, 'learning_rate': 4.3942397406937996e-06, 'epoch': 0.25} 25%|██▍ | 3073/12313 [2:17:45<6:21:22, 2.48s/it] 25%|██▍ | 3074/12313 [2:17:48<6:20:56, 2.47s/it] {'loss': 0.685, 'grad_norm': 24.0705694528878, 'learning_rate': 4.393810505808294e-06, 'epoch': 0.25} 25%|██▍ | 3074/12313 [2:17:48<6:20:56, 2.47s/it] 25%|██▍ | 3075/12313 [2:17:50<6:27:06, 2.51s/it] {'loss': 0.5362, 'grad_norm': 29.345212216374478, 'learning_rate': 4.393381139881125e-06, 'epoch': 0.25} 25%|██▍ | 3075/12313 [2:17:50<6:27:06, 2.51s/it] 25%|██▍ | 3076/12313 [2:17:53<6:35:04, 2.57s/it] {'loss': 0.5189, 'grad_norm': 2.6160968338209645, 'learning_rate': 4.392951642942001e-06, 'epoch': 0.25} 25%|██▍ | 3076/12313 [2:17:53<6:35:04, 2.57s/it] 25%|██▍ | 3077/12313 [2:17:55<6:38:37, 2.59s/it] {'loss': 0.526, 'grad_norm': 3.2671386577254586, 'learning_rate': 4.392522015020643e-06, 'epoch': 0.25} 25%|██▍ | 3077/12313 [2:17:55<6:38:37, 2.59s/it] 25%|██▍ | 3078/12313 [2:17:58<6:53:13, 2.68s/it] {'loss': 0.484, 'grad_norm': 3.4560985141832905, 'learning_rate': 4.392092256146776e-06, 'epoch': 0.25} 25%|██▍ | 3078/12313 [2:17:58<6:53:13, 2.68s/it] 25%|██▌ | 3079/12313 [2:18:01<6:46:12, 2.64s/it] {'loss': 0.482, 'grad_norm': 4.1989067824587405, 'learning_rate': 4.391662366350139e-06, 'epoch': 0.25} 25%|██▌ | 3079/12313 [2:18:01<6:46:12, 2.64s/it] 25%|██▌ | 3080/12313 [2:18:04<6:43:38, 2.62s/it] {'loss': 0.6224, 'grad_norm': 5.326290581465668, 'learning_rate': 4.3912323456604785e-06, 'epoch': 0.25} 25%|██▌ | 3080/12313 
[2:18:04<6:43:38, 2.62s/it] 25%|██▌ | 3081/12313 [2:18:06<6:45:52, 2.64s/it] {'loss': 0.5254, 'grad_norm': 7.502364694548779, 'learning_rate': 4.390802194107548e-06, 'epoch': 0.25} 25%|██▌ | 3081/12313 [2:18:06<6:45:52, 2.64s/it] 25%|██▌ | 3082/12313 [2:18:09<6:55:40, 2.70s/it] {'loss': 0.6956, 'grad_norm': 4.007678084848024, 'learning_rate': 4.390371911721113e-06, 'epoch': 0.25} 25%|██▌ | 3082/12313 [2:18:09<6:55:40, 2.70s/it] 25%|██▌ | 3083/12313 [2:18:12<6:52:06, 2.68s/it] {'loss': 0.6204, 'grad_norm': 6.194239862786131, 'learning_rate': 4.389941498530946e-06, 'epoch': 0.25} 25%|██▌ | 3083/12313 [2:18:12<6:52:06, 2.68s/it] 25%|██▌ | 3084/12313 [2:18:14<6:46:07, 2.64s/it] {'loss': 0.6249, 'grad_norm': 5.28161752223026, 'learning_rate': 4.38951095456683e-06, 'epoch': 0.25} 25%|██▌ | 3084/12313 [2:18:14<6:46:07, 2.64s/it] 25%|██▌ | 3085/12313 [2:18:17<6:49:08, 2.66s/it] {'loss': 0.6299, 'grad_norm': 6.6004264867031415, 'learning_rate': 4.389080279858556e-06, 'epoch': 0.25} 25%|██▌ | 3085/12313 [2:18:17<6:49:08, 2.66s/it] 25%|██▌ | 3086/12313 [2:18:20<6:50:17, 2.67s/it] {'loss': 0.5395, 'grad_norm': 4.08243439857223, 'learning_rate': 4.388649474435925e-06, 'epoch': 0.25} 25%|██▌ | 3086/12313 [2:18:20<6:50:17, 2.67s/it] 25%|██▌ | 3087/12313 [2:18:22<6:47:02, 2.65s/it] {'loss': 0.4487, 'grad_norm': 10.991487088043705, 'learning_rate': 4.388218538328746e-06, 'epoch': 0.25} 25%|██▌ | 3087/12313 [2:18:22<6:47:02, 2.65s/it] 25%|██▌ | 3088/12313 [2:18:25<6:44:55, 2.63s/it] {'loss': 0.6908, 'grad_norm': 5.374906431529036, 'learning_rate': 4.387787471566837e-06, 'epoch': 0.25} 25%|██▌ | 3088/12313 [2:18:25<6:44:55, 2.63s/it] 25%|██▌ | 3089/12313 [2:18:27<6:36:16, 2.58s/it] {'loss': 0.559, 'grad_norm': 6.4091647653829344, 'learning_rate': 4.387356274180025e-06, 'epoch': 0.25} 25%|██▌ | 3089/12313 [2:18:27<6:36:16, 2.58s/it] 25%|██▌ | 3090/12313 [2:18:30<6:52:29, 2.68s/it] {'loss': 0.5294, 'grad_norm': 3.189931420557015, 'learning_rate': 4.386924946198148e-06, 'epoch': 0.25} 
25%|██▌ | 3090/12313 [2:18:30<6:52:29, 2.68s/it] 25%|██▌ | 3091/12313 [2:18:33<6:45:06, 2.64s/it] {'loss': 0.7014, 'grad_norm': 3.881260853633071, 'learning_rate': 4.386493487651052e-06, 'epoch': 0.25} 25%|██▌ | 3091/12313 [2:18:33<6:45:06, 2.64s/it] 25%|██▌ | 3092/12313 [2:18:35<6:38:40, 2.59s/it] {'loss': 0.506, 'grad_norm': 9.67421505771052, 'learning_rate': 4.38606189856859e-06, 'epoch': 0.25} 25%|██▌ | 3092/12313 [2:18:35<6:38:40, 2.59s/it] 25%|██▌ | 3093/12313 [2:18:38<6:39:59, 2.60s/it] {'loss': 0.539, 'grad_norm': 12.331043817036711, 'learning_rate': 4.385630178980627e-06, 'epoch': 0.25} 25%|██▌ | 3093/12313 [2:18:38<6:39:59, 2.60s/it] 25%|██▌ | 3094/12313 [2:18:41<6:44:57, 2.64s/it] {'loss': 0.5523, 'grad_norm': 4.246731275050449, 'learning_rate': 4.385198328917034e-06, 'epoch': 0.25} 25%|██▌ | 3094/12313 [2:18:41<6:44:57, 2.64s/it] 25%|██▌ | 3095/12313 [2:18:43<6:49:46, 2.67s/it] {'loss': 0.5281, 'grad_norm': 5.769645319972767, 'learning_rate': 4.384766348407695e-06, 'epoch': 0.25} 25%|██▌ | 3095/12313 [2:18:43<6:49:46, 2.67s/it] 25%|██▌ | 3096/12313 [2:18:46<6:38:11, 2.59s/it] {'loss': 0.5954, 'grad_norm': 4.286081820286076, 'learning_rate': 4.3843342374825e-06, 'epoch': 0.25} 25%|██▌ | 3096/12313 [2:18:46<6:38:11, 2.59s/it] 25%|██▌ | 3097/12313 [2:18:48<6:38:06, 2.59s/it] {'loss': 0.809, 'grad_norm': 3.6620430541424187, 'learning_rate': 4.383901996171348e-06, 'epoch': 0.25} 25%|██▌ | 3097/12313 [2:18:48<6:38:06, 2.59s/it] 25%|██▌ | 3098/12313 [2:18:51<6:39:15, 2.60s/it] {'loss': 0.5542, 'grad_norm': 4.730554157695557, 'learning_rate': 4.383469624504148e-06, 'epoch': 0.25} 25%|██▌ | 3098/12313 [2:18:51<6:39:15, 2.60s/it] 25%|██▌ | 3099/12313 [2:18:53<6:31:12, 2.55s/it] {'loss': 0.4754, 'grad_norm': 5.354061606302798, 'learning_rate': 4.3830371225108185e-06, 'epoch': 0.25} 25%|██▌ | 3099/12313 [2:18:53<6:31:12, 2.55s/it] 25%|██▌ | 3100/12313 [2:18:56<6:44:54, 2.64s/it] {'loss': 0.4081, 'grad_norm': 7.122075601960895, 'learning_rate': 
4.382604490221286e-06, 'epoch': 0.25} 25%|██▌ | 3100/12313 [2:18:56<6:44:54, 2.64s/it] 25%|██▌ | 3101/12313 [2:18:59<6:43:38, 2.63s/it] {'loss': 0.71, 'grad_norm': 4.487864560668316, 'learning_rate': 4.382171727665486e-06, 'epoch': 0.25} 25%|██▌ | 3101/12313 [2:18:59<6:43:38, 2.63s/it] 25%|██▌ | 3102/12313 [2:19:02<6:47:47, 2.66s/it] {'loss': 0.518, 'grad_norm': 5.423045386515985, 'learning_rate': 4.381738834873364e-06, 'epoch': 0.25} 25%|██▌ | 3102/12313 [2:19:02<6:47:47, 2.66s/it] 25%|██▌ | 3103/12313 [2:19:04<6:55:50, 2.71s/it] {'loss': 0.3472, 'grad_norm': 4.117696423281904, 'learning_rate': 4.381305811874873e-06, 'epoch': 0.25} 25%|██▌ | 3103/12313 [2:19:04<6:55:50, 2.71s/it] 25%|██▌ | 3104/12313 [2:19:07<7:06:01, 2.78s/it] {'loss': 0.6369, 'grad_norm': 4.590924951658724, 'learning_rate': 4.3808726586999766e-06, 'epoch': 0.25} 25%|██▌ | 3104/12313 [2:19:07<7:06:01, 2.78s/it] 25%|██▌ | 3105/12313 [2:19:10<7:03:58, 2.76s/it] {'loss': 0.5335, 'grad_norm': 6.69015366947571, 'learning_rate': 4.380439375378646e-06, 'epoch': 0.25} 25%|██▌ | 3105/12313 [2:19:10<7:03:58, 2.76s/it] 25%|██▌ | 3106/12313 [2:19:13<7:00:18, 2.74s/it] {'loss': 0.4577, 'grad_norm': 9.103381660915732, 'learning_rate': 4.380005961940864e-06, 'epoch': 0.25} 25%|██▌ | 3106/12313 [2:19:13<7:00:18, 2.74s/it] 25%|██▌ | 3107/12313 [2:19:15<6:55:56, 2.71s/it] {'loss': 0.5355, 'grad_norm': 7.434831084210403, 'learning_rate': 4.379572418416619e-06, 'epoch': 0.25} 25%|██▌ | 3107/12313 [2:19:15<6:55:56, 2.71s/it] 25%|██▌ | 3108/12313 [2:19:18<6:45:16, 2.64s/it] {'loss': 0.4481, 'grad_norm': 8.173889824189096, 'learning_rate': 4.37913874483591e-06, 'epoch': 0.25} 25%|██▌ | 3108/12313 [2:19:18<6:45:16, 2.64s/it] 25%|██▌ | 3109/12313 [2:19:20<6:41:21, 2.62s/it] {'loss': 0.5386, 'grad_norm': 10.499303258759314, 'learning_rate': 4.378704941228746e-06, 'epoch': 0.25} 25%|██▌ | 3109/12313 [2:19:20<6:41:21, 2.62s/it] 25%|██▌ | 3110/12313 [2:19:23<6:48:44, 2.66s/it] {'loss': 0.4977, 'grad_norm': 5.028078700952843, 
'learning_rate': 4.378271007625141e-06, 'epoch': 0.25} 25%|██▌ | 3110/12313 [2:19:23<6:48:44, 2.66s/it] 25%|██▌ | 3111/12313 [2:19:26<6:51:05, 2.68s/it] {'loss': 0.5747, 'grad_norm': 4.3897095234161245, 'learning_rate': 4.377836944055124e-06, 'epoch': 0.25} 25%|██▌ | 3111/12313 [2:19:26<6:51:05, 2.68s/it] 25%|██▌ | 3112/12313 [2:19:28<6:39:27, 2.60s/it] {'loss': 0.6102, 'grad_norm': 5.593416675150363, 'learning_rate': 4.377402750548729e-06, 'epoch': 0.25} 25%|██▌ | 3112/12313 [2:19:28<6:39:27, 2.60s/it] 25%|██▌ | 3113/12313 [2:19:31<6:51:13, 2.68s/it] {'loss': 0.6942, 'grad_norm': 5.376961891531233, 'learning_rate': 4.376968427135999e-06, 'epoch': 0.25} 25%|██▌ | 3113/12313 [2:19:31<6:51:13, 2.68s/it] 25%|██▌ | 3114/12313 [2:19:34<6:51:16, 2.68s/it] {'loss': 0.7788, 'grad_norm': 7.126459999216967, 'learning_rate': 4.376533973846988e-06, 'epoch': 0.25} 25%|██▌ | 3114/12313 [2:19:34<6:51:16, 2.68s/it] 25%|██▌ | 3115/12313 [2:19:36<6:49:21, 2.67s/it] {'loss': 0.4626, 'grad_norm': 4.854425202352291, 'learning_rate': 4.376099390711758e-06, 'epoch': 0.25} 25%|██▌ | 3115/12313 [2:19:36<6:49:21, 2.67s/it] 25%|██▌ | 3116/12313 [2:19:39<6:41:25, 2.62s/it] {'loss': 0.6323, 'grad_norm': 8.756831124179643, 'learning_rate': 4.375664677760378e-06, 'epoch': 0.25} 25%|██▌ | 3116/12313 [2:19:39<6:41:25, 2.62s/it] 25%|██▌ | 3117/12313 [2:19:42<6:43:04, 2.63s/it] {'loss': 0.6067, 'grad_norm': 4.267269088423493, 'learning_rate': 4.375229835022929e-06, 'epoch': 0.25} 25%|██▌ | 3117/12313 [2:19:42<6:43:04, 2.63s/it] 25%|██▌ | 3118/12313 [2:19:44<6:47:22, 2.66s/it] {'loss': 0.5796, 'grad_norm': 5.198815978500407, 'learning_rate': 4.374794862529501e-06, 'epoch': 0.25} 25%|██▌ | 3118/12313 [2:19:44<6:47:22, 2.66s/it] 25%|██▌ | 3119/12313 [2:19:47<6:53:13, 2.70s/it] {'loss': 0.4924, 'grad_norm': 4.95003494031335, 'learning_rate': 4.374359760310191e-06, 'epoch': 0.25} 25%|██▌ | 3119/12313 [2:19:47<6:53:13, 2.70s/it] 25%|██▌ | 3120/12313 [2:19:50<6:52:55, 2.70s/it] {'loss': 0.6536, 
'grad_norm': 4.604098155034568, 'learning_rate': 4.373924528395105e-06, 'epoch': 0.25} 25%|██▌ | 3120/12313 [2:19:50<6:52:55, 2.70s/it] 25%|██▌ | 3121/12313 [2:19:52<6:45:37, 2.65s/it] {'loss': 0.6378, 'grad_norm': 5.033107338044671, 'learning_rate': 4.373489166814358e-06, 'epoch': 0.25} 25%|██▌ | 3121/12313 [2:19:52<6:45:37, 2.65s/it] 25%|██▌ | 3122/12313 [2:19:55<6:46:15, 2.65s/it] {'loss': 0.4952, 'grad_norm': 5.806258612021515, 'learning_rate': 4.3730536755980776e-06, 'epoch': 0.25} 25%|██▌ | 3122/12313 [2:19:55<6:46:15, 2.65s/it] 25%|██▌ | 3123/12313 [2:19:58<6:43:33, 2.63s/it] {'loss': 0.7578, 'grad_norm': 4.217899119753335, 'learning_rate': 4.372618054776395e-06, 'epoch': 0.25} 25%|██▌ | 3123/12313 [2:19:58<6:43:33, 2.63s/it] 25%|██▌ | 3124/12313 [2:20:00<6:38:13, 2.60s/it] {'loss': 0.5417, 'grad_norm': 4.5310110282850085, 'learning_rate': 4.372182304379455e-06, 'epoch': 0.25} 25%|██▌ | 3124/12313 [2:20:00<6:38:13, 2.60s/it] 25%|██▌ | 3125/12313 [2:20:03<6:41:59, 2.63s/it] {'loss': 0.5578, 'grad_norm': 5.792695550840339, 'learning_rate': 4.371746424437406e-06, 'epoch': 0.25} 25%|██▌ | 3125/12313 [2:20:03<6:41:59, 2.63s/it] 25%|██▌ | 3126/12313 [2:20:06<6:44:07, 2.64s/it] {'loss': 0.5975, 'grad_norm': 9.674533837851303, 'learning_rate': 4.371310414980412e-06, 'epoch': 0.25} 25%|██▌ | 3126/12313 [2:20:06<6:44:07, 2.64s/it] 25%|██▌ | 3127/12313 [2:20:08<6:43:33, 2.64s/it] {'loss': 0.5482, 'grad_norm': 4.234861474964158, 'learning_rate': 4.37087427603864e-06, 'epoch': 0.25} 25%|██▌ | 3127/12313 [2:20:08<6:43:33, 2.64s/it] 25%|██▌ | 3128/12313 [2:20:11<6:41:31, 2.62s/it] {'loss': 0.5447, 'grad_norm': 5.754506563667726, 'learning_rate': 4.37043800764227e-06, 'epoch': 0.25} 25%|██▌ | 3128/12313 [2:20:11<6:41:31, 2.62s/it] 25%|██▌ | 3129/12313 [2:20:13<6:44:13, 2.64s/it] {'loss': 0.7462, 'grad_norm': 10.678832172560531, 'learning_rate': 4.37000160982149e-06, 'epoch': 0.25} 25%|██▌ | 3129/12313 [2:20:13<6:44:13, 2.64s/it] 25%|██▌ | 3130/12313 [2:20:16<6:48:06, 
2.67s/it] {'loss': 0.7041, 'grad_norm': 4.948964860269852, 'learning_rate': 4.369565082606495e-06, 'epoch': 0.25} 25%|██▌ | 3130/12313 [2:20:16<6:48:06, 2.67s/it] 25%|██▌ | 3131/12313 [2:20:19<6:35:31, 2.58s/it] {'loss': 0.6471, 'grad_norm': 3.6881446493621657, 'learning_rate': 4.369128426027489e-06, 'epoch': 0.25} 25%|██▌ | 3131/12313 [2:20:19<6:35:31, 2.58s/it] 25%|██▌ | 3132/12313 [2:20:21<6:37:42, 2.60s/it] {'loss': 0.6399, 'grad_norm': 4.0215825826282625, 'learning_rate': 4.36869164011469e-06, 'epoch': 0.25} 25%|██▌ | 3132/12313 [2:20:21<6:37:42, 2.60s/it] 25%|██▌ | 3133/12313 [2:20:24<6:42:56, 2.63s/it] {'loss': 0.7778, 'grad_norm': 5.583956863498479, 'learning_rate': 4.368254724898319e-06, 'epoch': 0.25} 25%|██▌ | 3133/12313 [2:20:24<6:42:56, 2.63s/it] 25%|██▌ | 3134/12313 [2:20:27<6:54:38, 2.71s/it] {'loss': 0.5685, 'grad_norm': 4.313157219895635, 'learning_rate': 4.367817680408609e-06, 'epoch': 0.25} 25%|██▌ | 3134/12313 [2:20:27<6:54:38, 2.71s/it] 25%|██▌ | 3135/12313 [2:20:30<6:56:00, 2.72s/it] {'loss': 0.6546, 'grad_norm': 5.121719815553165, 'learning_rate': 4.3673805066758e-06, 'epoch': 0.25} 25%|██▌ | 3135/12313 [2:20:30<6:56:00, 2.72s/it] 25%|██▌ | 3136/12313 [2:20:32<6:47:24, 2.66s/it] {'loss': 0.5776, 'grad_norm': 6.598529582533699, 'learning_rate': 4.366943203730144e-06, 'epoch': 0.25} 25%|██▌ | 3136/12313 [2:20:32<6:47:24, 2.66s/it] 25%|██▌ | 3137/12313 [2:20:35<6:42:51, 2.63s/it] {'loss': 0.655, 'grad_norm': 8.179382375263176, 'learning_rate': 4.366505771601898e-06, 'epoch': 0.25} 25%|██▌ | 3137/12313 [2:20:35<6:42:51, 2.63s/it] 25%|██▌ | 3138/12313 [2:20:37<6:38:08, 2.60s/it] {'loss': 0.5393, 'grad_norm': 3.5740269182727085, 'learning_rate': 4.366068210321331e-06, 'epoch': 0.25} 25%|██▌ | 3138/12313 [2:20:37<6:38:08, 2.60s/it] 25%|██▌ | 3139/12313 [2:20:40<6:56:45, 2.73s/it] {'loss': 0.5082, 'grad_norm': 9.451730618946357, 'learning_rate': 4.3656305199187195e-06, 'epoch': 0.25} 25%|██▌ | 3139/12313 [2:20:40<6:56:45, 2.73s/it] 26%|██▌ | 
3140/12313 [2:20:43<6:53:23, 2.70s/it] {'loss': 0.6664, 'grad_norm': 6.028800369548608, 'learning_rate': 4.365192700424351e-06, 'epoch': 0.26} 26%|██▌ | 3140/12313 [2:20:43<6:53:23, 2.70s/it] 26%|██▌ | 3141/12313 [2:20:46<6:53:16, 2.70s/it] {'loss': 0.5164, 'grad_norm': 5.858773502768749, 'learning_rate': 4.364754751868519e-06, 'epoch': 0.26} 26%|██▌ | 3141/12313 [2:20:46<6:53:16, 2.70s/it] 26%|██▌ | 3142/12313 [2:20:48<6:54:40, 2.71s/it] {'loss': 0.544, 'grad_norm': 7.052601898082844, 'learning_rate': 4.364316674281526e-06, 'epoch': 0.26} 26%|██▌ | 3142/12313 [2:20:48<6:54:40, 2.71s/it] 26%|██▌ | 3143/12313 [2:20:51<6:45:51, 2.66s/it] {'loss': 0.5765, 'grad_norm': 5.076254392341973, 'learning_rate': 4.363878467693686e-06, 'epoch': 0.26} 26%|██▌ | 3143/12313 [2:20:51<6:45:51, 2.66s/it] 26%|██▌ | 3144/12313 [2:20:54<6:50:19, 2.69s/it] {'loss': 0.6367, 'grad_norm': 4.671237955123445, 'learning_rate': 4.363440132135322e-06, 'epoch': 0.26} 26%|██▌ | 3144/12313 [2:20:54<6:50:19, 2.69s/it] 26%|██▌ | 3145/12313 [2:20:56<6:40:37, 2.62s/it] {'loss': 0.725, 'grad_norm': 4.135246305916654, 'learning_rate': 4.363001667636762e-06, 'epoch': 0.26} 26%|██▌ | 3145/12313 [2:20:56<6:40:37, 2.62s/it] 26%|██▌ | 3146/12313 [2:20:59<6:39:40, 2.62s/it] {'loss': 0.5253, 'grad_norm': 5.579014444017926, 'learning_rate': 4.362563074228346e-06, 'epoch': 0.26} 26%|██▌ | 3146/12313 [2:20:59<6:39:40, 2.62s/it] 26%|██▌ | 3147/12313 [2:21:01<6:49:55, 2.68s/it] {'loss': 0.5849, 'grad_norm': 4.386851280267495, 'learning_rate': 4.3621243519404235e-06, 'epoch': 0.26} 26%|██▌ | 3147/12313 [2:21:01<6:49:55, 2.68s/it] 26%|██▌ | 3148/12313 [2:21:04<6:51:06, 2.69s/it] {'loss': 0.5252, 'grad_norm': 4.068805050530992, 'learning_rate': 4.36168550080335e-06, 'epoch': 0.26} 26%|██▌ | 3148/12313 [2:21:04<6:51:06, 2.69s/it] 26%|██▌ | 3149/12313 [2:21:07<6:52:13, 2.70s/it] {'loss': 0.4509, 'grad_norm': 5.079228239353849, 'learning_rate': 4.361246520847493e-06, 'epoch': 0.26} 26%|██▌ | 3149/12313 [2:21:07<6:52:13, 
2.70s/it] 26%|██▌ | 3150/12313 [2:21:10<6:55:19, 2.72s/it] {'loss': 0.5594, 'grad_norm': 4.417633075136854, 'learning_rate': 4.360807412103228e-06, 'epoch': 0.26} 26%|██▌ | 3150/12313 [2:21:10<6:55:19, 2.72s/it] 26%|██▌ | 3151/12313 [2:21:13<7:02:53, 2.77s/it] {'loss': 0.5581, 'grad_norm': 5.457155176446224, 'learning_rate': 4.3603681746009374e-06, 'epoch': 0.26} 26%|██▌ | 3151/12313 [2:21:13<7:02:53, 2.77s/it] 26%|██▌ | 3152/12313 [2:21:15<6:58:56, 2.74s/it] {'loss': 0.5777, 'grad_norm': 4.311137196331857, 'learning_rate': 4.3599288083710155e-06, 'epoch': 0.26} 26%|██▌ | 3152/12313 [2:21:15<6:58:56, 2.74s/it] 26%|██▌ | 3153/12313 [2:21:18<6:55:24, 2.72s/it] {'loss': 0.6154, 'grad_norm': 5.779855459554121, 'learning_rate': 4.359489313443864e-06, 'epoch': 0.26} 26%|██▌ | 3153/12313 [2:21:18<6:55:24, 2.72s/it] 26%|██▌ | 3154/12313 [2:21:20<6:44:54, 2.65s/it] {'loss': 0.5689, 'grad_norm': 5.203623644425137, 'learning_rate': 4.359049689849893e-06, 'epoch': 0.26} 26%|██▌ | 3154/12313 [2:21:20<6:44:54, 2.65s/it] 26%|██▌ | 3155/12313 [2:21:23<6:38:36, 2.61s/it] {'loss': 0.6593, 'grad_norm': 6.2014101236501995, 'learning_rate': 4.358609937619522e-06, 'epoch': 0.26} 26%|██▌ | 3155/12313 [2:21:23<6:38:36, 2.61s/it] 26%|██▌ | 3156/12313 [2:21:26<6:55:04, 2.72s/it] {'loss': 0.425, 'grad_norm': 6.085891873335906, 'learning_rate': 4.358170056783179e-06, 'epoch': 0.26} 26%|██▌ | 3156/12313 [2:21:26<6:55:04, 2.72s/it] 26%|██▌ | 3157/12313 [2:21:29<7:10:55, 2.82s/it] {'loss': 0.538, 'grad_norm': 3.58366241946543, 'learning_rate': 4.357730047371304e-06, 'epoch': 0.26} 26%|██▌ | 3157/12313 [2:21:29<7:10:55, 2.82s/it] 26%|██▌ | 3158/12313 [2:21:32<7:01:51, 2.76s/it] {'loss': 0.5894, 'grad_norm': 5.44533166154409, 'learning_rate': 4.357289909414341e-06, 'epoch': 0.26} 26%|██▌ | 3158/12313 [2:21:32<7:01:51, 2.76s/it] 26%|██▌ | 3159/12313 [2:21:34<6:59:55, 2.75s/it] {'loss': 0.5385, 'grad_norm': 3.8173019501608625, 'learning_rate': 4.356849642942746e-06, 'epoch': 0.26} 26%|██▌ | 
3159/12313 [2:21:34<6:59:55, 2.75s/it] 26%|██▌ | 3160/12313 [2:21:37<6:46:36, 2.67s/it] {'loss': 0.5628, 'grad_norm': 6.388033240415221, 'learning_rate': 4.356409247986982e-06, 'epoch': 0.26} 26%|██▌ | 3160/12313 [2:21:37<6:46:36, 2.67s/it] 26%|██▌ | 3161/12313 [2:21:40<6:55:22, 2.72s/it] {'loss': 0.5962, 'grad_norm': 10.182238415534854, 'learning_rate': 4.355968724577523e-06, 'epoch': 0.26} 26%|██▌ | 3161/12313 [2:21:40<6:55:22, 2.72s/it] 26%|██▌ | 3162/12313 [2:21:43<7:08:30, 2.81s/it] {'loss': 0.6925, 'grad_norm': 3.7342682561919998, 'learning_rate': 4.355528072744851e-06, 'epoch': 0.26} 26%|██▌ | 3162/12313 [2:21:43<7:08:30, 2.81s/it] 26%|██▌ | 3163/12313 [2:21:45<7:01:31, 2.76s/it] {'loss': 0.7971, 'grad_norm': 3.771086888643871, 'learning_rate': 4.355087292519458e-06, 'epoch': 0.26} 26%|██▌ | 3163/12313 [2:21:45<7:01:31, 2.76s/it] 26%|██▌ | 3164/12313 [2:21:48<6:56:47, 2.73s/it] {'loss': 0.6958, 'grad_norm': 4.185200079768148, 'learning_rate': 4.354646383931841e-06, 'epoch': 0.26} 26%|██▌ | 3164/12313 [2:21:48<6:56:47, 2.73s/it] 26%|██▌ | 3165/12313 [2:21:51<6:51:37, 2.70s/it] {'loss': 0.5607, 'grad_norm': 4.773930727509564, 'learning_rate': 4.3542053470125104e-06, 'epoch': 0.26} 26%|██▌ | 3165/12313 [2:21:51<6:51:37, 2.70s/it] 26%|██▌ | 3166/12313 [2:21:53<6:54:47, 2.72s/it] {'loss': 0.5175, 'grad_norm': 4.658976115697843, 'learning_rate': 4.353764181791983e-06, 'epoch': 0.26} 26%|██▌ | 3166/12313 [2:21:53<6:54:47, 2.72s/it] 26%|██▌ | 3167/12313 [2:21:56<6:49:19, 2.69s/it] {'loss': 0.4758, 'grad_norm': 6.33815354108174, 'learning_rate': 4.353322888300785e-06, 'epoch': 0.26} 26%|██▌ | 3167/12313 [2:21:56<6:49:19, 2.69s/it] 26%|██▌ | 3168/12313 [2:21:59<6:47:09, 2.67s/it] {'loss': 0.6159, 'grad_norm': 3.6435625688001423, 'learning_rate': 4.3528814665694515e-06, 'epoch': 0.26} 26%|██▌ | 3168/12313 [2:21:59<6:47:09, 2.67s/it] 26%|██▌ | 3169/12313 [2:22:01<6:44:42, 2.66s/it] {'loss': 0.6253, 'grad_norm': 3.901416738242754, 'learning_rate': 4.352439916628527e-06, 
'epoch': 0.26} 26%|██▌ | 3169/12313 [2:22:01<6:44:42, 2.66s/it] 26%|██▌ | 3170/12313 [2:22:04<6:39:59, 2.62s/it] {'loss': 0.4725, 'grad_norm': 5.97609923012503, 'learning_rate': 4.351998238508563e-06, 'epoch': 0.26} 26%|██▌ | 3170/12313 [2:22:04<6:39:59, 2.62s/it] 26%|██▌ | 3171/12313 [2:22:06<6:42:28, 2.64s/it] {'loss': 0.4923, 'grad_norm': 3.836440857096687, 'learning_rate': 4.351556432240124e-06, 'epoch': 0.26} 26%|██▌ | 3171/12313 [2:22:06<6:42:28, 2.64s/it] 26%|██▌ | 3172/12313 [2:22:09<6:38:48, 2.62s/it] {'loss': 0.6401, 'grad_norm': 6.0800255854901595, 'learning_rate': 4.351114497853779e-06, 'epoch': 0.26} 26%|██▌ | 3172/12313 [2:22:09<6:38:48, 2.62s/it] 26%|██▌ | 3173/12313 [2:22:12<6:35:07, 2.59s/it] {'loss': 0.7105, 'grad_norm': 5.622546507114739, 'learning_rate': 4.350672435380107e-06, 'epoch': 0.26} 26%|██▌ | 3173/12313 [2:22:12<6:35:07, 2.59s/it] 26%|██▌ | 3174/12313 [2:22:14<6:52:18, 2.71s/it] {'loss': 0.6403, 'grad_norm': 5.822121836191669, 'learning_rate': 4.350230244849697e-06, 'epoch': 0.26} 26%|██▌ | 3174/12313 [2:22:14<6:52:18, 2.71s/it] 26%|██▌ | 3175/12313 [2:22:17<6:42:37, 2.64s/it] {'loss': 0.6267, 'grad_norm': 3.5905578627539123, 'learning_rate': 4.349787926293146e-06, 'epoch': 0.26} 26%|██▌ | 3175/12313 [2:22:17<6:42:37, 2.64s/it] 26%|██▌ | 3176/12313 [2:22:20<6:51:52, 2.70s/it] {'loss': 0.7763, 'grad_norm': 3.4379879993140126, 'learning_rate': 4.349345479741062e-06, 'epoch': 0.26} 26%|██▌ | 3176/12313 [2:22:20<6:51:52, 2.70s/it] 26%|██▌ | 3177/12313 [2:22:23<6:57:03, 2.74s/it] {'loss': 0.5434, 'grad_norm': 5.296351841281579, 'learning_rate': 4.348902905224057e-06, 'epoch': 0.26} 26%|██▌ | 3177/12313 [2:22:23<6:57:03, 2.74s/it] 26%|██▌ | 3178/12313 [2:22:25<6:43:10, 2.65s/it] {'loss': 0.6596, 'grad_norm': 4.502659862665315, 'learning_rate': 4.348460202772756e-06, 'epoch': 0.26} 26%|██▌ | 3178/12313 [2:22:25<6:43:10, 2.65s/it] 26%|██▌ | 3179/12313 [2:22:28<6:58:50, 2.75s/it] {'loss': 0.4524, 'grad_norm': 5.66077638544368, 'learning_rate': 
4.348017372417792e-06, 'epoch': 0.26} 26%|██▌ | 3179/12313 [2:22:28<6:58:50, 2.75s/it] 26%|██▌ | 3180/12313 [2:22:31<7:03:39, 2.78s/it] {'loss': 0.539, 'grad_norm': 6.837754388761909, 'learning_rate': 4.347574414189807e-06, 'epoch': 0.26} 26%|██▌ | 3180/12313 [2:22:31<7:03:39, 2.78s/it] 26%|██▌ | 3181/12313 [2:22:34<7:27:19, 2.94s/it] {'loss': 0.4425, 'grad_norm': 4.065776898335388, 'learning_rate': 4.347131328119451e-06, 'epoch': 0.26} 26%|██▌ | 3181/12313 [2:22:34<7:27:19, 2.94s/it] 26%|██▌ | 3182/12313 [2:22:37<7:17:43, 2.88s/it] {'loss': 0.5503, 'grad_norm': 6.317177110076131, 'learning_rate': 4.346688114237381e-06, 'epoch': 0.26} 26%|██▌ | 3182/12313 [2:22:37<7:17:43, 2.88s/it] 26%|██▌ | 3183/12313 [2:22:39<6:56:52, 2.74s/it] {'loss': 0.565, 'grad_norm': 3.9597286603343207, 'learning_rate': 4.346244772574268e-06, 'epoch': 0.26} 26%|██▌ | 3183/12313 [2:22:39<6:56:52, 2.74s/it] 26%|██▌ | 3184/12313 [2:22:42<6:50:05, 2.70s/it] {'loss': 0.6364, 'grad_norm': 4.617361547979039, 'learning_rate': 4.345801303160788e-06, 'epoch': 0.26} 26%|██▌ | 3184/12313 [2:22:42<6:50:05, 2.70s/it] 26%|██▌ | 3185/12313 [2:22:45<6:55:44, 2.73s/it] {'loss': 0.7814, 'grad_norm': 3.869309442994426, 'learning_rate': 4.3453577060276264e-06, 'epoch': 0.26} 26%|██▌ | 3185/12313 [2:22:45<6:55:44, 2.73s/it] 26%|██▌ | 3186/12313 [2:22:47<6:53:16, 2.72s/it] {'loss': 0.6746, 'grad_norm': 5.268139660706938, 'learning_rate': 4.344913981205479e-06, 'epoch': 0.26} 26%|██▌ | 3186/12313 [2:22:47<6:53:16, 2.72s/it] 26%|██▌ | 3187/12313 [2:22:50<6:51:55, 2.71s/it] {'loss': 0.5914, 'grad_norm': 8.40326725980448, 'learning_rate': 4.344470128725047e-06, 'epoch': 0.26} 26%|██▌ | 3187/12313 [2:22:50<6:51:55, 2.71s/it] 26%|██▌ | 3188/12313 [2:22:53<6:42:17, 2.65s/it] {'loss': 0.5718, 'grad_norm': 6.871592559950185, 'learning_rate': 4.344026148617043e-06, 'epoch': 0.26} 26%|██▌ | 3188/12313 [2:22:53<6:42:17, 2.65s/it] 26%|██▌ | 3189/12313 [2:22:55<6:32:46, 2.58s/it] {'loss': 0.5049, 'grad_norm': 
5.20815182160167, 'learning_rate': 4.343582040912191e-06, 'epoch': 0.26} 26%|██▌ | 3189/12313 [2:22:55<6:32:46, 2.58s/it] 26%|██▌ | 3190/12313 [2:22:58<6:44:13, 2.66s/it] {'loss': 0.6226, 'grad_norm': 5.335316024532678, 'learning_rate': 4.343137805641217e-06, 'epoch': 0.26} 26%|██▌ | 3190/12313 [2:22:58<6:44:13, 2.66s/it] 26%|██▌ | 3191/12313 [2:23:01<6:40:42, 2.64s/it] {'loss': 0.4488, 'grad_norm': 3.8793130716336335, 'learning_rate': 4.3426934428348624e-06, 'epoch': 0.26} 26%|██▌ | 3191/12313 [2:23:01<6:40:42, 2.64s/it] 26%|██▌ | 3192/12313 [2:23:03<6:37:45, 2.62s/it] {'loss': 0.4591, 'grad_norm': 5.127013360030541, 'learning_rate': 4.342248952523874e-06, 'epoch': 0.26} 26%|██▌ | 3192/12313 [2:23:03<6:37:45, 2.62s/it] 26%|██▌ | 3193/12313 [2:23:06<6:37:03, 2.61s/it] {'loss': 0.4925, 'grad_norm': 4.532500566508718, 'learning_rate': 4.341804334739008e-06, 'epoch': 0.26} 26%|██▌ | 3193/12313 [2:23:06<6:37:03, 2.61s/it] 26%|██▌ | 3194/12313 [2:23:08<6:44:50, 2.66s/it] {'loss': 0.5041, 'grad_norm': 5.195487587147855, 'learning_rate': 4.34135958951103e-06, 'epoch': 0.26} 26%|██▌ | 3194/12313 [2:23:08<6:44:50, 2.66s/it] 26%|██▌ | 3195/12313 [2:23:11<6:38:00, 2.62s/it] {'loss': 0.5198, 'grad_norm': 3.983184952506586, 'learning_rate': 4.3409147168707124e-06, 'epoch': 0.26} 26%|██▌ | 3195/12313 [2:23:11<6:38:00, 2.62s/it] 26%|██▌ | 3196/12313 [2:23:13<6:28:32, 2.56s/it] {'loss': 0.4985, 'grad_norm': 4.96090598003835, 'learning_rate': 4.34046971684884e-06, 'epoch': 0.26} 26%|██▌ | 3196/12313 [2:23:13<6:28:32, 2.56s/it] 26%|██▌ | 3197/12313 [2:23:16<6:45:25, 2.67s/it] {'loss': 0.634, 'grad_norm': 4.583631169515384, 'learning_rate': 4.340024589476204e-06, 'epoch': 0.26} 26%|██▌ | 3197/12313 [2:23:16<6:45:25, 2.67s/it] 26%|██▌ | 3198/12313 [2:23:19<6:56:13, 2.74s/it] {'loss': 0.4752, 'grad_norm': 5.545926672500218, 'learning_rate': 4.3395793347836034e-06, 'epoch': 0.26} 26%|██▌ | 3198/12313 [2:23:19<6:56:13, 2.74s/it] 26%|██▌ | 3199/12313 [2:23:22<7:01:59, 2.78s/it] {'loss': 
0.7308, 'grad_norm': 6.711460341815773, 'learning_rate': 4.33913395280185e-06, 'epoch': 0.26} 26%|██▌ | 3199/12313 [2:23:22<7:01:59, 2.78s/it] 26%|██▌ | 3200/12313 [2:23:25<7:13:01, 2.85s/it] {'loss': 0.5049, 'grad_norm': 3.7919381803697276, 'learning_rate': 4.33868844356176e-06, 'epoch': 0.26} 26%|██▌ | 3200/12313 [2:23:25<7:13:01, 2.85s/it] 26%|██▌ | 3201/12313 [2:23:28<6:59:34, 2.76s/it] {'loss': 0.6063, 'grad_norm': 6.205008626331162, 'learning_rate': 4.338242807094161e-06, 'epoch': 0.26} 26%|██▌ | 3201/12313 [2:23:28<6:59:34, 2.76s/it] 26%|██▌ | 3202/12313 [2:23:30<6:49:12, 2.69s/it] {'loss': 0.6166, 'grad_norm': 4.694118634605663, 'learning_rate': 4.3377970434298885e-06, 'epoch': 0.26} 26%|██▌ | 3202/12313 [2:23:30<6:49:12, 2.69s/it] 26%|██▌ | 3203/12313 [2:23:33<6:54:54, 2.73s/it] {'loss': 0.5616, 'grad_norm': 5.435770883892459, 'learning_rate': 4.337351152599787e-06, 'epoch': 0.26} 26%|██▌ | 3203/12313 [2:23:33<6:54:54, 2.73s/it] 26%|██▌ | 3204/12313 [2:23:36<7:09:34, 2.83s/it] {'loss': 0.5997, 'grad_norm': 3.8614175476698485, 'learning_rate': 4.33690513463471e-06, 'epoch': 0.26} 26%|██▌ | 3204/12313 [2:23:36<7:09:34, 2.83s/it] 26%|██▌ | 3205/12313 [2:23:39<6:55:32, 2.74s/it] {'loss': 0.5467, 'grad_norm': 5.804567048793658, 'learning_rate': 4.336458989565519e-06, 'epoch': 0.26} 26%|██▌ | 3205/12313 [2:23:39<6:55:32, 2.74s/it] 26%|██▌ | 3206/12313 [2:23:41<6:42:52, 2.65s/it] {'loss': 0.4595, 'grad_norm': 6.725056494128386, 'learning_rate': 4.336012717423085e-06, 'epoch': 0.26} 26%|██▌ | 3206/12313 [2:23:41<6:42:52, 2.65s/it] 26%|██▌ | 3207/12313 [2:23:44<6:46:07, 2.68s/it] {'loss': 0.5616, 'grad_norm': 5.3233685676327065, 'learning_rate': 4.335566318238289e-06, 'epoch': 0.26} 26%|██▌ | 3207/12313 [2:23:44<6:46:07, 2.68s/it] 26%|██▌ | 3208/12313 [2:23:47<6:51:48, 2.71s/it] {'loss': 0.5414, 'grad_norm': 3.4875685060457218, 'learning_rate': 4.335119792042017e-06, 'epoch': 0.26} 26%|██▌ | 3208/12313 [2:23:47<6:51:48, 2.71s/it] 26%|██▌ | 3209/12313 
[2:23:49<6:50:01, 2.70s/it] {'loss': 0.5734, 'grad_norm': 4.305522238103512, 'learning_rate': 4.334673138865169e-06, 'epoch': 0.26} 26%|██▌ | 3209/12313 [2:23:49<6:50:01, 2.70s/it] 26%|██▌ | 3210/12313 [2:23:52<6:48:56, 2.70s/it] {'loss': 0.462, 'grad_norm': 6.18157123882514, 'learning_rate': 4.334226358738649e-06, 'epoch': 0.26} 26%|██▌ | 3210/12313 [2:23:52<6:48:56, 2.70s/it] 26%|██▌ | 3211/12313 [2:23:55<6:54:20, 2.73s/it] {'loss': 0.6092, 'grad_norm': 14.713830091090161, 'learning_rate': 4.333779451693373e-06, 'epoch': 0.26} 26%|██▌ | 3211/12313 [2:23:55<6:54:20, 2.73s/it] 26%|██▌ | 3212/12313 [2:23:57<6:51:58, 2.72s/it] {'loss': 0.4972, 'grad_norm': 11.006487008864882, 'learning_rate': 4.333332417760263e-06, 'epoch': 0.26} 26%|██▌ | 3212/12313 [2:23:57<6:51:58, 2.72s/it] 26%|██▌ | 3213/12313 [2:24:00<6:54:15, 2.73s/it] {'loss': 0.5909, 'grad_norm': 4.400134883259066, 'learning_rate': 4.332885256970253e-06, 'epoch': 0.26} 26%|██▌ | 3213/12313 [2:24:00<6:54:15, 2.73s/it] 26%|██▌ | 3214/12313 [2:24:03<6:45:20, 2.67s/it] {'loss': 0.4379, 'grad_norm': 5.443017416295305, 'learning_rate': 4.332437969354284e-06, 'epoch': 0.26} 26%|██▌ | 3214/12313 [2:24:03<6:45:20, 2.67s/it] 26%|██▌ | 3215/12313 [2:24:05<6:31:28, 2.58s/it] {'loss': 0.6238, 'grad_norm': 4.4738174242506235, 'learning_rate': 4.331990554943305e-06, 'epoch': 0.26} 26%|██▌ | 3215/12313 [2:24:05<6:31:28, 2.58s/it] 26%|██▌ | 3216/12313 [2:24:08<6:31:26, 2.58s/it] {'loss': 0.3969, 'grad_norm': 5.0416535662280655, 'learning_rate': 4.331543013768276e-06, 'epoch': 0.26} 26%|██▌ | 3216/12313 [2:24:08<6:31:26, 2.58s/it] 26%|██▌ | 3217/12313 [2:24:10<6:37:38, 2.62s/it] {'loss': 0.4619, 'grad_norm': 4.237641745664451, 'learning_rate': 4.331095345860162e-06, 'epoch': 0.26} 26%|██▌ | 3217/12313 [2:24:10<6:37:38, 2.62s/it] 26%|██▌ | 3218/12313 [2:24:13<6:40:50, 2.64s/it] {'loss': 0.5224, 'grad_norm': 5.09656551194579, 'learning_rate': 4.330647551249942e-06, 'epoch': 0.26} 26%|██▌ | 3218/12313 [2:24:13<6:40:50, 2.64s/it] 
26%|██▌ | 3219/12313 [2:24:16<6:40:20, 2.64s/it] {'loss': 0.7045, 'grad_norm': 4.113097985628701, 'learning_rate': 4.330199629968601e-06, 'epoch': 0.26} 26%|██▌ | 3219/12313 [2:24:16<6:40:20, 2.64s/it] 26%|██▌ | 3220/12313 [2:24:18<6:39:47, 2.64s/it] {'loss': 0.4772, 'grad_norm': 4.391696991616136, 'learning_rate': 4.329751582047132e-06, 'epoch': 0.26} 26%|██▌ | 3220/12313 [2:24:18<6:39:47, 2.64s/it] 26%|██▌ | 3221/12313 [2:24:21<6:39:48, 2.64s/it] {'loss': 0.7468, 'grad_norm': 6.284634192395287, 'learning_rate': 4.3293034075165355e-06, 'epoch': 0.26} 26%|██▌ | 3221/12313 [2:24:21<6:39:48, 2.64s/it] 26%|██▌ | 3222/12313 [2:24:24<6:38:23, 2.63s/it] {'loss': 0.5164, 'grad_norm': 5.31239683535975, 'learning_rate': 4.328855106407826e-06, 'epoch': 0.26} 26%|██▌ | 3222/12313 [2:24:24<6:38:23, 2.63s/it] 26%|██▌ | 3223/12313 [2:24:26<6:30:41, 2.58s/it] {'loss': 0.6003, 'grad_norm': 4.2612158979297945, 'learning_rate': 4.328406678752022e-06, 'epoch': 0.26} 26%|██▌ | 3223/12313 [2:24:26<6:30:41, 2.58s/it] 26%|██▌ | 3224/12313 [2:24:29<6:34:07, 2.60s/it] {'loss': 0.5995, 'grad_norm': 5.243856895565067, 'learning_rate': 4.3279581245801515e-06, 'epoch': 0.26} 26%|██▌ | 3224/12313 [2:24:29<6:34:07, 2.60s/it] 26%|██▌ | 3225/12313 [2:24:31<6:26:13, 2.55s/it] {'loss': 0.4661, 'grad_norm': 5.3541602152619046, 'learning_rate': 4.327509443923254e-06, 'epoch': 0.26} 26%|██▌ | 3225/12313 [2:24:31<6:26:13, 2.55s/it] 26%|██▌ | 3226/12313 [2:24:34<6:49:39, 2.70s/it] {'loss': 0.634, 'grad_norm': 5.615300922143262, 'learning_rate': 4.327060636812375e-06, 'epoch': 0.26} 26%|██▌ | 3226/12313 [2:24:34<6:49:39, 2.70s/it] 26%|██▌ | 3227/12313 [2:24:37<6:44:11, 2.67s/it] {'loss': 0.6963, 'grad_norm': 5.152013169697966, 'learning_rate': 4.32661170327857e-06, 'epoch': 0.26} 26%|██▌ | 3227/12313 [2:24:37<6:44:11, 2.67s/it] 26%|██▌ | 3228/12313 [2:24:39<6:39:02, 2.64s/it] {'loss': 0.7672, 'grad_norm': 5.2073960016961545, 'learning_rate': 4.326162643352901e-06, 'epoch': 0.26} 26%|██▌ | 3228/12313 
[2:24:39<6:39:02, 2.64s/it] 26%|██▌ | 3229/12313 [2:24:42<6:33:26, 2.60s/it] {'loss': 0.6173, 'grad_norm': 5.909318364895728, 'learning_rate': 4.325713457066443e-06, 'epoch': 0.26} 26%|██▌ | 3229/12313 [2:24:42<6:33:26, 2.60s/it] 26%|██▌ | 3230/12313 [2:24:45<6:33:43, 2.60s/it] {'loss': 0.3936, 'grad_norm': 16.26621162135915, 'learning_rate': 4.325264144450276e-06, 'epoch': 0.26} 26%|██▌ | 3230/12313 [2:24:45<6:33:43, 2.60s/it] 26%|██▌ | 3231/12313 [2:24:47<6:36:12, 2.62s/it] {'loss': 0.5425, 'grad_norm': 4.238425289228819, 'learning_rate': 4.324814705535491e-06, 'epoch': 0.26} 26%|██▌ | 3231/12313 [2:24:47<6:36:12, 2.62s/it] 26%|██▌ | 3232/12313 [2:24:50<6:34:17, 2.61s/it] {'loss': 0.6787, 'grad_norm': 7.639214601519878, 'learning_rate': 4.324365140353185e-06, 'epoch': 0.26} 26%|██▌ | 3232/12313 [2:24:50<6:34:17, 2.61s/it] 26%|██▋ | 3233/12313 [2:24:52<6:32:28, 2.59s/it] {'loss': 0.7179, 'grad_norm': 4.776323919697988, 'learning_rate': 4.323915448934466e-06, 'epoch': 0.26} 26%|██▋ | 3233/12313 [2:24:52<6:32:28, 2.59s/it] 26%|██▋ | 3234/12313 [2:24:55<6:38:51, 2.64s/it] {'loss': 0.5759, 'grad_norm': 4.829079791949981, 'learning_rate': 4.323465631310452e-06, 'epoch': 0.26} 26%|██▋ | 3234/12313 [2:24:55<6:38:51, 2.64s/it] 26%|██▋ | 3235/12313 [2:24:58<6:31:54, 2.59s/it] {'loss': 0.5894, 'grad_norm': 5.074740707653109, 'learning_rate': 4.323015687512267e-06, 'epoch': 0.26} 26%|██▋ | 3235/12313 [2:24:58<6:31:54, 2.59s/it] 26%|██▋ | 3236/12313 [2:25:00<6:35:24, 2.61s/it] {'loss': 0.5466, 'grad_norm': 9.615050714429785, 'learning_rate': 4.322565617571044e-06, 'epoch': 0.26} 26%|██▋ | 3236/12313 [2:25:00<6:35:24, 2.61s/it] 26%|██▋ | 3237/12313 [2:25:03<6:32:47, 2.60s/it] {'loss': 0.5913, 'grad_norm': 5.123025702283434, 'learning_rate': 4.322115421517926e-06, 'epoch': 0.26} 26%|██▋ | 3237/12313 [2:25:03<6:32:47, 2.60s/it] 26%|██▋ | 3238/12313 [2:25:05<6:33:26, 2.60s/it] {'loss': 0.4789, 'grad_norm': 4.751136267011921, 'learning_rate': 4.321665099384064e-06, 'epoch': 0.26} 
26%|██▋ | 3238/12313 [2:25:05<6:33:26, 2.60s/it] 26%|██▋ | 3239/12313 [2:25:08<6:38:24, 2.63s/it] {'loss': 0.5118, 'grad_norm': 10.947055513952483, 'learning_rate': 4.321214651200619e-06, 'epoch': 0.26} 26%|██▋ | 3239/12313 [2:25:08<6:38:24, 2.63s/it] 26%|██▋ | 3240/12313 [2:25:11<6:37:56, 2.63s/it] {'loss': 0.6006, 'grad_norm': 5.687786414091623, 'learning_rate': 4.320764076998759e-06, 'epoch': 0.26} 26%|██▋ | 3240/12313 [2:25:11<6:37:56, 2.63s/it] 26%|██▋ | 3241/12313 [2:25:13<6:40:30, 2.65s/it] {'loss': 0.6819, 'grad_norm': 4.846530548806665, 'learning_rate': 4.32031337680966e-06, 'epoch': 0.26} 26%|██▋ | 3241/12313 [2:25:13<6:40:30, 2.65s/it] 26%|██▋ | 3242/12313 [2:25:16<6:47:31, 2.70s/it] {'loss': 0.6645, 'grad_norm': 4.29992728461369, 'learning_rate': 4.31986255066451e-06, 'epoch': 0.26} 26%|██▋ | 3242/12313 [2:25:16<6:47:31, 2.70s/it] 26%|██▋ | 3243/12313 [2:25:19<6:44:38, 2.68s/it] {'loss': 0.65, 'grad_norm': 4.728540495274627, 'learning_rate': 4.319411598594503e-06, 'epoch': 0.26} 26%|██▋ | 3243/12313 [2:25:19<6:44:38, 2.68s/it] 26%|██▋ | 3244/12313 [2:25:22<6:44:38, 2.68s/it] {'loss': 0.5344, 'grad_norm': 4.044743720033708, 'learning_rate': 4.318960520630842e-06, 'epoch': 0.26} 26%|██▋ | 3244/12313 [2:25:22<6:44:38, 2.68s/it] 26%|██▋ | 3245/12313 [2:25:24<6:47:47, 2.70s/it] {'loss': 0.6314, 'grad_norm': 4.062414651992789, 'learning_rate': 4.3185093168047395e-06, 'epoch': 0.26} 26%|██▋ | 3245/12313 [2:25:24<6:47:47, 2.70s/it] 26%|██▋ | 3246/12313 [2:25:27<6:49:34, 2.71s/it] {'loss': 0.5496, 'grad_norm': 5.77814635308223, 'learning_rate': 4.318057987147418e-06, 'epoch': 0.26} 26%|██▋ | 3246/12313 [2:25:27<6:49:34, 2.71s/it] 26%|██▋ | 3247/12313 [2:25:30<6:46:31, 2.69s/it] {'loss': 0.5484, 'grad_norm': 8.868723303386513, 'learning_rate': 4.317606531690104e-06, 'epoch': 0.26} 26%|██▋ | 3247/12313 [2:25:30<6:46:31, 2.69s/it] 26%|██▋ | 3248/12313 [2:25:32<6:38:28, 2.64s/it] {'loss': 0.5311, 'grad_norm': 5.931221128514847, 'learning_rate': 
4.317154950464039e-06, 'epoch': 0.26} 26%|██▋ | 3248/12313 [2:25:32<6:38:28, 2.64s/it] 26%|██▋ | 3249/12313 [2:25:35<6:40:25, 2.65s/it] {'loss': 0.532, 'grad_norm': 5.582327237175479, 'learning_rate': 4.316703243500467e-06, 'epoch': 0.26} 26%|██▋ | 3249/12313 [2:25:35<6:40:25, 2.65s/it] 26%|██▋ | 3250/12313 [2:25:37<6:33:12, 2.60s/it] {'loss': 0.7615, 'grad_norm': 3.5277983786023235, 'learning_rate': 4.3162514108306465e-06, 'epoch': 0.26} 26%|██▋ | 3250/12313 [2:25:37<6:33:12, 2.60s/it] 26%|██▋ | 3251/12313 [2:25:40<6:37:48, 2.63s/it] {'loss': 0.4753, 'grad_norm': 6.701677321017773, 'learning_rate': 4.315799452485841e-06, 'epoch': 0.26} 26%|██▋ | 3251/12313 [2:25:40<6:37:48, 2.63s/it] 26%|██▋ | 3252/12313 [2:25:43<6:53:01, 2.73s/it] {'loss': 0.5828, 'grad_norm': 4.478895629503352, 'learning_rate': 4.3153473684973226e-06, 'epoch': 0.26} 26%|██▋ | 3252/12313 [2:25:43<6:53:01, 2.73s/it] 26%|██▋ | 3253/12313 [2:25:46<6:50:31, 2.72s/it] {'loss': 0.5683, 'grad_norm': 6.992879628361261, 'learning_rate': 4.314895158896374e-06, 'epoch': 0.26} 26%|██▋ | 3253/12313 [2:25:46<6:50:31, 2.72s/it] 26%|██▋ | 3254/12313 [2:25:48<6:49:05, 2.71s/it] {'loss': 0.4762, 'grad_norm': 4.622185359910068, 'learning_rate': 4.314442823714286e-06, 'epoch': 0.26} 26%|██▋ | 3254/12313 [2:25:48<6:49:05, 2.71s/it] 26%|██▋ | 3255/12313 [2:25:51<6:52:24, 2.73s/it] {'loss': 0.7585, 'grad_norm': 3.3396459737640587, 'learning_rate': 4.313990362982357e-06, 'epoch': 0.26} 26%|██▋ | 3255/12313 [2:25:51<6:52:24, 2.73s/it] 26%|██▋ | 3256/12313 [2:25:54<6:51:37, 2.73s/it] {'loss': 0.4599, 'grad_norm': 9.71525006135501, 'learning_rate': 4.313537776731895e-06, 'epoch': 0.26} 26%|██▋ | 3256/12313 [2:25:54<6:51:37, 2.73s/it] 26%|██▋ | 3257/12313 [2:25:57<6:48:20, 2.71s/it] {'loss': 0.59, 'grad_norm': 5.997259305627276, 'learning_rate': 4.313085064994218e-06, 'epoch': 0.26} 26%|██▋ | 3257/12313 [2:25:57<6:48:20, 2.71s/it] 26%|██▋ | 3258/12313 [2:26:00<7:03:24, 2.81s/it] {'loss': 0.6535, 'grad_norm': 
3.4896157532430285, 'learning_rate': 4.3126322278006496e-06, 'epoch': 0.26} 26%|██▋ | 3258/12313 [2:26:00<7:03:24, 2.81s/it] 26%|██▋ | 3259/12313 [2:26:03<7:32:24, 3.00s/it] {'loss': 0.4253, 'grad_norm': 3.163990271513905, 'learning_rate': 4.312179265182523e-06, 'epoch': 0.26} 26%|██▋ | 3259/12313 [2:26:03<7:32:24, 3.00s/it] 26%|██▋ | 3260/12313 [2:26:06<7:25:59, 2.96s/it] {'loss': 0.5902, 'grad_norm': 4.468139415385732, 'learning_rate': 4.311726177171184e-06, 'epoch': 0.26} 26%|██▋ | 3260/12313 [2:26:06<7:25:59, 2.96s/it] 26%|██▋ | 3261/12313 [2:26:09<7:20:20, 2.92s/it] {'loss': 0.5972, 'grad_norm': 3.7893680396692053, 'learning_rate': 4.311272963797981e-06, 'epoch': 0.26} 26%|██▋ | 3261/12313 [2:26:09<7:20:20, 2.92s/it] 26%|██▋ | 3262/12313 [2:26:11<7:12:41, 2.87s/it] {'loss': 0.4875, 'grad_norm': 5.50109202559223, 'learning_rate': 4.3108196250942746e-06, 'epoch': 0.26} 26%|██▋ | 3262/12313 [2:26:11<7:12:41, 2.87s/it] 27%|██▋ | 3263/12313 [2:26:14<6:54:58, 2.75s/it] {'loss': 0.7109, 'grad_norm': 10.031595831871455, 'learning_rate': 4.310366161091435e-06, 'epoch': 0.27} 27%|██▋ | 3263/12313 [2:26:14<6:54:58, 2.75s/it] 27%|██▋ | 3264/12313 [2:26:17<6:58:53, 2.78s/it] {'loss': 0.521, 'grad_norm': 5.5919773594492, 'learning_rate': 4.309912571820837e-06, 'epoch': 0.27} 27%|██▋ | 3264/12313 [2:26:17<6:58:53, 2.78s/it] 27%|██▋ | 3265/12313 [2:26:19<6:49:14, 2.71s/it] {'loss': 0.6694, 'grad_norm': 4.444420349163222, 'learning_rate': 4.309458857313868e-06, 'epoch': 0.27} 27%|██▋ | 3265/12313 [2:26:19<6:49:14, 2.71s/it] 27%|██▋ | 3266/12313 [2:26:22<6:50:00, 2.72s/it] {'loss': 0.6129, 'grad_norm': 3.8949106613348947, 'learning_rate': 4.309005017601924e-06, 'epoch': 0.27} 27%|██▋ | 3266/12313 [2:26:22<6:50:00, 2.72s/it] 27%|██▋ | 3267/12313 [2:26:25<6:48:34, 2.71s/it] {'loss': 0.6001, 'grad_norm': 3.3059067568093847, 'learning_rate': 4.308551052716406e-06, 'epoch': 0.27} 27%|██▋ | 3267/12313 [2:26:25<6:48:34, 2.71s/it] 27%|██▋ | 3268/12313 [2:26:27<6:49:19, 2.72s/it] 
{'loss': 0.5416, 'grad_norm': 5.114398735836412, 'learning_rate': 4.308096962688726e-06, 'epoch': 0.27} 27%|██▋ | 3268/12313 [2:26:27<6:49:19, 2.72s/it] 27%|██▋ | 3269/12313 [2:26:30<6:58:48, 2.78s/it] {'loss': 0.5931, 'grad_norm': 3.1154319915363855, 'learning_rate': 4.307642747550306e-06, 'epoch': 0.27} 27%|██▋ | 3269/12313 [2:26:30<6:58:48, 2.78s/it] 27%|██▋ | 3270/12313 [2:26:33<7:02:06, 2.80s/it] {'loss': 0.5852, 'grad_norm': 4.307648831780373, 'learning_rate': 4.307188407332574e-06, 'epoch': 0.27} 27%|██▋ | 3270/12313 [2:26:33<7:02:06, 2.80s/it] 27%|██▋ | 3271/12313 [2:26:36<6:58:17, 2.78s/it] {'loss': 0.5399, 'grad_norm': 4.424486660650266, 'learning_rate': 4.306733942066969e-06, 'epoch': 0.27} 27%|██▋ | 3271/12313 [2:26:36<6:58:17, 2.78s/it] 27%|██▋ | 3272/12313 [2:26:39<6:51:20, 2.73s/it] {'loss': 0.4185, 'grad_norm': 6.036748818342054, 'learning_rate': 4.306279351784938e-06, 'epoch': 0.27} 27%|██▋ | 3272/12313 [2:26:39<6:51:20, 2.73s/it] 27%|██▋ | 3273/12313 [2:26:42<7:01:55, 2.80s/it] {'loss': 0.4484, 'grad_norm': 3.6762274889278195, 'learning_rate': 4.305824636517935e-06, 'epoch': 0.27} 27%|██▋ | 3273/12313 [2:26:42<7:01:55, 2.80s/it] 27%|██▋ | 3274/12313 [2:26:44<6:51:50, 2.73s/it] {'loss': 0.4614, 'grad_norm': 4.098602260190931, 'learning_rate': 4.305369796297424e-06, 'epoch': 0.27} 27%|██▋ | 3274/12313 [2:26:44<6:51:50, 2.73s/it] 27%|██▋ | 3275/12313 [2:26:47<6:43:41, 2.68s/it] {'loss': 0.5971, 'grad_norm': 5.298237691361923, 'learning_rate': 4.3049148311548785e-06, 'epoch': 0.27} 27%|██▋ | 3275/12313 [2:26:47<6:43:41, 2.68s/it] 27%|██▋ | 3276/12313 [2:26:49<6:45:35, 2.69s/it] {'loss': 0.4597, 'grad_norm': 5.05302699787098, 'learning_rate': 4.304459741121778e-06, 'epoch': 0.27} 27%|██▋ | 3276/12313 [2:26:49<6:45:35, 2.69s/it] 27%|██▋ | 3277/12313 [2:26:52<6:55:20, 2.76s/it] {'loss': 0.5854, 'grad_norm': 4.995834340661036, 'learning_rate': 4.304004526229614e-06, 'epoch': 0.27} 27%|██▋ | 3277/12313 [2:26:52<6:55:20, 2.76s/it] 27%|██▋ | 3278/12313 
[2:26:55<6:37:36, 2.64s/it] {'loss': 0.498, 'grad_norm': 2.5777374640004838, 'learning_rate': 4.303549186509884e-06, 'epoch': 0.27} 27%|██▋ | 3278/12313 [2:26:55<6:37:36, 2.64s/it] 27%|██▋ | 3279/12313 [2:26:58<6:55:44, 2.76s/it] {'loss': 0.4985, 'grad_norm': 3.390125198708796, 'learning_rate': 4.303093721994096e-06, 'epoch': 0.27} 27%|██▋ | 3279/12313 [2:26:58<6:55:44, 2.76s/it] 27%|██▋ | 3280/12313 [2:27:00<6:48:01, 2.71s/it] {'loss': 0.5427, 'grad_norm': 5.071175646539791, 'learning_rate': 4.302638132713766e-06, 'epoch': 0.27} 27%|██▋ | 3280/12313 [2:27:00<6:48:01, 2.71s/it] 27%|██▋ | 3281/12313 [2:27:03<6:48:40, 2.71s/it] {'loss': 0.4533, 'grad_norm': 4.587539332384161, 'learning_rate': 4.302182418700415e-06, 'epoch': 0.27} 27%|██▋ | 3281/12313 [2:27:03<6:48:40, 2.71s/it] 27%|██▋ | 3282/12313 [2:27:06<7:00:08, 2.79s/it] {'loss': 0.5711, 'grad_norm': 10.760547007231233, 'learning_rate': 4.301726579985581e-06, 'epoch': 0.27} 27%|██▋ | 3282/12313 [2:27:06<7:00:08, 2.79s/it] 27%|██▋ | 3283/12313 [2:27:09<6:48:54, 2.72s/it] {'loss': 0.7594, 'grad_norm': 6.241807773314562, 'learning_rate': 4.301270616600802e-06, 'epoch': 0.27} 27%|██▋ | 3283/12313 [2:27:09<6:48:54, 2.72s/it] 27%|██▋ | 3284/12313 [2:27:11<6:58:19, 2.78s/it] {'loss': 0.5922, 'grad_norm': 5.23430106311079, 'learning_rate': 4.30081452857763e-06, 'epoch': 0.27} 27%|██▋ | 3284/12313 [2:27:11<6:58:19, 2.78s/it] 27%|██▋ | 3285/12313 [2:27:14<6:44:05, 2.69s/it] {'loss': 0.6298, 'grad_norm': 5.314419566343465, 'learning_rate': 4.300358315947622e-06, 'epoch': 0.27} 27%|██▋ | 3285/12313 [2:27:14<6:44:05, 2.69s/it] 27%|██▋ | 3286/12313 [2:27:17<6:55:35, 2.76s/it] {'loss': 0.5465, 'grad_norm': 5.02528385387479, 'learning_rate': 4.299901978742349e-06, 'epoch': 0.27} 27%|██▋ | 3286/12313 [2:27:17<6:55:35, 2.76s/it] 27%|██▋ | 3287/12313 [2:27:19<6:45:40, 2.70s/it] {'loss': 0.5514, 'grad_norm': 4.18489447765815, 'learning_rate': 4.2994455169933835e-06, 'epoch': 0.27} 27%|██▋ | 3287/12313 [2:27:19<6:45:40, 2.70s/it] 
27%|██▋ | 3288/12313 [2:27:22<6:42:31, 2.68s/it] {'loss': 0.6557, 'grad_norm': 4.050947136506907, 'learning_rate': 4.298988930732312e-06, 'epoch': 0.27} 27%|██▋ | 3288/12313 [2:27:22<6:42:31, 2.68s/it] 27%|██▋ | 3289/12313 [2:27:25<6:48:50, 2.72s/it] {'loss': 0.4805, 'grad_norm': 4.82281153344797, 'learning_rate': 4.2985322199907275e-06, 'epoch': 0.27} 27%|██▋ | 3289/12313 [2:27:25<6:48:50, 2.72s/it] 27%|██▋ | 3290/12313 [2:27:28<6:52:11, 2.74s/it] {'loss': 0.7165, 'grad_norm': 3.2786394296234787, 'learning_rate': 4.298075384800232e-06, 'epoch': 0.27} 27%|██▋ | 3290/12313 [2:27:28<6:52:11, 2.74s/it] 27%|██▋ | 3291/12313 [2:27:30<6:47:26, 2.71s/it] {'loss': 0.6281, 'grad_norm': 4.134763536721899, 'learning_rate': 4.297618425192436e-06, 'epoch': 0.27} 27%|██▋ | 3291/12313 [2:27:30<6:47:26, 2.71s/it] 27%|██▋ | 3292/12313 [2:27:33<7:02:18, 2.81s/it] {'loss': 0.4522, 'grad_norm': 5.393028330188706, 'learning_rate': 4.297161341198957e-06, 'epoch': 0.27} 27%|██▋ | 3292/12313 [2:27:33<7:02:18, 2.81s/it] 27%|██▋ | 3293/12313 [2:27:36<7:01:23, 2.80s/it] {'loss': 0.5315, 'grad_norm': 3.3337915433920178, 'learning_rate': 4.296704132851427e-06, 'epoch': 0.27} 27%|██▋ | 3293/12313 [2:27:36<7:01:23, 2.80s/it] 27%|██▋ | 3294/12313 [2:27:39<6:50:33, 2.73s/it] {'loss': 0.594, 'grad_norm': 4.494608257605046, 'learning_rate': 4.296246800181479e-06, 'epoch': 0.27} 27%|██▋ | 3294/12313 [2:27:39<6:50:33, 2.73s/it] 27%|██▋ | 3295/12313 [2:27:41<6:46:31, 2.70s/it] {'loss': 0.4113, 'grad_norm': 6.986279435908919, 'learning_rate': 4.29578934322076e-06, 'epoch': 0.27} 27%|██▋ | 3295/12313 [2:27:41<6:46:31, 2.70s/it] 27%|██▋ | 3296/12313 [2:27:44<6:41:35, 2.67s/it] {'loss': 0.656, 'grad_norm': 4.754454279458852, 'learning_rate': 4.295331762000921e-06, 'epoch': 0.27} 27%|██▋ | 3296/12313 [2:27:44<6:41:35, 2.67s/it] 27%|██▋ | 3297/12313 [2:27:47<6:57:28, 2.78s/it] {'loss': 0.6003, 'grad_norm': 4.557941560884389, 'learning_rate': 4.294874056553626e-06, 'epoch': 0.27} 27%|██▋ | 3297/12313 
[2:27:47<6:57:28, 2.78s/it] 27%|██▋ | 3298/12313 [2:27:50<7:04:20, 2.82s/it] {'loss': 0.6373, 'grad_norm': 3.5253434081641593, 'learning_rate': 4.294416226910546e-06, 'epoch': 0.27} 27%|██▋ | 3298/12313 [2:27:50<7:04:20, 2.82s/it] 27%|██▋ | 3299/12313 [2:27:53<7:04:36, 2.83s/it] {'loss': 0.5665, 'grad_norm': 5.390157405133369, 'learning_rate': 4.2939582731033605e-06, 'epoch': 0.27} 27%|██▋ | 3299/12313 [2:27:53<7:04:36, 2.83s/it] 27%|██▋ | 3300/12313 [2:27:55<6:53:16, 2.75s/it] {'loss': 0.4936, 'grad_norm': 4.444716348064678, 'learning_rate': 4.293500195163756e-06, 'epoch': 0.27} 27%|██▋ | 3300/12313 [2:27:55<6:53:16, 2.75s/it] 27%|██▋ | 3301/12313 [2:27:58<6:43:35, 2.69s/it] {'loss': 0.6039, 'grad_norm': 4.929185123300809, 'learning_rate': 4.29304199312343e-06, 'epoch': 0.27} 27%|██▋ | 3301/12313 [2:27:58<6:43:35, 2.69s/it] 27%|██▋ | 3302/12313 [2:28:00<6:35:47, 2.64s/it] {'loss': 0.6245, 'grad_norm': 5.005771469258986, 'learning_rate': 4.292583667014087e-06, 'epoch': 0.27} 27%|██▋ | 3302/12313 [2:28:00<6:35:47, 2.64s/it] 27%|██▋ | 3303/12313 [2:28:03<6:44:22, 2.69s/it] {'loss': 0.4753, 'grad_norm': 6.794918253939256, 'learning_rate': 4.292125216867443e-06, 'epoch': 0.27} 27%|██▋ | 3303/12313 [2:28:03<6:44:22, 2.69s/it] 27%|██▋ | 3304/12313 [2:28:06<6:38:54, 2.66s/it] {'loss': 0.6796, 'grad_norm': 4.918313944683376, 'learning_rate': 4.2916666427152175e-06, 'epoch': 0.27} 27%|██▋ | 3304/12313 [2:28:06<6:38:54, 2.66s/it] 27%|██▋ | 3305/12313 [2:28:08<6:36:42, 2.64s/it] {'loss': 0.5299, 'grad_norm': 5.184653793095706, 'learning_rate': 4.291207944589143e-06, 'epoch': 0.27} 27%|██▋ | 3305/12313 [2:28:08<6:36:42, 2.64s/it] 27%|██▋ | 3306/12313 [2:28:11<6:31:51, 2.61s/it] {'loss': 0.5115, 'grad_norm': 5.158508970573285, 'learning_rate': 4.290749122520959e-06, 'epoch': 0.27} 27%|██▋ | 3306/12313 [2:28:11<6:31:51, 2.61s/it] 27%|██▋ | 3307/12313 [2:28:14<6:36:31, 2.64s/it] {'loss': 0.6411, 'grad_norm': 5.019559380014884, 'learning_rate': 4.290290176542412e-06, 'epoch': 
0.27} 27%|██▋ | 3307/12313 [2:28:14<6:36:31, 2.64s/it] 27%|██▋ | 3308/12313 [2:28:16<6:41:47, 2.68s/it] {'loss': 0.5529, 'grad_norm': 5.867738157836222, 'learning_rate': 4.289831106685261e-06, 'epoch': 0.27} 27%|██▋ | 3308/12313 [2:28:16<6:41:47, 2.68s/it] 27%|██▋ | 3309/12313 [2:28:19<6:37:28, 2.65s/it] {'loss': 0.5655, 'grad_norm': 4.019129715439323, 'learning_rate': 4.289371912981268e-06, 'epoch': 0.27} 27%|██▋ | 3309/12313 [2:28:19<6:37:28, 2.65s/it] 27%|██▋ | 3310/12313 [2:28:21<6:31:40, 2.61s/it] {'loss': 0.7881, 'grad_norm': 4.959547313791584, 'learning_rate': 4.28891259546221e-06, 'epoch': 0.27} 27%|██▋ | 3310/12313 [2:28:21<6:31:40, 2.61s/it] 27%|██▋ | 3311/12313 [2:28:24<6:25:56, 2.57s/it] {'loss': 0.6476, 'grad_norm': 3.3894537647637466, 'learning_rate': 4.288453154159869e-06, 'epoch': 0.27} 27%|██▋ | 3311/12313 [2:28:24<6:25:56, 2.57s/it] 27%|██▋ | 3312/12313 [2:28:27<6:30:38, 2.60s/it] {'loss': 0.4195, 'grad_norm': 4.576484804759739, 'learning_rate': 4.287993589106034e-06, 'epoch': 0.27} 27%|██▋ | 3312/12313 [2:28:27<6:30:38, 2.60s/it] 27%|██▋ | 3313/12313 [2:28:29<6:38:49, 2.66s/it] {'loss': 0.5731, 'grad_norm': 5.5159598677639075, 'learning_rate': 4.287533900332506e-06, 'epoch': 0.27} 27%|██▋ | 3313/12313 [2:28:29<6:38:49, 2.66s/it] 27%|██▋ | 3314/12313 [2:28:32<6:31:09, 2.61s/it] {'loss': 0.7422, 'grad_norm': 3.476369390277886, 'learning_rate': 4.287074087871092e-06, 'epoch': 0.27} 27%|██▋ | 3314/12313 [2:28:32<6:31:09, 2.61s/it] 27%|██▋ | 3315/12313 [2:28:35<6:30:48, 2.61s/it] {'loss': 0.5666, 'grad_norm': 4.3872226024253305, 'learning_rate': 4.2866141517536085e-06, 'epoch': 0.27} 27%|██▋ | 3315/12313 [2:28:35<6:30:48, 2.61s/it] 27%|██▋ | 3316/12313 [2:28:37<6:31:32, 2.61s/it] {'loss': 0.6373, 'grad_norm': 4.1888189167131795, 'learning_rate': 4.286154092011882e-06, 'epoch': 0.27} 27%|██▋ | 3316/12313 [2:28:37<6:31:32, 2.61s/it] 27%|██▋ | 3317/12313 [2:28:40<6:46:23, 2.71s/it] {'loss': 0.5566, 'grad_norm': 6.277380300582101, 'learning_rate': 
4.285693908677746e-06, 'epoch': 0.27} 27%|██▋ | 3317/12313 [2:28:40<6:46:23, 2.71s/it] 27%|██▋ | 3318/12313 [2:28:43<6:40:55, 2.67s/it] {'loss': 0.7619, 'grad_norm': 3.4248947125674802, 'learning_rate': 4.285233601783041e-06, 'epoch': 0.27} 27%|██▋ | 3318/12313 [2:28:43<6:40:55, 2.67s/it] 27%|██▋ | 3319/12313 [2:28:45<6:40:52, 2.67s/it] {'loss': 0.441, 'grad_norm': 9.083452534210942, 'learning_rate': 4.28477317135962e-06, 'epoch': 0.27} 27%|██▋ | 3319/12313 [2:28:45<6:40:52, 2.67s/it] 27%|██▋ | 3320/12313 [2:28:48<6:38:50, 2.66s/it] {'loss': 0.5573, 'grad_norm': 5.194205565045371, 'learning_rate': 4.28431261743934e-06, 'epoch': 0.27} 27%|██▋ | 3320/12313 [2:28:48<6:38:50, 2.66s/it] 27%|██▋ | 3321/12313 [2:28:51<6:35:02, 2.64s/it] {'loss': 0.5867, 'grad_norm': 4.678152409953697, 'learning_rate': 4.2838519400540715e-06, 'epoch': 0.27} 27%|██▋ | 3321/12313 [2:28:51<6:35:02, 2.64s/it] 27%|██▋ | 3322/12313 [2:28:54<6:50:13, 2.74s/it] {'loss': 0.5913, 'grad_norm': 6.079974041201289, 'learning_rate': 4.283391139235688e-06, 'epoch': 0.27} 27%|██▋ | 3322/12313 [2:28:54<6:50:13, 2.74s/it] 27%|██▋ | 3323/12313 [2:28:57<7:02:59, 2.82s/it] {'loss': 0.4492, 'grad_norm': 4.43636337967898, 'learning_rate': 4.282930215016078e-06, 'epoch': 0.27} 27%|██▋ | 3323/12313 [2:28:57<7:02:59, 2.82s/it] 27%|██▋ | 3324/12313 [2:28:59<6:46:33, 2.71s/it] {'loss': 0.7522, 'grad_norm': 5.658242363279942, 'learning_rate': 4.282469167427132e-06, 'epoch': 0.27} 27%|██▋ | 3324/12313 [2:28:59<6:46:33, 2.71s/it] 27%|██▋ | 3325/12313 [2:29:02<6:44:23, 2.70s/it] {'loss': 0.5871, 'grad_norm': 6.996911479616853, 'learning_rate': 4.2820079965007545e-06, 'epoch': 0.27} 27%|██▋ | 3325/12313 [2:29:02<6:44:23, 2.70s/it] 27%|██▋ | 3326/12313 [2:29:04<6:38:04, 2.66s/it] {'loss': 0.3827, 'grad_norm': 4.53448881154164, 'learning_rate': 4.281546702268853e-06, 'epoch': 0.27} 27%|██▋ | 3326/12313 [2:29:04<6:38:04, 2.66s/it] 27%|██▋ | 3327/12313 [2:29:07<6:36:23, 2.65s/it] {'loss': 0.3635, 'grad_norm': 
6.424034548623486, 'learning_rate': 4.28108528476335e-06, 'epoch': 0.27} 27%|██▋ | 3327/12313 [2:29:07<6:36:23, 2.65s/it] 27%|██▋ | 3328/12313 [2:29:09<6:27:05, 2.58s/it] {'loss': 0.5539, 'grad_norm': 4.466125493504516, 'learning_rate': 4.280623744016171e-06, 'epoch': 0.27} 27%|██▋ | 3328/12313 [2:29:09<6:27:05, 2.58s/it] 27%|██▋ | 3329/12313 [2:29:12<6:44:17, 2.70s/it] {'loss': 0.5083, 'grad_norm': 3.693236745360508, 'learning_rate': 4.280162080059252e-06, 'epoch': 0.27} 27%|██▋ | 3329/12313 [2:29:12<6:44:17, 2.70s/it] 27%|██▋ | 3330/12313 [2:29:15<6:38:58, 2.66s/it] {'loss': 0.6099, 'grad_norm': 7.92497787565039, 'learning_rate': 4.279700292924539e-06, 'epoch': 0.27} 27%|██▋ | 3330/12313 [2:29:15<6:38:58, 2.66s/it] 27%|██▋ | 3331/12313 [2:29:18<6:46:24, 2.71s/it] {'loss': 0.4806, 'grad_norm': 6.126906391884235, 'learning_rate': 4.279238382643985e-06, 'epoch': 0.27} 27%|██▋ | 3331/12313 [2:29:18<6:46:24, 2.71s/it] 27%|██▋ | 3332/12313 [2:29:20<6:49:50, 2.74s/it] {'loss': 0.5868, 'grad_norm': 10.147235977729743, 'learning_rate': 4.278776349249551e-06, 'epoch': 0.27} 27%|██▋ | 3332/12313 [2:29:20<6:49:50, 2.74s/it] 27%|██▋ | 3333/12313 [2:29:23<6:41:21, 2.68s/it] {'loss': 0.5579, 'grad_norm': 4.694547184635383, 'learning_rate': 4.278314192773208e-06, 'epoch': 0.27} 27%|██▋ | 3333/12313 [2:29:23<6:41:21, 2.68s/it] 27%|██▋ | 3334/12313 [2:29:26<6:46:41, 2.72s/it] {'loss': 0.5259, 'grad_norm': 5.790255949685903, 'learning_rate': 4.277851913246934e-06, 'epoch': 0.27} 27%|██▋ | 3334/12313 [2:29:26<6:46:41, 2.72s/it] 27%|██▋ | 3335/12313 [2:29:29<6:51:11, 2.75s/it] {'loss': 0.5614, 'grad_norm': 4.610604823465653, 'learning_rate': 4.277389510702717e-06, 'epoch': 0.27} 27%|██▋ | 3335/12313 [2:29:29<6:51:11, 2.75s/it] 27%|██▋ | 3336/12313 [2:29:31<6:44:43, 2.71s/it] {'loss': 0.5133, 'grad_norm': 6.063379795121963, 'learning_rate': 4.276926985172553e-06, 'epoch': 0.27} 27%|██▋ | 3336/12313 [2:29:31<6:44:43, 2.71s/it] 27%|██▋ | 3337/12313 [2:29:34<6:45:24, 2.71s/it] {'loss': 
0.4748, 'grad_norm': 4.576106292327785, 'learning_rate': 4.276464336688445e-06, 'epoch': 0.27} 27%|██▋ | 3337/12313 [2:29:34<6:45:24, 2.71s/it] 27%|██▋ | 3338/12313 [2:29:37<6:40:49, 2.68s/it] {'loss': 0.7092, 'grad_norm': 5.762207004631984, 'learning_rate': 4.2760015652824074e-06, 'epoch': 0.27} 27%|██▋ | 3338/12313 [2:29:37<6:40:49, 2.68s/it] 27%|██▋ | 3339/12313 [2:29:39<6:45:45, 2.71s/it] {'loss': 0.4025, 'grad_norm': 4.920850361695753, 'learning_rate': 4.27553867098646e-06, 'epoch': 0.27} 27%|██▋ | 3339/12313 [2:29:39<6:45:45, 2.71s/it] 27%|██▋ | 3340/12313 [2:29:42<6:52:00, 2.75s/it] {'loss': 0.5994, 'grad_norm': 4.965952642074921, 'learning_rate': 4.275075653832635e-06, 'epoch': 0.27} 27%|██▋ | 3340/12313 [2:29:42<6:52:00, 2.75s/it] 27%|██▋ | 3341/12313 [2:29:45<6:39:20, 2.67s/it] {'loss': 0.6148, 'grad_norm': 6.661938259420609, 'learning_rate': 4.274612513852968e-06, 'epoch': 0.27} 27%|██▋ | 3341/12313 [2:29:45<6:39:20, 2.67s/it] 27%|██▋ | 3342/12313 [2:29:47<6:37:28, 2.66s/it] {'loss': 0.4974, 'grad_norm': 4.176847484212596, 'learning_rate': 4.274149251079507e-06, 'epoch': 0.27} 27%|██▋ | 3342/12313 [2:29:47<6:37:28, 2.66s/it] 27%|██▋ | 3343/12313 [2:29:50<6:36:02, 2.65s/it] {'loss': 0.518, 'grad_norm': 9.324003992934651, 'learning_rate': 4.273685865544308e-06, 'epoch': 0.27} 27%|██▋ | 3343/12313 [2:29:50<6:36:02, 2.65s/it] 27%|██▋ | 3344/12313 [2:29:52<6:25:52, 2.58s/it] {'loss': 0.5259, 'grad_norm': 10.84005640144426, 'learning_rate': 4.273222357279434e-06, 'epoch': 0.27} 27%|██▋ | 3344/12313 [2:29:52<6:25:52, 2.58s/it] 27%|██▋ | 3345/12313 [2:29:55<6:33:17, 2.63s/it] {'loss': 0.5688, 'grad_norm': 4.583615144085001, 'learning_rate': 4.272758726316958e-06, 'epoch': 0.27} 27%|██▋ | 3345/12313 [2:29:55<6:33:17, 2.63s/it] 27%|██▋ | 3346/12313 [2:29:58<6:35:20, 2.65s/it] {'loss': 0.7256, 'grad_norm': 5.704188407998161, 'learning_rate': 4.272294972688959e-06, 'epoch': 0.27} 27%|██▋ | 3346/12313 [2:29:58<6:35:20, 2.65s/it] 27%|██▋ | 3347/12313 [2:30:00<6:36:44, 
2.65s/it] {'loss': 0.4029, 'grad_norm': 5.611046782367686, 'learning_rate': 4.2718310964275285e-06, 'epoch': 0.27} 27%|██▋ | 3347/12313 [2:30:00<6:36:44, 2.65s/it] 27%|██▋ | 3348/12313 [2:30:03<6:44:46, 2.71s/it] {'loss': 0.5548, 'grad_norm': 3.432494651225952, 'learning_rate': 4.271367097564763e-06, 'epoch': 0.27} 27%|██▋ | 3348/12313 [2:30:03<6:44:46, 2.71s/it] 27%|██▋ | 3349/12313 [2:30:06<6:40:47, 2.68s/it] {'loss': 0.4336, 'grad_norm': 5.993392392105414, 'learning_rate': 4.27090297613277e-06, 'epoch': 0.27} 27%|██▋ | 3349/12313 [2:30:06<6:40:47, 2.68s/it] 27%|██▋ | 3350/12313 [2:30:09<6:35:31, 2.65s/it] {'loss': 0.5539, 'grad_norm': 5.973550784862252, 'learning_rate': 4.270438732163663e-06, 'epoch': 0.27} 27%|██▋ | 3350/12313 [2:30:09<6:35:31, 2.65s/it] 27%|██▋ | 3351/12313 [2:30:11<6:34:55, 2.64s/it] {'loss': 0.5499, 'grad_norm': 7.021609581750342, 'learning_rate': 4.269974365689565e-06, 'epoch': 0.27} 27%|██▋ | 3351/12313 [2:30:11<6:34:55, 2.64s/it] 27%|██▋ | 3352/12313 [2:30:14<6:33:45, 2.64s/it] {'loss': 0.5756, 'grad_norm': 4.0778999962904106, 'learning_rate': 4.269509876742609e-06, 'epoch': 0.27} 27%|██▋ | 3352/12313 [2:30:14<6:33:45, 2.64s/it] 27%|██▋ | 3353/12313 [2:30:16<6:36:23, 2.65s/it] {'loss': 0.6475, 'grad_norm': 4.339699329280206, 'learning_rate': 4.269045265354935e-06, 'epoch': 0.27} 27%|██▋ | 3353/12313 [2:30:16<6:36:23, 2.65s/it] 27%|██▋ | 3354/12313 [2:30:19<6:38:32, 2.67s/it] {'loss': 0.6789, 'grad_norm': 3.557598291542795, 'learning_rate': 4.26858053155869e-06, 'epoch': 0.27} 27%|██▋ | 3354/12313 [2:30:19<6:38:32, 2.67s/it] 27%|██▋ | 3355/12313 [2:30:22<6:27:53, 2.60s/it] {'loss': 0.5661, 'grad_norm': 5.638513919453793, 'learning_rate': 4.268115675386033e-06, 'epoch': 0.27} 27%|██▋ | 3355/12313 [2:30:22<6:27:53, 2.60s/it] 27%|██▋ | 3356/12313 [2:30:24<6:19:28, 2.54s/it] {'loss': 0.6213, 'grad_norm': 7.007916663451633, 'learning_rate': 4.267650696869129e-06, 'epoch': 0.27} 27%|██▋ | 3356/12313 [2:30:24<6:19:28, 2.54s/it] 27%|██▋ | 
3357/12313 [2:30:26<6:13:47, 2.50s/it] {'loss': 0.6185, 'grad_norm': 6.707656670971426, 'learning_rate': 4.267185596040152e-06, 'epoch': 0.27} 27%|██▋ | 3357/12313 [2:30:26<6:13:47, 2.50s/it] 27%|██▋ | 3358/12313 [2:30:29<6:17:11, 2.53s/it] {'loss': 0.6918, 'grad_norm': 5.468221363182492, 'learning_rate': 4.266720372931285e-06, 'epoch': 0.27} 27%|██▋ | 3358/12313 [2:30:29<6:17:11, 2.53s/it] 27%|██▋ | 3359/12313 [2:30:32<6:25:50, 2.59s/it] {'loss': 0.4968, 'grad_norm': 3.4031280276219826, 'learning_rate': 4.2662550275747175e-06, 'epoch': 0.27} 27%|██▋ | 3359/12313 [2:30:32<6:25:50, 2.59s/it] 27%|██▋ | 3360/12313 [2:30:35<6:38:40, 2.67s/it] {'loss': 0.5128, 'grad_norm': 3.7289743320117914, 'learning_rate': 4.26578956000265e-06, 'epoch': 0.27} 27%|██▋ | 3360/12313 [2:30:35<6:38:40, 2.67s/it] 27%|██▋ | 3361/12313 [2:30:37<6:38:16, 2.67s/it] {'loss': 0.5662, 'grad_norm': 4.058327224361656, 'learning_rate': 4.26532397024729e-06, 'epoch': 0.27} 27%|██▋ | 3361/12313 [2:30:37<6:38:16, 2.67s/it] 27%|██▋ | 3362/12313 [2:30:40<6:37:35, 2.67s/it] {'loss': 0.5985, 'grad_norm': 6.483948262408605, 'learning_rate': 4.264858258340854e-06, 'epoch': 0.27} 27%|██▋ | 3362/12313 [2:30:40<6:37:35, 2.67s/it] 27%|██▋ | 3363/12313 [2:30:43<6:39:30, 2.68s/it] {'loss': 0.5611, 'grad_norm': 6.7970175715192065, 'learning_rate': 4.264392424315568e-06, 'epoch': 0.27} 27%|██▋ | 3363/12313 [2:30:43<6:39:30, 2.68s/it] 27%|██▋ | 3364/12313 [2:30:45<6:42:24, 2.70s/it] {'loss': 0.5915, 'grad_norm': 4.719015734535453, 'learning_rate': 4.263926468203663e-06, 'epoch': 0.27} 27%|██▋ | 3364/12313 [2:30:45<6:42:24, 2.70s/it] 27%|██▋ | 3365/12313 [2:30:48<6:40:16, 2.68s/it] {'loss': 0.6571, 'grad_norm': 5.439068319982008, 'learning_rate': 4.2634603900373825e-06, 'epoch': 0.27} 27%|██▋ | 3365/12313 [2:30:48<6:40:16, 2.68s/it] 27%|██▋ | 3366/12313 [2:30:50<6:29:28, 2.61s/it] {'loss': 0.5393, 'grad_norm': 9.218141495545822, 'learning_rate': 4.262994189848976e-06, 'epoch': 0.27} 27%|██▋ | 3366/12313 
[2:30:50<6:29:28, 2.61s/it] 27%|██▋ | 3367/12313 [2:30:53<6:22:03, 2.56s/it] {'loss': 0.5521, 'grad_norm': 4.307338467347387, 'learning_rate': 4.262527867670702e-06, 'epoch': 0.27} 27%|██▋ | 3367/12313 [2:30:53<6:22:03, 2.56s/it] 27%|██▋ | 3368/12313 [2:30:55<6:21:41, 2.56s/it] {'loss': 0.4343, 'grad_norm': 4.015516448961165, 'learning_rate': 4.2620614235348265e-06, 'epoch': 0.27} 27%|██▋ | 3368/12313 [2:30:55<6:21:41, 2.56s/it] 27%|██▋ | 3369/12313 [2:30:58<6:25:14, 2.58s/it] {'loss': 0.4884, 'grad_norm': 5.503396831053377, 'learning_rate': 4.261594857473628e-06, 'epoch': 0.27} 27%|██▋ | 3369/12313 [2:30:58<6:25:14, 2.58s/it] 27%|██▋ | 3370/12313 [2:31:01<6:32:59, 2.64s/it] {'loss': 0.5385, 'grad_norm': 5.761220256833325, 'learning_rate': 4.261128169519388e-06, 'epoch': 0.27} 27%|██▋ | 3370/12313 [2:31:01<6:32:59, 2.64s/it] 27%|██▋ | 3371/12313 [2:31:03<6:29:07, 2.61s/it] {'loss': 0.5659, 'grad_norm': 7.751355969511051, 'learning_rate': 4.2606613597043975e-06, 'epoch': 0.27} 27%|██▋ | 3371/12313 [2:31:03<6:29:07, 2.61s/it] 27%|██▋ | 3372/12313 [2:31:06<6:23:02, 2.57s/it] {'loss': 0.4999, 'grad_norm': 4.671040413073545, 'learning_rate': 4.260194428060961e-06, 'epoch': 0.27} 27%|██▋ | 3372/12313 [2:31:06<6:23:02, 2.57s/it] 27%|██▋ | 3373/12313 [2:31:09<6:27:20, 2.60s/it] {'loss': 0.6527, 'grad_norm': 7.901394405330174, 'learning_rate': 4.2597273746213855e-06, 'epoch': 0.27} 27%|██▋ | 3373/12313 [2:31:09<6:27:20, 2.60s/it] 27%|██▋ | 3374/12313 [2:31:11<6:31:53, 2.63s/it] {'loss': 0.5473, 'grad_norm': 6.603352470726141, 'learning_rate': 4.259260199417988e-06, 'epoch': 0.27} 27%|██▋ | 3374/12313 [2:31:11<6:31:53, 2.63s/it] 27%|██▋ | 3375/12313 [2:31:14<6:30:48, 2.62s/it] {'loss': 0.5654, 'grad_norm': 5.40024658636952, 'learning_rate': 4.2587929024830964e-06, 'epoch': 0.27} 27%|██▋ | 3375/12313 [2:31:14<6:30:48, 2.62s/it] 27%|██▋ | 3376/12313 [2:31:16<6:27:37, 2.60s/it] {'loss': 0.6142, 'grad_norm': 9.759841121522395, 'learning_rate': 4.258325483849044e-06, 'epoch': 
0.27} 27%|██▋ | 3376/12313 [2:31:16<6:27:37, 2.60s/it] 27%|██▋ | 3377/12313 [2:31:19<6:29:49, 2.62s/it] {'loss': 0.5213, 'grad_norm': 5.125699081398096, 'learning_rate': 4.257857943548173e-06, 'epoch': 0.27} 27%|██▋ | 3377/12313 [2:31:19<6:29:49, 2.62s/it] 27%|██▋ | 3378/12313 [2:31:22<6:39:11, 2.68s/it] {'loss': 0.4747, 'grad_norm': 4.34028312916136, 'learning_rate': 4.257390281612837e-06, 'epoch': 0.27} 27%|██▋ | 3378/12313 [2:31:22<6:39:11, 2.68s/it] 27%|██▋ | 3379/12313 [2:31:25<6:45:58, 2.73s/it] {'loss': 0.5938, 'grad_norm': 4.263312229046128, 'learning_rate': 4.256922498075394e-06, 'epoch': 0.27} 27%|██▋ | 3379/12313 [2:31:25<6:45:58, 2.73s/it] 27%|██▋ | 3380/12313 [2:31:28<6:54:07, 2.78s/it] {'loss': 0.4205, 'grad_norm': 4.797086281941267, 'learning_rate': 4.256454592968212e-06, 'epoch': 0.27} 27%|██▋ | 3380/12313 [2:31:28<6:54:07, 2.78s/it] 27%|██▋ | 3381/12313 [2:31:30<6:47:03, 2.73s/it] {'loss': 0.6218, 'grad_norm': 7.033829432880135, 'learning_rate': 4.255986566323668e-06, 'epoch': 0.27} 27%|██▋ | 3381/12313 [2:31:30<6:47:03, 2.73s/it] 27%|██▋ | 3382/12313 [2:31:33<6:46:54, 2.73s/it] {'loss': 0.5008, 'grad_norm': 7.8902900782976095, 'learning_rate': 4.255518418174148e-06, 'epoch': 0.27} 27%|██▋ | 3382/12313 [2:31:33<6:46:54, 2.73s/it] 27%|██▋ | 3383/12313 [2:31:36<6:42:30, 2.70s/it] {'loss': 0.694, 'grad_norm': 3.2994006635611655, 'learning_rate': 4.2550501485520445e-06, 'epoch': 0.27} 27%|██▋ | 3383/12313 [2:31:36<6:42:30, 2.70s/it] 27%|██▋ | 3384/12313 [2:31:38<6:35:00, 2.65s/it] {'loss': 0.5462, 'grad_norm': 5.401884002359012, 'learning_rate': 4.254581757489758e-06, 'epoch': 0.27} 27%|██▋ | 3384/12313 [2:31:38<6:35:00, 2.65s/it] 27%|██▋ | 3385/12313 [2:31:41<6:23:08, 2.57s/it] {'loss': 0.5148, 'grad_norm': 11.938262897785561, 'learning_rate': 4.254113245019701e-06, 'epoch': 0.27} 27%|██▋ | 3385/12313 [2:31:41<6:23:08, 2.57s/it] 27%|██▋ | 3386/12313 [2:31:43<6:35:17, 2.66s/it] {'loss': 0.6357, 'grad_norm': 6.8820894517380555, 'learning_rate': 
4.25364461117429e-06, 'epoch': 0.27} 27%|██▋ | 3386/12313 [2:31:43<6:35:17, 2.66s/it] 28%|██▊ | 3387/12313 [2:31:46<6:24:07, 2.58s/it] {'loss': 0.609, 'grad_norm': 3.919541658984626, 'learning_rate': 4.2531758559859535e-06, 'epoch': 0.28} 28%|██▊ | 3387/12313 [2:31:46<6:24:07, 2.58s/it] 28%|██▊ | 3388/12313 [2:31:49<6:30:14, 2.62s/it] {'loss': 0.5394, 'grad_norm': 3.9922271367677475, 'learning_rate': 4.252706979487127e-06, 'epoch': 0.28} 28%|██▊ | 3388/12313 [2:31:49<6:30:14, 2.62s/it] 28%|██▊ | 3389/12313 [2:31:51<6:28:00, 2.61s/it] {'loss': 0.4479, 'grad_norm': 4.307590756579932, 'learning_rate': 4.2522379817102525e-06, 'epoch': 0.28} 28%|██▊ | 3389/12313 [2:31:51<6:28:00, 2.61s/it] 28%|██▊ | 3390/12313 [2:31:54<6:32:01, 2.64s/it] {'loss': 0.5758, 'grad_norm': 4.6485314568936715, 'learning_rate': 4.251768862687783e-06, 'epoch': 0.28} 28%|██▊ | 3390/12313 [2:31:54<6:32:01, 2.64s/it] 28%|██▊ | 3391/12313 [2:31:56<6:25:05, 2.59s/it] {'loss': 0.5229, 'grad_norm': 3.219961667439573, 'learning_rate': 4.25129962245218e-06, 'epoch': 0.28} 28%|██▊ | 3391/12313 [2:31:56<6:25:05, 2.59s/it] 28%|██▊ | 3392/12313 [2:31:59<6:41:36, 2.70s/it] {'loss': 0.5625, 'grad_norm': 3.6351810295222955, 'learning_rate': 4.250830261035911e-06, 'epoch': 0.28} 28%|██▊ | 3392/12313 [2:31:59<6:41:36, 2.70s/it] 28%|██▊ | 3393/12313 [2:32:02<6:30:14, 2.62s/it] {'loss': 0.6667, 'grad_norm': 5.369015436096168, 'learning_rate': 4.250360778471455e-06, 'epoch': 0.28} 28%|██▊ | 3393/12313 [2:32:02<6:30:14, 2.62s/it] 28%|██▊ | 3394/12313 [2:32:04<6:24:32, 2.59s/it] {'loss': 0.6424, 'grad_norm': 4.427072372118147, 'learning_rate': 4.249891174791297e-06, 'epoch': 0.28} 28%|██▊ | 3394/12313 [2:32:04<6:24:32, 2.59s/it] 28%|██▊ | 3395/12313 [2:32:07<6:22:54, 2.58s/it] {'loss': 0.6288, 'grad_norm': 5.994983906902898, 'learning_rate': 4.249421450027929e-06, 'epoch': 0.28} 28%|██▊ | 3395/12313 [2:32:07<6:22:54, 2.58s/it] 28%|██▊ | 3396/12313 [2:32:09<6:25:48, 2.60s/it] {'loss': 0.6017, 'grad_norm': 
5.740886381814397, 'learning_rate': 4.248951604213858e-06, 'epoch': 0.28} 28%|██▊ | 3396/12313 [2:32:09<6:25:48, 2.60s/it] 28%|██▊ | 3397/12313 [2:32:12<6:21:55, 2.57s/it] {'loss': 0.5481, 'grad_norm': 6.578063566695588, 'learning_rate': 4.24848163738159e-06, 'epoch': 0.28} 28%|██▊ | 3397/12313 [2:32:12<6:21:55, 2.57s/it] 28%|██▊ | 3398/12313 [2:32:15<6:29:18, 2.62s/it] {'loss': 0.601, 'grad_norm': 4.114106520635557, 'learning_rate': 4.248011549563647e-06, 'epoch': 0.28} 28%|██▊ | 3398/12313 [2:32:15<6:29:18, 2.62s/it] 28%|██▊ | 3399/12313 [2:32:17<6:27:42, 2.61s/it] {'loss': 0.5067, 'grad_norm': 5.445436802534245, 'learning_rate': 4.247541340792557e-06, 'epoch': 0.28} 28%|██▊ | 3399/12313 [2:32:17<6:27:42, 2.61s/it] 28%|██▊ | 3400/12313 [2:32:20<6:26:21, 2.60s/it] {'loss': 0.4474, 'grad_norm': 5.480238969386964, 'learning_rate': 4.247071011100855e-06, 'epoch': 0.28} 28%|██▊ | 3400/12313 [2:32:20<6:26:21, 2.60s/it] 28%|██▊ | 3401/12313 [2:32:22<6:27:21, 2.61s/it] {'loss': 0.5349, 'grad_norm': 3.9075643825539865, 'learning_rate': 4.246600560521084e-06, 'epoch': 0.28} 28%|██▊ | 3401/12313 [2:32:22<6:27:21, 2.61s/it] 28%|██▊ | 3402/12313 [2:32:25<6:27:03, 2.61s/it] {'loss': 0.6055, 'grad_norm': 4.585770407486754, 'learning_rate': 4.246129989085798e-06, 'epoch': 0.28} 28%|██▊ | 3402/12313 [2:32:25<6:27:03, 2.61s/it] 28%|██▊ | 3403/12313 [2:32:28<6:29:42, 2.62s/it] {'loss': 0.5557, 'grad_norm': 6.06730610156998, 'learning_rate': 4.245659296827559e-06, 'epoch': 0.28} 28%|██▊ | 3403/12313 [2:32:28<6:29:42, 2.62s/it] 28%|██▊ | 3404/12313 [2:32:30<6:31:23, 2.64s/it] {'loss': 0.5214, 'grad_norm': 4.632787296567119, 'learning_rate': 4.245188483778935e-06, 'epoch': 0.28} 28%|██▊ | 3404/12313 [2:32:30<6:31:23, 2.64s/it] 28%|██▊ | 3405/12313 [2:32:33<6:31:33, 2.64s/it] {'loss': 0.5625, 'grad_norm': 8.317514548088848, 'learning_rate': 4.244717549972504e-06, 'epoch': 0.28} 28%|██▊ | 3405/12313 [2:32:33<6:31:33, 2.64s/it] 28%|██▊ | 3406/12313 [2:32:36<6:34:10, 2.66s/it] {'loss': 
0.5844, 'grad_norm': 3.6101896303329255, 'learning_rate': 4.2442464954408524e-06, 'epoch': 0.28} 28%|██▊ | 3406/12313 [2:32:36<6:34:10, 2.66s/it] 28%|██▊ | 3407/12313 [2:32:38<6:30:34, 2.63s/it] {'loss': 0.5043, 'grad_norm': 4.099925210417941, 'learning_rate': 4.243775320216575e-06, 'epoch': 0.28} 28%|██▊ | 3407/12313 [2:32:38<6:30:34, 2.63s/it] 28%|██▊ | 3408/12313 [2:32:41<6:31:00, 2.63s/it] {'loss': 0.5581, 'grad_norm': 5.864792263094429, 'learning_rate': 4.243304024332273e-06, 'epoch': 0.28} 28%|██▊ | 3408/12313 [2:32:41<6:31:00, 2.63s/it] 28%|██▊ | 3409/12313 [2:32:43<6:26:34, 2.60s/it] {'loss': 0.7032, 'grad_norm': 4.148641967222804, 'learning_rate': 4.24283260782056e-06, 'epoch': 0.28} 28%|██▊ | 3409/12313 [2:32:43<6:26:34, 2.60s/it] 28%|██▊ | 3410/12313 [2:32:46<6:38:31, 2.69s/it] {'loss': 0.4561, 'grad_norm': 5.951997919915405, 'learning_rate': 4.2423610707140545e-06, 'epoch': 0.28} 28%|██▊ | 3410/12313 [2:32:46<6:38:31, 2.69s/it] 28%|██▊ | 3411/12313 [2:32:49<6:31:54, 2.64s/it] {'loss': 0.6652, 'grad_norm': 3.9198889092781233, 'learning_rate': 4.241889413045384e-06, 'epoch': 0.28} 28%|██▊ | 3411/12313 [2:32:49<6:31:54, 2.64s/it] 28%|██▊ | 3412/12313 [2:32:52<6:49:39, 2.76s/it] {'loss': 0.7011, 'grad_norm': 4.4203783957213725, 'learning_rate': 4.2414176348471845e-06, 'epoch': 0.28} 28%|██▊ | 3412/12313 [2:32:52<6:49:39, 2.76s/it] 28%|██▊ | 3413/12313 [2:32:55<6:42:06, 2.71s/it] {'loss': 0.5444, 'grad_norm': 2.6236418539062303, 'learning_rate': 4.240945736152101e-06, 'epoch': 0.28} 28%|██▊ | 3413/12313 [2:32:55<6:42:06, 2.71s/it] 28%|██▊ | 3414/12313 [2:32:57<6:43:17, 2.72s/it] {'loss': 0.5089, 'grad_norm': 3.283626852296422, 'learning_rate': 4.240473716992786e-06, 'epoch': 0.28} 28%|██▊ | 3414/12313 [2:32:57<6:43:17, 2.72s/it] 28%|██▊ | 3415/12313 [2:33:00<6:47:20, 2.75s/it] {'loss': 0.6291, 'grad_norm': 4.388557235644921, 'learning_rate': 4.240001577401903e-06, 'epoch': 0.28} 28%|██▊ | 3415/12313 [2:33:00<6:47:20, 2.75s/it] 28%|██▊ | 3416/12313 
[2:33:03<6:47:47, 2.75s/it] {'loss': 0.3563, 'grad_norm': 5.882384300446532, 'learning_rate': 4.239529317412118e-06, 'epoch': 0.28} 28%|██▊ | 3416/12313 [2:33:03<6:47:47, 2.75s/it] 28%|██▊ | 3417/12313 [2:33:06<6:47:24, 2.75s/it] {'loss': 0.4948, 'grad_norm': 5.663696285420261, 'learning_rate': 4.239056937056111e-06, 'epoch': 0.28} 28%|██▊ | 3417/12313 [2:33:06<6:47:24, 2.75s/it] 28%|██▊ | 3418/12313 [2:33:09<6:56:27, 2.81s/it] {'loss': 0.7371, 'grad_norm': 5.913404763781138, 'learning_rate': 4.238584436366568e-06, 'epoch': 0.28} 28%|██▊ | 3418/12313 [2:33:09<6:56:27, 2.81s/it] 28%|██▊ | 3419/12313 [2:33:11<6:56:37, 2.81s/it] {'loss': 0.5524, 'grad_norm': 2.9595554999598455, 'learning_rate': 4.238111815376182e-06, 'epoch': 0.28} 28%|██▊ | 3419/12313 [2:33:11<6:56:37, 2.81s/it] 28%|██▊ | 3420/12313 [2:33:14<6:51:41, 2.78s/it] {'loss': 0.436, 'grad_norm': 6.622001053281284, 'learning_rate': 4.23763907411766e-06, 'epoch': 0.28} 28%|██▊ | 3420/12313 [2:33:14<6:51:41, 2.78s/it] 28%|██▊ | 3421/12313 [2:33:17<7:03:40, 2.86s/it] {'loss': 0.4323, 'grad_norm': 4.194004633417891, 'learning_rate': 4.237166212623708e-06, 'epoch': 0.28} 28%|██▊ | 3421/12313 [2:33:17<7:03:40, 2.86s/it] 28%|██▊ | 3422/12313 [2:33:20<6:56:12, 2.81s/it] {'loss': 0.496, 'grad_norm': 5.700219795773494, 'learning_rate': 4.236693230927048e-06, 'epoch': 0.28} 28%|██▊ | 3422/12313 [2:33:20<6:56:12, 2.81s/it] 28%|██▊ | 3423/12313 [2:33:22<6:45:03, 2.73s/it] {'loss': 0.5907, 'grad_norm': 4.15322324920816, 'learning_rate': 4.2362201290604085e-06, 'epoch': 0.28} 28%|██▊ | 3423/12313 [2:33:22<6:45:03, 2.73s/it] 28%|██▊ | 3424/12313 [2:33:25<6:39:24, 2.70s/it] {'loss': 0.652, 'grad_norm': 5.540030074318849, 'learning_rate': 4.235746907056525e-06, 'epoch': 0.28} 28%|██▊ | 3424/12313 [2:33:25<6:39:24, 2.70s/it] 28%|██▊ | 3425/12313 [2:33:28<6:43:23, 2.72s/it] {'loss': 0.4876, 'grad_norm': 6.39998859259988, 'learning_rate': 4.235273564948142e-06, 'epoch': 0.28} 28%|██▊ | 3425/12313 [2:33:28<6:43:23, 2.72s/it] 
28%|██▊ | 3426/12313 [2:33:30<6:39:30, 2.70s/it] {'loss': 0.7998, 'grad_norm': 3.7525190738960816, 'learning_rate': 4.234800102768012e-06, 'epoch': 0.28} 28%|██▊ | 3426/12313 [2:33:30<6:39:30, 2.70s/it] 28%|██▊ | 3427/12313 [2:33:33<6:36:48, 2.68s/it] {'loss': 0.6744, 'grad_norm': 4.294740550636778, 'learning_rate': 4.234326520548895e-06, 'epoch': 0.28} 28%|██▊ | 3427/12313 [2:33:33<6:36:48, 2.68s/it] 28%|██▊ | 3428/12313 [2:33:36<6:39:35, 2.70s/it] {'loss': 0.4104, 'grad_norm': 5.52457863696226, 'learning_rate': 4.233852818323563e-06, 'epoch': 0.28} 28%|██▊ | 3428/12313 [2:33:36<6:39:35, 2.70s/it] 28%|██▊ | 3429/12313 [2:33:38<6:38:55, 2.69s/it] {'loss': 0.5443, 'grad_norm': 5.182650646362298, 'learning_rate': 4.233378996124792e-06, 'epoch': 0.28} 28%|██▊ | 3429/12313 [2:33:38<6:38:55, 2.69s/it] 28%|██▊ | 3430/12313 [2:33:41<6:41:26, 2.71s/it] {'loss': 0.6733, 'grad_norm': 8.063914556948795, 'learning_rate': 4.232905053985368e-06, 'epoch': 0.28} 28%|██▊ | 3430/12313 [2:33:41<6:41:26, 2.71s/it] 28%|██▊ | 3431/12313 [2:33:44<6:46:54, 2.75s/it] {'loss': 0.5109, 'grad_norm': 3.9395695955779186, 'learning_rate': 4.232430991938085e-06, 'epoch': 0.28} 28%|██▊ | 3431/12313 [2:33:44<6:46:54, 2.75s/it] 28%|██▊ | 3432/12313 [2:33:47<6:43:08, 2.72s/it] {'loss': 0.672, 'grad_norm': 2.5171709827838993, 'learning_rate': 4.231956810015747e-06, 'epoch': 0.28} 28%|██▊ | 3432/12313 [2:33:47<6:43:08, 2.72s/it] 28%|██▊ | 3433/12313 [2:33:50<7:04:10, 2.87s/it] {'loss': 0.5055, 'grad_norm': 5.452147154881219, 'learning_rate': 4.231482508251164e-06, 'epoch': 0.28} 28%|██▊ | 3433/12313 [2:33:50<7:04:10, 2.87s/it] 28%|██▊ | 3434/12313 [2:33:53<7:04:04, 2.87s/it] {'loss': 0.4714, 'grad_norm': 10.095151440140139, 'learning_rate': 4.231008086677154e-06, 'epoch': 0.28} 28%|██▊ | 3434/12313 [2:33:53<7:04:04, 2.87s/it] 28%|██▊ | 3435/12313 [2:33:55<6:54:44, 2.80s/it] {'loss': 0.5819, 'grad_norm': 4.246691334687768, 'learning_rate': 4.230533545326547e-06, 'epoch': 0.28} 28%|██▊ | 3435/12313 
[2:33:55<6:54:44, 2.80s/it] 28%|██▊ | 3436/12313 [2:33:58<6:52:20, 2.79s/it] {'loss': 0.7539, 'grad_norm': 7.118049034428091, 'learning_rate': 4.230058884232177e-06, 'epoch': 0.28} 28%|██▊ | 3436/12313 [2:33:58<6:52:20, 2.79s/it] 28%|██▊ | 3437/12313 [2:34:01<6:44:59, 2.74s/it] {'loss': 0.5447, 'grad_norm': 4.493070103171718, 'learning_rate': 4.229584103426888e-06, 'epoch': 0.28} 28%|██▊ | 3437/12313 [2:34:01<6:44:59, 2.74s/it] 28%|██▊ | 3438/12313 [2:34:04<6:46:56, 2.75s/it] {'loss': 0.5914, 'grad_norm': 3.9508050807048236, 'learning_rate': 4.229109202943533e-06, 'epoch': 0.28} 28%|██▊ | 3438/12313 [2:34:04<6:46:56, 2.75s/it] 28%|██▊ | 3439/12313 [2:34:06<6:45:43, 2.74s/it] {'loss': 0.5831, 'grad_norm': 6.3185262215803775, 'learning_rate': 4.228634182814972e-06, 'epoch': 0.28} 28%|██▊ | 3439/12313 [2:34:06<6:45:43, 2.74s/it] 28%|██▊ | 3440/12313 [2:34:09<6:45:00, 2.74s/it] {'loss': 0.6527, 'grad_norm': 4.369461605720211, 'learning_rate': 4.228159043074075e-06, 'epoch': 0.28} 28%|██▊ | 3440/12313 [2:34:09<6:45:00, 2.74s/it] 28%|██▊ | 3441/12313 [2:34:12<6:44:29, 2.74s/it] {'loss': 0.5189, 'grad_norm': 6.77973389079385, 'learning_rate': 4.227683783753717e-06, 'epoch': 0.28} 28%|██▊ | 3441/12313 [2:34:12<6:44:29, 2.74s/it] 28%|██▊ | 3442/12313 [2:34:14<6:39:08, 2.70s/it] {'loss': 0.5284, 'grad_norm': 4.379921845032675, 'learning_rate': 4.227208404886787e-06, 'epoch': 0.28} 28%|██▊ | 3442/12313 [2:34:14<6:39:08, 2.70s/it] 28%|██▊ | 3443/12313 [2:34:17<6:41:15, 2.71s/it] {'loss': 0.5291, 'grad_norm': 4.507924635404893, 'learning_rate': 4.2267329065061745e-06, 'epoch': 0.28} 28%|██▊ | 3443/12313 [2:34:17<6:41:15, 2.71s/it] 28%|██▊ | 3444/12313 [2:34:20<6:41:36, 2.72s/it] {'loss': 0.5183, 'grad_norm': 6.075130482555993, 'learning_rate': 4.226257288644784e-06, 'epoch': 0.28} 28%|██▊ | 3444/12313 [2:34:20<6:41:36, 2.72s/it] 28%|██▊ | 3445/12313 [2:34:23<6:49:53, 2.77s/it] {'loss': 0.5276, 'grad_norm': 4.72530804483612, 'learning_rate': 4.225781551335526e-06, 'epoch': 0.28} 
28%|██▊ | 3445/12313 [2:34:23<6:49:53, 2.77s/it] 28%|██▊ | 3446/12313 [2:34:25<6:46:20, 2.75s/it] {'loss': 0.5282, 'grad_norm': 5.2670855376913535, 'learning_rate': 4.225305694611318e-06, 'epoch': 0.28} 28%|██▊ | 3446/12313 [2:34:25<6:46:20, 2.75s/it] 28%|██▊ | 3447/12313 [2:34:28<6:51:43, 2.79s/it] {'loss': 0.5453, 'grad_norm': 4.900878051447264, 'learning_rate': 4.224829718505087e-06, 'epoch': 0.28} 28%|██▊ | 3447/12313 [2:34:28<6:51:43, 2.79s/it] 28%|██▊ | 3448/12313 [2:34:31<6:48:51, 2.77s/it] {'loss': 0.6766, 'grad_norm': 7.178013534818383, 'learning_rate': 4.224353623049767e-06, 'epoch': 0.28} 28%|██▊ | 3448/12313 [2:34:31<6:48:51, 2.77s/it] 28%|██▊ | 3449/12313 [2:34:34<6:57:06, 2.82s/it] {'loss': 0.6089, 'grad_norm': 4.664935512001858, 'learning_rate': 4.2238774082783025e-06, 'epoch': 0.28} 28%|██▊ | 3449/12313 [2:34:34<6:57:06, 2.82s/it] 28%|██▊ | 3450/12313 [2:34:37<6:56:25, 2.82s/it] {'loss': 0.6572, 'grad_norm': 4.852476116024416, 'learning_rate': 4.223401074223646e-06, 'epoch': 0.28} 28%|██▊ | 3450/12313 [2:34:37<6:56:25, 2.82s/it] 28%|██▊ | 3451/12313 [2:34:39<6:51:28, 2.79s/it] {'loss': 0.5789, 'grad_norm': 8.563098982741186, 'learning_rate': 4.222924620918755e-06, 'epoch': 0.28} 28%|██▊ | 3451/12313 [2:34:39<6:51:28, 2.79s/it] 28%|██▊ | 3452/12313 [2:34:43<7:12:07, 2.93s/it] {'loss': 0.6374, 'grad_norm': 4.014234539171271, 'learning_rate': 4.222448048396599e-06, 'epoch': 0.28} 28%|██▊ | 3452/12313 [2:34:43<7:12:07, 2.93s/it] 28%|██▊ | 3453/12313 [2:34:45<6:52:21, 2.79s/it] {'loss': 0.5382, 'grad_norm': 4.822310015531145, 'learning_rate': 4.221971356690154e-06, 'epoch': 0.28} 28%|██▊ | 3453/12313 [2:34:45<6:52:21, 2.79s/it] 28%|██▊ | 3454/12313 [2:34:48<6:41:04, 2.72s/it] {'loss': 0.4697, 'grad_norm': 4.267417151682677, 'learning_rate': 4.221494545832405e-06, 'epoch': 0.28} 28%|██▊ | 3454/12313 [2:34:48<6:41:04, 2.72s/it] 28%|██▊ | 3455/12313 [2:34:51<6:44:46, 2.74s/it] {'loss': 0.467, 'grad_norm': 8.796169733126867, 'learning_rate': 
4.221017615856344e-06, 'epoch': 0.28} 28%|██▊ | 3455/12313 [2:34:51<6:44:46, 2.74s/it] 28%|██▊ | 3456/12313 [2:34:53<6:50:17, 2.78s/it] {'loss': 0.5189, 'grad_norm': 5.742915475497312, 'learning_rate': 4.220540566794972e-06, 'epoch': 0.28} 28%|██▊ | 3456/12313 [2:34:53<6:50:17, 2.78s/it] 28%|██▊ | 3457/12313 [2:34:56<6:37:37, 2.69s/it] {'loss': 0.6394, 'grad_norm': 4.702000191158769, 'learning_rate': 4.220063398681299e-06, 'epoch': 0.28} 28%|██▊ | 3457/12313 [2:34:56<6:37:37, 2.69s/it] 28%|██▊ | 3458/12313 [2:34:59<6:38:38, 2.70s/it] {'loss': 0.427, 'grad_norm': 7.827322462489466, 'learning_rate': 4.219586111548342e-06, 'epoch': 0.28} 28%|██▊ | 3458/12313 [2:34:59<6:38:38, 2.70s/it] 28%|██▊ | 3459/12313 [2:35:02<6:48:17, 2.77s/it] {'loss': 0.5275, 'grad_norm': 4.5952790790019495, 'learning_rate': 4.219108705429127e-06, 'epoch': 0.28} 28%|██▊ | 3459/12313 [2:35:02<6:48:17, 2.77s/it] 28%|██▊ | 3460/12313 [2:35:04<6:42:48, 2.73s/it] {'loss': 0.6774, 'grad_norm': 4.260809941886097, 'learning_rate': 4.218631180356688e-06, 'epoch': 0.28} 28%|██▊ | 3460/12313 [2:35:04<6:42:48, 2.73s/it] 28%|██▊ | 3461/12313 [2:35:07<6:44:28, 2.74s/it] {'loss': 0.5556, 'grad_norm': 5.933145706552524, 'learning_rate': 4.218153536364067e-06, 'epoch': 0.28} 28%|██▊ | 3461/12313 [2:35:07<6:44:28, 2.74s/it] 28%|██▊ | 3462/12313 [2:35:10<6:50:47, 2.78s/it] {'loss': 0.6402, 'grad_norm': 5.5215075629008545, 'learning_rate': 4.217675773484314e-06, 'epoch': 0.28} 28%|██▊ | 3462/12313 [2:35:10<6:50:47, 2.78s/it] 28%|██▊ | 3463/12313 [2:35:13<6:48:06, 2.77s/it] {'loss': 0.553, 'grad_norm': 4.367099875870321, 'learning_rate': 4.217197891750488e-06, 'epoch': 0.28} 28%|██▊ | 3463/12313 [2:35:13<6:48:06, 2.77s/it] 28%|██▊ | 3464/12313 [2:35:15<6:36:42, 2.69s/it] {'loss': 0.7601, 'grad_norm': 8.120031592132388, 'learning_rate': 4.216719891195657e-06, 'epoch': 0.28} 28%|██▊ | 3464/12313 [2:35:15<6:36:42, 2.69s/it] 28%|██▊ | 3465/12313 [2:35:18<6:32:38, 2.66s/it] {'loss': 0.3699, 'grad_norm': 
13.915399996993806, 'learning_rate': 4.216241771852895e-06, 'epoch': 0.28} 28%|██▊ | 3465/12313 [2:35:18<6:32:38, 2.66s/it] 28%|██▊ | 3466/12313 [2:35:21<6:46:02, 2.75s/it] {'loss': 0.6308, 'grad_norm': 4.475259050725444, 'learning_rate': 4.215763533755285e-06, 'epoch': 0.28} 28%|██▊ | 3466/12313 [2:35:21<6:46:02, 2.75s/it] 28%|██▊ | 3467/12313 [2:35:24<6:53:11, 2.80s/it] {'loss': 0.5941, 'grad_norm': 4.246073015911453, 'learning_rate': 4.215285176935919e-06, 'epoch': 0.28} 28%|██▊ | 3467/12313 [2:35:24<6:53:11, 2.80s/it] 28%|██▊ | 3468/12313 [2:35:26<6:47:09, 2.76s/it] {'loss': 0.5504, 'grad_norm': 4.463656799978896, 'learning_rate': 4.214806701427896e-06, 'epoch': 0.28} 28%|██▊ | 3468/12313 [2:35:26<6:47:09, 2.76s/it] 28%|██▊ | 3469/12313 [2:35:29<6:55:02, 2.82s/it] {'loss': 0.5993, 'grad_norm': 2.9321912614363517, 'learning_rate': 4.214328107264326e-06, 'epoch': 0.28} 28%|██▊ | 3469/12313 [2:35:29<6:55:02, 2.82s/it] 28%|██▊ | 3470/12313 [2:35:32<6:47:59, 2.77s/it] {'loss': 0.8218, 'grad_norm': 5.912480103116702, 'learning_rate': 4.213849394478323e-06, 'epoch': 0.28} 28%|██▊ | 3470/12313 [2:35:32<6:47:59, 2.77s/it] 28%|██▊ | 3471/12313 [2:35:35<7:00:42, 2.85s/it] {'loss': 0.5549, 'grad_norm': 3.0648081596549996, 'learning_rate': 4.213370563103013e-06, 'epoch': 0.28} 28%|██▊ | 3471/12313 [2:35:35<7:00:42, 2.85s/it] 28%|██▊ | 3472/12313 [2:35:38<6:53:58, 2.81s/it] {'loss': 0.539, 'grad_norm': 7.610704699641839, 'learning_rate': 4.212891613171528e-06, 'epoch': 0.28} 28%|██▊ | 3472/12313 [2:35:38<6:53:58, 2.81s/it] 28%|██▊ | 3473/12313 [2:35:40<6:48:06, 2.77s/it] {'loss': 0.5433, 'grad_norm': 4.821090407393734, 'learning_rate': 4.212412544717009e-06, 'epoch': 0.28} 28%|██▊ | 3473/12313 [2:35:40<6:48:06, 2.77s/it] 28%|██▊ | 3474/12313 [2:35:43<6:39:01, 2.71s/it] {'loss': 0.5649, 'grad_norm': 4.480177003679071, 'learning_rate': 4.211933357772604e-06, 'epoch': 0.28} 28%|██▊ | 3474/12313 [2:35:43<6:39:01, 2.71s/it] 28%|██▊ | 3475/12313 [2:35:45<6:32:32, 2.66s/it] 
{'loss': 0.5074, 'grad_norm': 6.0539202446851546, 'learning_rate': 4.211454052371471e-06, 'epoch': 0.28} 28%|██▊ | 3475/12313 [2:35:45<6:32:32, 2.66s/it] 28%|██▊ | 3476/12313 [2:35:48<6:16:04, 2.55s/it] {'loss': 0.6066, 'grad_norm': 3.2128791712651066, 'learning_rate': 4.210974628546776e-06, 'epoch': 0.28} 28%|██▊ | 3476/12313 [2:35:48<6:16:04, 2.55s/it] 28%|██▊ | 3477/12313 [2:35:50<6:17:32, 2.56s/it] {'loss': 0.5114, 'grad_norm': 5.748988331106903, 'learning_rate': 4.210495086331691e-06, 'epoch': 0.28} 28%|██▊ | 3477/12313 [2:35:50<6:17:32, 2.56s/it] 28%|██▊ | 3478/12313 [2:35:53<6:21:29, 2.59s/it] {'loss': 0.6491, 'grad_norm': 2.985462121611765, 'learning_rate': 4.2100154257594e-06, 'epoch': 0.28} 28%|██▊ | 3478/12313 [2:35:53<6:21:29, 2.59s/it] 28%|██▊ | 3479/12313 [2:35:55<6:19:40, 2.58s/it] {'loss': 0.5046, 'grad_norm': 4.631989287533722, 'learning_rate': 4.20953564686309e-06, 'epoch': 0.28} 28%|██▊ | 3479/12313 [2:35:55<6:19:40, 2.58s/it] 28%|██▊ | 3480/12313 [2:35:58<6:18:28, 2.57s/it] {'loss': 0.5868, 'grad_norm': 7.645616919092413, 'learning_rate': 4.2090557496759615e-06, 'epoch': 0.28} 28%|██▊ | 3480/12313 [2:35:58<6:18:28, 2.57s/it] 28%|██▊ | 3481/12313 [2:36:01<6:26:53, 2.63s/it] {'loss': 0.539, 'grad_norm': 13.216352857254574, 'learning_rate': 4.208575734231221e-06, 'epoch': 0.28} 28%|██▊ | 3481/12313 [2:36:01<6:26:53, 2.63s/it] 28%|██▊ | 3482/12313 [2:36:04<6:35:02, 2.68s/it] {'loss': 0.6534, 'grad_norm': 4.282296217606311, 'learning_rate': 4.208095600562081e-06, 'epoch': 0.28} 28%|██▊ | 3482/12313 [2:36:04<6:35:02, 2.68s/it] 28%|██▊ | 3483/12313 [2:36:07<6:48:41, 2.78s/it] {'loss': 0.5671, 'grad_norm': 3.613475303783212, 'learning_rate': 4.2076153487017655e-06, 'epoch': 0.28} 28%|██▊ | 3483/12313 [2:36:07<6:48:41, 2.78s/it] 28%|██▊ | 3484/12313 [2:36:09<6:46:34, 2.76s/it] {'loss': 0.5416, 'grad_norm': 4.958757945118231, 'learning_rate': 4.207134978683506e-06, 'epoch': 0.28} 28%|██▊ | 3484/12313 [2:36:09<6:46:34, 2.76s/it] 28%|██▊ | 3485/12313 
[2:36:12<6:38:32, 2.71s/it] {'loss': 0.6141, 'grad_norm': 4.199726726733855, 'learning_rate': 4.206654490540541e-06, 'epoch': 0.28} 28%|██▊ | 3485/12313 [2:36:12<6:38:32, 2.71s/it] 28%|██▊ | 3486/12313 [2:36:15<6:45:09, 2.75s/it] {'loss': 0.6657, 'grad_norm': 4.600582479141581, 'learning_rate': 4.206173884306116e-06, 'epoch': 0.28} 28%|██▊ | 3486/12313 [2:36:15<6:45:09, 2.75s/it] 28%|██▊ | 3487/12313 [2:36:17<6:41:21, 2.73s/it] {'loss': 0.5541, 'grad_norm': 7.6947988469297375, 'learning_rate': 4.20569316001349e-06, 'epoch': 0.28} 28%|██▊ | 3487/12313 [2:36:17<6:41:21, 2.73s/it] 28%|██▊ | 3488/12313 [2:36:20<6:33:23, 2.67s/it] {'loss': 0.6524, 'grad_norm': 5.367699544702637, 'learning_rate': 4.205212317695924e-06, 'epoch': 0.28} 28%|██▊ | 3488/12313 [2:36:20<6:33:23, 2.67s/it] 28%|██▊ | 3489/12313 [2:36:23<6:44:59, 2.75s/it] {'loss': 0.5518, 'grad_norm': 6.029688715188512, 'learning_rate': 4.204731357386689e-06, 'epoch': 0.28} 28%|██▊ | 3489/12313 [2:36:23<6:44:59, 2.75s/it] 28%|██▊ | 3490/12313 [2:36:26<6:38:31, 2.71s/it] {'loss': 0.5141, 'grad_norm': 4.352090436645465, 'learning_rate': 4.204250279119068e-06, 'epoch': 0.28} 28%|██▊ | 3490/12313 [2:36:26<6:38:31, 2.71s/it] 28%|██▊ | 3491/12313 [2:36:28<6:33:47, 2.68s/it] {'loss': 0.5047, 'grad_norm': 4.073146574157928, 'learning_rate': 4.203769082926346e-06, 'epoch': 0.28} 28%|██▊ | 3491/12313 [2:36:28<6:33:47, 2.68s/it] 28%|██▊ | 3492/12313 [2:36:31<6:27:33, 2.64s/it] {'loss': 0.5063, 'grad_norm': 4.576078022715138, 'learning_rate': 4.203287768841822e-06, 'epoch': 0.28} 28%|██▊ | 3492/12313 [2:36:31<6:27:33, 2.64s/it] 28%|██▊ | 3493/12313 [2:36:33<6:31:21, 2.66s/it] {'loss': 0.5, 'grad_norm': 5.141508394238785, 'learning_rate': 4.202806336898798e-06, 'epoch': 0.28} 28%|██▊ | 3493/12313 [2:36:33<6:31:21, 2.66s/it] 28%|██▊ | 3494/12313 [2:36:36<6:27:58, 2.64s/it] {'loss': 0.5375, 'grad_norm': 4.33832646184562, 'learning_rate': 4.202324787130587e-06, 'epoch': 0.28} 28%|██▊ | 3494/12313 [2:36:36<6:27:58, 2.64s/it] 
28%|██▊ | 3495/12313 [2:36:39<6:29:23, 2.65s/it] {'loss': 0.6192, 'grad_norm': 7.671372488757828, 'learning_rate': 4.201843119570511e-06, 'epoch': 0.28} 28%|██▊ | 3495/12313 [2:36:39<6:29:23, 2.65s/it] 28%|██▊ | 3496/12313 [2:36:41<6:33:20, 2.68s/it] {'loss': 0.5039, 'grad_norm': 6.000625361865301, 'learning_rate': 4.201361334251898e-06, 'epoch': 0.28} 28%|██▊ | 3496/12313 [2:36:41<6:33:20, 2.68s/it] 28%|██▊ | 3497/12313 [2:36:44<6:33:31, 2.68s/it] {'loss': 0.4379, 'grad_norm': 10.56462611054689, 'learning_rate': 4.200879431208084e-06, 'epoch': 0.28} 28%|██▊ | 3497/12313 [2:36:44<6:33:31, 2.68s/it] 28%|██▊ | 3498/12313 [2:36:47<6:34:12, 2.68s/it] {'loss': 0.4859, 'grad_norm': 5.2118113489010325, 'learning_rate': 4.200397410472416e-06, 'epoch': 0.28} 28%|██▊ | 3498/12313 [2:36:47<6:34:12, 2.68s/it] 28%|██▊ | 3499/12313 [2:36:50<6:43:35, 2.75s/it] {'loss': 0.5509, 'grad_norm': 3.7680592065666447, 'learning_rate': 4.199915272078247e-06, 'epoch': 0.28} 28%|██▊ | 3499/12313 [2:36:50<6:43:35, 2.75s/it] 28%|██▊ | 3500/12313 [2:36:52<6:30:13, 2.66s/it] {'loss': 0.5495, 'grad_norm': 4.998842782205934, 'learning_rate': 4.199433016058936e-06, 'epoch': 0.28} 28%|██▊ | 3500/12313 [2:36:52<6:30:13, 2.66s/it] 28%|██▊ | 3501/12313 [2:36:55<6:39:50, 2.72s/it] {'loss': 0.7963, 'grad_norm': 3.6541213589146166, 'learning_rate': 4.198950642447856e-06, 'epoch': 0.28} 28%|██▊ | 3501/12313 [2:36:55<6:39:50, 2.72s/it] 28%|██▊ | 3502/12313 [2:36:58<6:40:48, 2.73s/it] {'loss': 0.5354, 'grad_norm': 4.6320107487602655, 'learning_rate': 4.198468151278382e-06, 'epoch': 0.28} 28%|██▊ | 3502/12313 [2:36:58<6:40:48, 2.73s/it] 28%|██▊ | 3503/12313 [2:37:01<7:00:51, 2.87s/it] {'loss': 0.5639, 'grad_norm': 16.856109294832514, 'learning_rate': 4.197985542583902e-06, 'epoch': 0.28} 28%|██▊ | 3503/12313 [2:37:01<7:00:51, 2.87s/it] 28%|██▊ | 3504/12313 [2:37:03<6:43:54, 2.75s/it] {'loss': 0.5327, 'grad_norm': 4.341047779732985, 'learning_rate': 4.197502816397809e-06, 'epoch': 0.28} 28%|██▊ | 3504/12313 
[2:37:03<6:43:54, 2.75s/it] 28%|██▊ | 3505/12313 [2:37:06<6:44:02, 2.75s/it] {'loss': 0.5685, 'grad_norm': 6.54679516358405, 'learning_rate': 4.197019972753504e-06, 'epoch': 0.28} 28%|██▊ | 3505/12313 [2:37:06<6:44:02, 2.75s/it] 28%|██▊ | 3506/12313 [2:37:09<6:47:09, 2.77s/it] {'loss': 0.6608, 'grad_norm': 8.538628468801688, 'learning_rate': 4.1965370116843985e-06, 'epoch': 0.28} 28%|██▊ | 3506/12313 [2:37:09<6:47:09, 2.77s/it] 28%|██▊ | 3507/12313 [2:37:12<6:38:00, 2.71s/it] {'loss': 0.5363, 'grad_norm': 6.781146736308602, 'learning_rate': 4.1960539332239115e-06, 'epoch': 0.28} 28%|██▊ | 3507/12313 [2:37:12<6:38:00, 2.71s/it] 28%|██▊ | 3508/12313 [2:37:15<6:53:29, 2.82s/it] {'loss': 0.4654, 'grad_norm': 6.568218511710609, 'learning_rate': 4.195570737405468e-06, 'epoch': 0.28} 28%|██▊ | 3508/12313 [2:37:15<6:53:29, 2.82s/it] 28%|██▊ | 3509/12313 [2:37:17<6:47:29, 2.78s/it] {'loss': 0.6075, 'grad_norm': 6.14037981907578, 'learning_rate': 4.195087424262503e-06, 'epoch': 0.28} 28%|██▊ | 3509/12313 [2:37:17<6:47:29, 2.78s/it] 29%|██▊ | 3510/12313 [2:37:20<6:41:40, 2.74s/it] {'loss': 0.4975, 'grad_norm': 3.9057111334726016, 'learning_rate': 4.194603993828459e-06, 'epoch': 0.29} 29%|██▊ | 3510/12313 [2:37:20<6:41:40, 2.74s/it] 29%|██▊ | 3511/12313 [2:37:23<6:37:59, 2.71s/it] {'loss': 0.6143, 'grad_norm': 6.56859758615991, 'learning_rate': 4.194120446136788e-06, 'epoch': 0.29} 29%|██▊ | 3511/12313 [2:37:23<6:37:59, 2.71s/it] 29%|██▊ | 3512/12313 [2:37:25<6:36:40, 2.70s/it] {'loss': 0.6135, 'grad_norm': 7.248281271806781, 'learning_rate': 4.193636781220948e-06, 'epoch': 0.29} 29%|██▊ | 3512/12313 [2:37:25<6:36:40, 2.70s/it] 29%|██▊ | 3513/12313 [2:37:28<6:38:30, 2.72s/it] {'loss': 0.644, 'grad_norm': 3.6814056969521265, 'learning_rate': 4.1931529991144056e-06, 'epoch': 0.29} 29%|██▊ | 3513/12313 [2:37:28<6:38:30, 2.72s/it] 29%|██▊ | 3514/12313 [2:37:31<6:33:45, 2.69s/it] {'loss': 0.4091, 'grad_norm': 4.124004556862366, 'learning_rate': 4.192669099850637e-06, 'epoch': 0.29} 
29%|██▊ | 3514/12313 [2:37:31<6:33:45, 2.69s/it] 29%|██▊ | 3515/12313 [2:37:33<6:36:02, 2.70s/it] {'loss': 0.6916, 'grad_norm': 9.237020510741628, 'learning_rate': 4.192185083463125e-06, 'epoch': 0.29} 29%|██▊ | 3515/12313 [2:37:33<6:36:02, 2.70s/it] 29%|██▊ | 3516/12313 [2:37:36<6:36:24, 2.70s/it] {'loss': 0.641, 'grad_norm': 4.325518645446921, 'learning_rate': 4.19170094998536e-06, 'epoch': 0.29} 29%|██▊ | 3516/12313 [2:37:36<6:36:24, 2.70s/it] 29%|██▊ | 3517/12313 [2:37:39<6:40:47, 2.73s/it] {'loss': 0.5248, 'grad_norm': 3.772328467417333, 'learning_rate': 4.191216699450844e-06, 'epoch': 0.29} 29%|██▊ | 3517/12313 [2:37:39<6:40:47, 2.73s/it] 29%|██▊ | 3518/12313 [2:37:42<6:35:08, 2.70s/it] {'loss': 0.5488, 'grad_norm': 4.5828680511801, 'learning_rate': 4.190732331893083e-06, 'epoch': 0.29} 29%|██▊ | 3518/12313 [2:37:42<6:35:08, 2.70s/it] 29%|██▊ | 3519/12313 [2:37:44<6:34:36, 2.69s/it] {'loss': 0.6085, 'grad_norm': 6.7144709613110045, 'learning_rate': 4.190247847345591e-06, 'epoch': 0.29} 29%|██▊ | 3519/12313 [2:37:44<6:34:36, 2.69s/it] 29%|██▊ | 3520/12313 [2:37:47<6:36:31, 2.71s/it] {'loss': 0.4582, 'grad_norm': 6.61578411126398, 'learning_rate': 4.189763245841895e-06, 'epoch': 0.29} 29%|██▊ | 3520/12313 [2:37:47<6:36:31, 2.71s/it] 29%|██▊ | 3521/12313 [2:37:50<6:33:54, 2.69s/it] {'loss': 0.4666, 'grad_norm': 7.697721977409483, 'learning_rate': 4.189278527415524e-06, 'epoch': 0.29} 29%|██▊ | 3521/12313 [2:37:50<6:33:54, 2.69s/it] 29%|██▊ | 3522/12313 [2:37:52<6:36:22, 2.71s/it] {'loss': 0.5197, 'grad_norm': 9.805505848445298, 'learning_rate': 4.188793692100021e-06, 'epoch': 0.29} 29%|██▊ | 3522/12313 [2:37:52<6:36:22, 2.71s/it] 29%|██▊ | 3523/12313 [2:37:55<6:35:06, 2.70s/it] {'loss': 0.6191, 'grad_norm': 4.38159024051503, 'learning_rate': 4.1883087399289315e-06, 'epoch': 0.29} 29%|██▊ | 3523/12313 [2:37:55<6:35:06, 2.70s/it] 29%|██▊ | 3524/12313 [2:37:58<6:48:14, 2.79s/it] {'loss': 0.4839, 'grad_norm': 5.443162368213296, 'learning_rate': 
4.187823670935812e-06, 'epoch': 0.29} 29%|██▊ | 3524/12313 [2:37:58<6:48:14, 2.79s/it] 29%|██▊ | 3525/12313 [2:38:01<6:43:21, 2.75s/it] {'loss': 0.5212, 'grad_norm': 9.060902521439406, 'learning_rate': 4.187338485154228e-06, 'epoch': 0.29} 29%|██▊ | 3525/12313 [2:38:01<6:43:21, 2.75s/it] 29%|██▊ | 3526/12313 [2:38:03<6:36:41, 2.71s/it] {'loss': 0.5874, 'grad_norm': 4.5276722061450405, 'learning_rate': 4.186853182617751e-06, 'epoch': 0.29} 29%|██▊ | 3526/12313 [2:38:03<6:36:41, 2.71s/it] 29%|██▊ | 3527/12313 [2:38:06<6:36:12, 2.71s/it] {'loss': 0.6216, 'grad_norm': 4.787775939104659, 'learning_rate': 4.1863677633599605e-06, 'epoch': 0.29} 29%|██▊ | 3527/12313 [2:38:06<6:36:12, 2.71s/it] 29%|██▊ | 3528/12313 [2:38:09<6:30:11, 2.66s/it] {'loss': 0.8107, 'grad_norm': 4.032333523480888, 'learning_rate': 4.1858822274144465e-06, 'epoch': 0.29} 29%|██▊ | 3528/12313 [2:38:09<6:30:11, 2.66s/it] 29%|██▊ | 3529/12313 [2:38:11<6:35:26, 2.70s/it] {'loss': 0.5834, 'grad_norm': 4.164487711997063, 'learning_rate': 4.185396574814804e-06, 'epoch': 0.29} 29%|██▊ | 3529/12313 [2:38:11<6:35:26, 2.70s/it] 29%|██▊ | 3530/12313 [2:38:14<6:34:46, 2.70s/it] {'loss': 0.5395, 'grad_norm': 4.736119308914805, 'learning_rate': 4.184910805594639e-06, 'epoch': 0.29} 29%|██▊ | 3530/12313 [2:38:14<6:34:46, 2.70s/it] 29%|██▊ | 3531/12313 [2:38:17<6:33:11, 2.69s/it] {'loss': 0.634, 'grad_norm': 4.206365573240851, 'learning_rate': 4.184424919787563e-06, 'epoch': 0.29} 29%|██▊ | 3531/12313 [2:38:17<6:33:11, 2.69s/it] 29%|██▊ | 3532/12313 [2:38:19<6:28:35, 2.66s/it] {'loss': 0.512, 'grad_norm': 3.609471547339424, 'learning_rate': 4.183938917427198e-06, 'epoch': 0.29} 29%|██▊ | 3532/12313 [2:38:19<6:28:35, 2.66s/it] 29%|██▊ | 3533/12313 [2:38:22<6:17:39, 2.58s/it] {'loss': 0.6594, 'grad_norm': 3.3756577667802303, 'learning_rate': 4.183452798547171e-06, 'epoch': 0.29} 29%|██▊ | 3533/12313 [2:38:22<6:17:39, 2.58s/it] 29%|██▊ | 3534/12313 [2:38:24<6:22:01, 2.61s/it] {'loss': 0.5162, 'grad_norm': 
4.577782600286523, 'learning_rate': 4.1829665631811214e-06, 'epoch': 0.29} 29%|██▊ | 3534/12313 [2:38:24<6:22:01, 2.61s/it] 29%|██▊ | 3535/12313 [2:38:27<6:24:04, 2.63s/it] {'loss': 0.6164, 'grad_norm': 5.423760632233463, 'learning_rate': 4.182480211362691e-06, 'epoch': 0.29} 29%|██▊ | 3535/12313 [2:38:27<6:24:04, 2.63s/it] 29%|██▊ | 3536/12313 [2:38:30<6:23:25, 2.62s/it] {'loss': 0.5202, 'grad_norm': 8.37175609091639, 'learning_rate': 4.181993743125535e-06, 'epoch': 0.29} 29%|██▊ | 3536/12313 [2:38:30<6:23:25, 2.62s/it] 29%|██▊ | 3537/12313 [2:38:32<6:27:23, 2.65s/it] {'loss': 0.7249, 'grad_norm': 4.332707609863406, 'learning_rate': 4.181507158503314e-06, 'epoch': 0.29} 29%|██▊ | 3537/12313 [2:38:32<6:27:23, 2.65s/it] 29%|██▊ | 3538/12313 [2:38:35<6:29:28, 2.66s/it] {'loss': 0.5169, 'grad_norm': 4.538954914725168, 'learning_rate': 4.1810204575296966e-06, 'epoch': 0.29} 29%|██▊ | 3538/12313 [2:38:35<6:29:28, 2.66s/it] 29%|██▊ | 3539/12313 [2:38:38<6:40:43, 2.74s/it] {'loss': 0.6053, 'grad_norm': 4.191746753570993, 'learning_rate': 4.180533640238361e-06, 'epoch': 0.29} 29%|██▊ | 3539/12313 [2:38:38<6:40:43, 2.74s/it] 29%|██▉ | 3540/12313 [2:38:41<6:40:16, 2.74s/it] {'loss': 0.5235, 'grad_norm': 4.85627080590084, 'learning_rate': 4.180046706662991e-06, 'epoch': 0.29} 29%|██▉ | 3540/12313 [2:38:41<6:40:16, 2.74s/it] 29%|██▉ | 3541/12313 [2:38:43<6:34:25, 2.70s/it] {'loss': 0.5635, 'grad_norm': 6.5813372383035915, 'learning_rate': 4.17955965683728e-06, 'epoch': 0.29} 29%|██▉ | 3541/12313 [2:38:43<6:34:25, 2.70s/it] 29%|██▉ | 3542/12313 [2:38:46<6:24:37, 2.63s/it] {'loss': 0.5271, 'grad_norm': 6.353502644005653, 'learning_rate': 4.17907249079493e-06, 'epoch': 0.29} 29%|██▉ | 3542/12313 [2:38:46<6:24:37, 2.63s/it] 29%|██▉ | 3543/12313 [2:38:48<6:27:23, 2.65s/it] {'loss': 0.5198, 'grad_norm': 3.316222423088927, 'learning_rate': 4.17858520856965e-06, 'epoch': 0.29} 29%|██▉ | 3543/12313 [2:38:48<6:27:23, 2.65s/it] 29%|██▉ | 3544/12313 [2:38:51<6:25:24, 2.64s/it] {'loss': 
0.5364, 'grad_norm': 3.6027797950419846, 'learning_rate': 4.178097810195157e-06, 'epoch': 0.29} 29%|██▉ | 3544/12313 [2:38:51<6:25:24, 2.64s/it] 29%|██▉ | 3545/12313 [2:38:54<6:21:38, 2.61s/it] {'loss': 0.5973, 'grad_norm': 6.196439592693107, 'learning_rate': 4.177610295705178e-06, 'epoch': 0.29} 29%|██▉ | 3545/12313 [2:38:54<6:21:38, 2.61s/it] 29%|██▉ | 3546/12313 [2:38:57<6:40:27, 2.74s/it] {'loss': 0.6097, 'grad_norm': 3.281604318639601, 'learning_rate': 4.177122665133444e-06, 'epoch': 0.29} 29%|██▉ | 3546/12313 [2:38:57<6:40:27, 2.74s/it] 29%|██▉ | 3547/12313 [2:38:59<6:36:01, 2.71s/it] {'loss': 0.479, 'grad_norm': 5.0788056380548525, 'learning_rate': 4.176634918513698e-06, 'epoch': 0.29} 29%|██▉ | 3547/12313 [2:38:59<6:36:01, 2.71s/it] 29%|██▉ | 3548/12313 [2:39:02<6:36:43, 2.72s/it] {'loss': 0.6935, 'grad_norm': 10.492165661109485, 'learning_rate': 4.176147055879689e-06, 'epoch': 0.29} 29%|██▉ | 3548/12313 [2:39:02<6:36:43, 2.72s/it] 29%|██▉ | 3549/12313 [2:39:05<6:36:08, 2.71s/it] {'loss': 0.5723, 'grad_norm': 5.566980762163421, 'learning_rate': 4.175659077265175e-06, 'epoch': 0.29} 29%|██▉ | 3549/12313 [2:39:05<6:36:08, 2.71s/it] 29%|██▉ | 3550/12313 [2:39:07<6:34:31, 2.70s/it] {'loss': 0.4667, 'grad_norm': 4.510747031993934, 'learning_rate': 4.175170982703921e-06, 'epoch': 0.29} 29%|██▉ | 3550/12313 [2:39:07<6:34:31, 2.70s/it] 29%|██▉ | 3551/12313 [2:39:10<6:48:41, 2.80s/it] {'loss': 0.6429, 'grad_norm': 5.795017468211081, 'learning_rate': 4.1746827722297e-06, 'epoch': 0.29} 29%|██▉ | 3551/12313 [2:39:10<6:48:41, 2.80s/it] 29%|██▉ | 3552/12313 [2:39:13<6:43:21, 2.76s/it] {'loss': 0.6138, 'grad_norm': 4.90772281384199, 'learning_rate': 4.174194445876295e-06, 'epoch': 0.29} 29%|██▉ | 3552/12313 [2:39:13<6:43:21, 2.76s/it] 29%|██▉ | 3553/12313 [2:39:16<6:37:34, 2.72s/it] {'loss': 0.5942, 'grad_norm': 9.38458036391076, 'learning_rate': 4.1737060036774945e-06, 'epoch': 0.29} 29%|██▉ | 3553/12313 [2:39:16<6:37:34, 2.72s/it] 29%|██▉ | 3554/12313 [2:39:19<6:42:50, 
2.76s/it] {'loss': 0.4725, 'grad_norm': 3.744621239251605, 'learning_rate': 4.173217445667097e-06, 'epoch': 0.29} 29%|██▉ | 3554/12313 [2:39:19<6:42:50, 2.76s/it] 29%|██▉ | 3555/12313 [2:39:21<6:35:31, 2.71s/it] {'loss': 0.4955, 'grad_norm': 3.5760812252375858, 'learning_rate': 4.172728771878908e-06, 'epoch': 0.29} 29%|██▉ | 3555/12313 [2:39:21<6:35:31, 2.71s/it] 29%|██▉ | 3556/12313 [2:39:24<6:24:25, 2.63s/it] {'loss': 0.5496, 'grad_norm': 5.300448357602322, 'learning_rate': 4.17223998234674e-06, 'epoch': 0.29} 29%|██▉ | 3556/12313 [2:39:24<6:24:25, 2.63s/it] 29%|██▉ | 3557/12313 [2:39:26<6:21:42, 2.62s/it] {'loss': 0.6269, 'grad_norm': 4.301151535524972, 'learning_rate': 4.171751077104415e-06, 'epoch': 0.29} 29%|██▉ | 3557/12313 [2:39:26<6:21:42, 2.62s/it] 29%|██▉ | 3558/12313 [2:39:29<6:25:41, 2.64s/it] {'loss': 0.6023, 'grad_norm': 4.150238159594849, 'learning_rate': 4.171262056185764e-06, 'epoch': 0.29} 29%|██▉ | 3558/12313 [2:39:29<6:25:41, 2.64s/it] 29%|██▉ | 3559/12313 [2:39:32<6:29:03, 2.67s/it] {'loss': 0.5044, 'grad_norm': 6.588110479356063, 'learning_rate': 4.170772919624624e-06, 'epoch': 0.29} 29%|██▉ | 3559/12313 [2:39:32<6:29:03, 2.67s/it] 29%|██▉ | 3560/12313 [2:39:34<6:30:02, 2.67s/it] {'loss': 0.4627, 'grad_norm': 9.52780556407402, 'learning_rate': 4.170283667454839e-06, 'epoch': 0.29} 29%|██▉ | 3560/12313 [2:39:34<6:30:02, 2.67s/it] 29%|██▉ | 3561/12313 [2:39:37<6:44:49, 2.78s/it] {'loss': 0.5403, 'grad_norm': 3.112608140290074, 'learning_rate': 4.169794299710266e-06, 'epoch': 0.29} 29%|██▉ | 3561/12313 [2:39:37<6:44:49, 2.78s/it] 29%|██▉ | 3562/12313 [2:39:40<6:37:11, 2.72s/it] {'loss': 0.6422, 'grad_norm': 5.782425679496101, 'learning_rate': 4.169304816424763e-06, 'epoch': 0.29} 29%|██▉ | 3562/12313 [2:39:40<6:37:11, 2.72s/it] 29%|██▉ | 3563/12313 [2:39:43<6:37:38, 2.73s/it] {'loss': 0.5983, 'grad_norm': 6.327129043676947, 'learning_rate': 4.168815217632202e-06, 'epoch': 0.29} 29%|██▉ | 3563/12313 [2:39:43<6:37:38, 2.73s/it] 29%|██▉ | 
3564/12313 [2:39:45<6:36:10, 2.72s/it] {'loss': 0.6639, 'grad_norm': 5.64892446517355, 'learning_rate': 4.168325503366461e-06, 'epoch': 0.29} 29%|██▉ | 3564/12313 [2:39:45<6:36:10, 2.72s/it] 29%|██▉ | 3565/12313 [2:39:48<6:20:25, 2.61s/it] {'loss': 0.5173, 'grad_norm': 4.212091644362981, 'learning_rate': 4.167835673661422e-06, 'epoch': 0.29} 29%|██▉ | 3565/12313 [2:39:48<6:20:25, 2.61s/it] 29%|██▉ | 3566/12313 [2:39:50<6:21:41, 2.62s/it] {'loss': 0.5776, 'grad_norm': 4.627484710444349, 'learning_rate': 4.167345728550984e-06, 'epoch': 0.29} 29%|██▉ | 3566/12313 [2:39:50<6:21:41, 2.62s/it] 29%|██▉ | 3567/12313 [2:39:53<6:25:28, 2.64s/it] {'loss': 0.5357, 'grad_norm': 6.431815623240765, 'learning_rate': 4.166855668069045e-06, 'epoch': 0.29} 29%|██▉ | 3567/12313 [2:39:53<6:25:28, 2.64s/it] 29%|██▉ | 3568/12313 [2:39:56<6:16:52, 2.59s/it] {'loss': 0.4888, 'grad_norm': 5.104463737375422, 'learning_rate': 4.166365492249514e-06, 'epoch': 0.29} 29%|██▉ | 3568/12313 [2:39:56<6:16:52, 2.59s/it] 29%|██▉ | 3569/12313 [2:39:59<6:34:00, 2.70s/it] {'loss': 0.4652, 'grad_norm': 2.6553169339995755, 'learning_rate': 4.1658752011263125e-06, 'epoch': 0.29} 29%|██▉ | 3569/12313 [2:39:59<6:34:00, 2.70s/it] 29%|██▉ | 3570/12313 [2:40:01<6:31:44, 2.69s/it] {'loss': 0.6268, 'grad_norm': 4.522379342995965, 'learning_rate': 4.1653847947333625e-06, 'epoch': 0.29} 29%|██▉ | 3570/12313 [2:40:01<6:31:44, 2.69s/it] 29%|██▉ | 3571/12313 [2:40:04<6:27:31, 2.66s/it] {'loss': 0.7023, 'grad_norm': 4.206423587982746, 'learning_rate': 4.164894273104599e-06, 'epoch': 0.29} 29%|██▉ | 3571/12313 [2:40:04<6:27:31, 2.66s/it] 29%|██▉ | 3572/12313 [2:40:07<6:35:16, 2.71s/it] {'loss': 0.6467, 'grad_norm': 3.550921658943635, 'learning_rate': 4.164403636273963e-06, 'epoch': 0.29} 29%|██▉ | 3572/12313 [2:40:07<6:35:16, 2.71s/it] 29%|██▉ | 3573/12313 [2:40:09<6:40:03, 2.75s/it] {'loss': 0.6449, 'grad_norm': 4.41960430630749, 'learning_rate': 4.163912884275403e-06, 'epoch': 0.29} 29%|██▉ | 3573/12313 
[2:40:09<6:40:03, 2.75s/it] 29%|██▉ | 3574/12313 [2:40:12<6:46:38, 2.79s/it] {'loss': 0.532, 'grad_norm': 5.335473970114484, 'learning_rate': 4.163422017142879e-06, 'epoch': 0.29} 29%|██▉ | 3574/12313 [2:40:12<6:46:38, 2.79s/it] 29%|██▉ | 3575/12313 [2:40:15<6:36:05, 2.72s/it] {'loss': 0.5057, 'grad_norm': 9.485322332094816, 'learning_rate': 4.162931034910354e-06, 'epoch': 0.29} 29%|██▉ | 3575/12313 [2:40:15<6:36:05, 2.72s/it] 29%|██▉ | 3576/12313 [2:40:18<6:41:55, 2.76s/it] {'loss': 0.5523, 'grad_norm': 5.472990906211878, 'learning_rate': 4.162439937611803e-06, 'epoch': 0.29} 29%|██▉ | 3576/12313 [2:40:18<6:41:55, 2.76s/it] 29%|██▉ | 3577/12313 [2:40:20<6:33:42, 2.70s/it] {'loss': 0.5859, 'grad_norm': 4.626244114541653, 'learning_rate': 4.161948725281206e-06, 'epoch': 0.29} 29%|██▉ | 3577/12313 [2:40:20<6:33:42, 2.70s/it] 29%|██▉ | 3578/12313 [2:40:23<6:37:17, 2.73s/it] {'loss': 0.5745, 'grad_norm': 4.102620389372604, 'learning_rate': 4.161457397952553e-06, 'epoch': 0.29} 29%|██▉ | 3578/12313 [2:40:23<6:37:17, 2.73s/it] 29%|██▉ | 3579/12313 [2:40:26<6:34:17, 2.71s/it] {'loss': 0.4707, 'grad_norm': 5.532218377747797, 'learning_rate': 4.160965955659843e-06, 'epoch': 0.29} 29%|██▉ | 3579/12313 [2:40:26<6:34:17, 2.71s/it] 29%|██▉ | 3580/12313 [2:40:28<6:26:46, 2.66s/it] {'loss': 0.5238, 'grad_norm': 4.2544863217335545, 'learning_rate': 4.160474398437077e-06, 'epoch': 0.29} 29%|██▉ | 3580/12313 [2:40:28<6:26:46, 2.66s/it] 29%|██▉ | 3581/12313 [2:40:31<6:25:26, 2.65s/it] {'loss': 0.8256, 'grad_norm': 5.496605956094565, 'learning_rate': 4.159982726318271e-06, 'epoch': 0.29} 29%|██▉ | 3581/12313 [2:40:31<6:25:26, 2.65s/it] 29%|██▉ | 3582/12313 [2:40:34<6:27:48, 2.66s/it] {'loss': 0.6179, 'grad_norm': 4.300837180788803, 'learning_rate': 4.159490939337447e-06, 'epoch': 0.29} 29%|██▉ | 3582/12313 [2:40:34<6:27:48, 2.66s/it] 29%|██▉ | 3583/12313 [2:40:36<6:30:26, 2.68s/it] {'loss': 0.5216, 'grad_norm': 3.957326025756882, 'learning_rate': 4.158999037528632e-06, 'epoch': 0.29} 
29%|██▉ | 3583/12313 [2:40:36<6:30:26, 2.68s/it] 29%|██▉ | 3584/12313 [2:40:39<6:28:22, 2.67s/it] {'loss': 0.859, 'grad_norm': 5.327052291363593, 'learning_rate': 4.1585070209258635e-06, 'epoch': 0.29} 29%|██▉ | 3584/12313 [2:40:39<6:28:22, 2.67s/it] 29%|██▉ | 3585/12313 [2:40:42<6:25:46, 2.65s/it] {'loss': 0.4936, 'grad_norm': 6.708637795707786, 'learning_rate': 4.158014889563187e-06, 'epoch': 0.29} 29%|██▉ | 3585/12313 [2:40:42<6:25:46, 2.65s/it] 29%|██▉ | 3586/12313 [2:40:44<6:23:06, 2.63s/it] {'loss': 0.4951, 'grad_norm': 3.6081169619085487, 'learning_rate': 4.157522643474654e-06, 'epoch': 0.29} 29%|██▉ | 3586/12313 [2:40:44<6:23:06, 2.63s/it] 29%|██▉ | 3587/12313 [2:40:47<6:23:37, 2.64s/it] {'loss': 0.4514, 'grad_norm': 5.692407639431199, 'learning_rate': 4.157030282694328e-06, 'epoch': 0.29} 29%|██▉ | 3587/12313 [2:40:47<6:23:37, 2.64s/it] 29%|██▉ | 3588/12313 [2:40:49<6:23:18, 2.64s/it] {'loss': 0.6435, 'grad_norm': 5.96319840346763, 'learning_rate': 4.156537807256275e-06, 'epoch': 0.29} 29%|██▉ | 3588/12313 [2:40:49<6:23:18, 2.64s/it] 29%|██▉ | 3589/12313 [2:40:52<6:36:47, 2.73s/it] {'loss': 0.5785, 'grad_norm': 4.057628153052015, 'learning_rate': 4.156045217194573e-06, 'epoch': 0.29} 29%|██▉ | 3589/12313 [2:40:52<6:36:47, 2.73s/it] 29%|██▉ | 3590/12313 [2:40:55<6:31:37, 2.69s/it] {'loss': 0.561, 'grad_norm': 4.725549294999352, 'learning_rate': 4.1555525125433074e-06, 'epoch': 0.29} 29%|██▉ | 3590/12313 [2:40:55<6:31:37, 2.69s/it] 29%|██▉ | 3591/12313 [2:40:58<6:33:53, 2.71s/it] {'loss': 0.4877, 'grad_norm': 4.897497425355875, 'learning_rate': 4.155059693336569e-06, 'epoch': 0.29} 29%|██▉ | 3591/12313 [2:40:58<6:33:53, 2.71s/it] 29%|██▉ | 3592/12313 [2:41:00<6:31:09, 2.69s/it] {'loss': 0.6536, 'grad_norm': 6.7306605891617, 'learning_rate': 4.1545667596084596e-06, 'epoch': 0.29} 29%|██▉ | 3592/12313 [2:41:00<6:31:09, 2.69s/it] 29%|██▉ | 3593/12313 [2:41:03<6:37:05, 2.73s/it] {'loss': 0.6075, 'grad_norm': 4.458589616639652, 'learning_rate': 
4.154073711393087e-06, 'epoch': 0.29} 29%|██▉ | 3593/12313 [2:41:03<6:37:05, 2.73s/it] 29%|██▉ | 3594/12313 [2:41:06<6:30:24, 2.69s/it] {'loss': 0.5503, 'grad_norm': 16.172763248388275, 'learning_rate': 4.153580548724567e-06, 'epoch': 0.29} 29%|██▉ | 3594/12313 [2:41:06<6:30:24, 2.69s/it] 29%|██▉ | 3595/12313 [2:41:08<6:24:42, 2.65s/it] {'loss': 0.5837, 'grad_norm': 5.597703145374191, 'learning_rate': 4.153087271637025e-06, 'epoch': 0.29} 29%|██▉ | 3595/12313 [2:41:08<6:24:42, 2.65s/it] 29%|██▉ | 3596/12313 [2:41:11<6:26:15, 2.66s/it] {'loss': 0.5668, 'grad_norm': 11.754470585418524, 'learning_rate': 4.1525938801645926e-06, 'epoch': 0.29} 29%|██▉ | 3596/12313 [2:41:11<6:26:15, 2.66s/it] 29%|██▉ | 3597/12313 [2:41:14<6:26:09, 2.66s/it] {'loss': 0.5732, 'grad_norm': 6.254803456678843, 'learning_rate': 4.152100374341409e-06, 'epoch': 0.29} 29%|██▉ | 3597/12313 [2:41:14<6:26:09, 2.66s/it] 29%|██▉ | 3598/12313 [2:41:17<6:37:58, 2.74s/it] {'loss': 0.6051, 'grad_norm': 5.190101012718226, 'learning_rate': 4.151606754201625e-06, 'epoch': 0.29} 29%|██▉ | 3598/12313 [2:41:17<6:37:58, 2.74s/it] 29%|██▉ | 3599/12313 [2:41:19<6:30:14, 2.69s/it] {'loss': 0.639, 'grad_norm': 10.669928997016616, 'learning_rate': 4.151113019779393e-06, 'epoch': 0.29} 29%|██▉ | 3599/12313 [2:41:19<6:30:14, 2.69s/it] 29%|██▉ | 3600/12313 [2:41:22<6:30:58, 2.69s/it] {'loss': 0.5745, 'grad_norm': 5.619799524172815, 'learning_rate': 4.150619171108879e-06, 'epoch': 0.29} 29%|██▉ | 3600/12313 [2:41:22<6:30:58, 2.69s/it] 29%|██▉ | 3601/12313 [2:41:25<6:39:19, 2.75s/it] {'loss': 0.6914, 'grad_norm': 4.562644028883115, 'learning_rate': 4.150125208224255e-06, 'epoch': 0.29} 29%|██▉ | 3601/12313 [2:41:25<6:39:19, 2.75s/it] 29%|██▉ | 3602/12313 [2:41:27<6:30:35, 2.69s/it] {'loss': 0.4882, 'grad_norm': 4.491512432892291, 'learning_rate': 4.149631131159698e-06, 'epoch': 0.29} 29%|██▉ | 3602/12313 [2:41:27<6:30:35, 2.69s/it] 29%|██▉ | 3603/12313 [2:41:30<6:26:26, 2.66s/it] {'loss': 0.5967, 'grad_norm': 
4.214753071853409, 'learning_rate': 4.149136939949399e-06, 'epoch': 0.29} 29%|██▉ | 3603/12313 [2:41:30<6:26:26, 2.66s/it] 29%|██▉ | 3604/12313 [2:41:33<6:27:56, 2.67s/it] {'loss': 0.5239, 'grad_norm': 6.53315802645859, 'learning_rate': 4.14864263462755e-06, 'epoch': 0.29} 29%|██▉ | 3604/12313 [2:41:33<6:27:56, 2.67s/it] 29%|██▉ | 3605/12313 [2:41:36<6:41:18, 2.77s/it] {'loss': 0.6479, 'grad_norm': 5.806026900388708, 'learning_rate': 4.148148215228357e-06, 'epoch': 0.29} 29%|██▉ | 3605/12313 [2:41:36<6:41:18, 2.77s/it] 29%|██▉ | 3606/12313 [2:41:38<6:28:22, 2.68s/it] {'loss': 0.5045, 'grad_norm': 4.392767448225316, 'learning_rate': 4.147653681786031e-06, 'epoch': 0.29} 29%|██▉ | 3606/12313 [2:41:38<6:28:22, 2.68s/it] 29%|██▉ | 3607/12313 [2:41:41<6:28:21, 2.68s/it] {'loss': 0.5433, 'grad_norm': 7.680285801817868, 'learning_rate': 4.147159034334789e-06, 'epoch': 0.29} 29%|██▉ | 3607/12313 [2:41:41<6:28:21, 2.68s/it] 29%|██▉ | 3608/12313 [2:41:43<6:22:24, 2.64s/it] {'loss': 0.5867, 'grad_norm': 5.275584522954317, 'learning_rate': 4.146664272908859e-06, 'epoch': 0.29} 29%|██▉ | 3608/12313 [2:41:43<6:22:24, 2.64s/it] 29%|██▉ | 3609/12313 [2:41:46<6:20:51, 2.63s/it] {'loss': 0.5683, 'grad_norm': 5.37369190885388, 'learning_rate': 4.146169397542478e-06, 'epoch': 0.29} 29%|██▉ | 3609/12313 [2:41:46<6:20:51, 2.63s/it] 29%|██▉ | 3610/12313 [2:41:49<6:23:57, 2.65s/it] {'loss': 0.6545, 'grad_norm': 3.8049589428305413, 'learning_rate': 4.145674408269885e-06, 'epoch': 0.29} 29%|██▉ | 3610/12313 [2:41:49<6:23:57, 2.65s/it] 29%|██▉ | 3611/12313 [2:41:51<6:23:56, 2.65s/it] {'loss': 0.5765, 'grad_norm': 5.091756495748853, 'learning_rate': 4.145179305125333e-06, 'epoch': 0.29} 29%|██▉ | 3611/12313 [2:41:51<6:23:56, 2.65s/it] 29%|██▉ | 3612/12313 [2:41:54<6:18:31, 2.61s/it] {'loss': 0.5044, 'grad_norm': 13.084271350497739, 'learning_rate': 4.14468408814308e-06, 'epoch': 0.29} 29%|██▉ | 3612/12313 [2:41:54<6:18:31, 2.61s/it] 29%|██▉ | 3613/12313 [2:41:56<6:19:01, 2.61s/it] {'loss': 
0.5513, 'grad_norm': 5.905329389388773, 'learning_rate': 4.1441887573573935e-06, 'epoch': 0.29} 29%|██▉ | 3613/12313 [2:41:56<6:19:01, 2.61s/it] 29%|██▉ | 3614/12313 [2:41:59<6:32:49, 2.71s/it] {'loss': 0.4885, 'grad_norm': 4.255974685046024, 'learning_rate': 4.143693312802546e-06, 'epoch': 0.29} 29%|██▉ | 3614/12313 [2:41:59<6:32:49, 2.71s/it] 29%|██▉ | 3615/12313 [2:42:02<6:35:17, 2.73s/it] {'loss': 0.5293, 'grad_norm': 7.189955017278807, 'learning_rate': 4.143197754512821e-06, 'epoch': 0.29} 29%|██▉ | 3615/12313 [2:42:02<6:35:17, 2.73s/it] 29%|██▉ | 3616/12313 [2:42:05<6:30:31, 2.69s/it] {'loss': 0.4807, 'grad_norm': 4.434545028745304, 'learning_rate': 4.142702082522507e-06, 'epoch': 0.29} 29%|██▉ | 3616/12313 [2:42:05<6:30:31, 2.69s/it] 29%|██▉ | 3617/12313 [2:42:07<6:22:24, 2.64s/it] {'loss': 0.5229, 'grad_norm': 5.613891388134908, 'learning_rate': 4.142206296865904e-06, 'epoch': 0.29} 29%|██▉ | 3617/12313 [2:42:07<6:22:24, 2.64s/it] 29%|██▉ | 3618/12313 [2:42:10<6:27:24, 2.67s/it] {'loss': 0.6718, 'grad_norm': 3.159086352727405, 'learning_rate': 4.141710397577315e-06, 'epoch': 0.29} 29%|██▉ | 3618/12313 [2:42:10<6:27:24, 2.67s/it] 29%|██▉ | 3619/12313 [2:42:13<6:41:05, 2.77s/it] {'loss': 0.5547, 'grad_norm': 3.5367459467799858, 'learning_rate': 4.141214384691056e-06, 'epoch': 0.29} 29%|██▉ | 3619/12313 [2:42:13<6:41:05, 2.77s/it] 29%|██▉ | 3620/12313 [2:42:16<6:42:11, 2.78s/it] {'loss': 0.5301, 'grad_norm': 7.18465729456046, 'learning_rate': 4.1407182582414476e-06, 'epoch': 0.29} 29%|██▉ | 3620/12313 [2:42:16<6:42:11, 2.78s/it] 29%|██▉ | 3621/12313 [2:42:18<6:32:29, 2.71s/it] {'loss': 0.6391, 'grad_norm': 5.6826492806068405, 'learning_rate': 4.140222018262818e-06, 'epoch': 0.29} 29%|██▉ | 3621/12313 [2:42:18<6:32:29, 2.71s/it] 29%|██▉ | 3622/12313 [2:42:21<6:27:24, 2.67s/it] {'loss': 0.7172, 'grad_norm': 3.357835141229998, 'learning_rate': 4.139725664789507e-06, 'epoch': 0.29} 29%|██▉ | 3622/12313 [2:42:21<6:27:24, 2.67s/it] 29%|██▉ | 3623/12313 
[2:42:24<6:29:33, 2.69s/it] {'loss': 0.441, 'grad_norm': 17.104412542873206, 'learning_rate': 4.139229197855857e-06, 'epoch': 0.29} 29%|██▉ | 3623/12313 [2:42:24<6:29:33, 2.69s/it] 29%|██▉ | 3624/12313 [2:42:26<6:26:57, 2.67s/it] {'loss': 0.5264, 'grad_norm': 4.059380496052101, 'learning_rate': 4.138732617496223e-06, 'epoch': 0.29} 29%|██▉ | 3624/12313 [2:42:26<6:26:57, 2.67s/it] 29%|██▉ | 3625/12313 [2:42:29<6:40:17, 2.76s/it] {'loss': 0.4865, 'grad_norm': 4.907343820967712, 'learning_rate': 4.138235923744964e-06, 'epoch': 0.29} 29%|██▉ | 3625/12313 [2:42:29<6:40:17, 2.76s/it] 29%|██▉ | 3626/12313 [2:42:32<6:43:44, 2.79s/it] {'loss': 0.5446, 'grad_norm': 4.767491409739287, 'learning_rate': 4.13773911663645e-06, 'epoch': 0.29} 29%|██▉ | 3626/12313 [2:42:32<6:43:44, 2.79s/it] 29%|██▉ | 3627/12313 [2:42:35<6:36:14, 2.74s/it] {'loss': 0.514, 'grad_norm': 11.043074971749009, 'learning_rate': 4.137242196205056e-06, 'epoch': 0.29} 29%|██▉ | 3627/12313 [2:42:35<6:36:14, 2.74s/it] 29%|██▉ | 3628/12313 [2:42:37<6:29:41, 2.69s/it] {'loss': 0.5782, 'grad_norm': 8.751284247365856, 'learning_rate': 4.136745162485168e-06, 'epoch': 0.29} 29%|██▉ | 3628/12313 [2:42:37<6:29:41, 2.69s/it] 29%|██▉ | 3629/12313 [2:42:40<6:29:38, 2.69s/it] {'loss': 0.4735, 'grad_norm': 4.259342421928949, 'learning_rate': 4.1362480155111764e-06, 'epoch': 0.29} 29%|██▉ | 3629/12313 [2:42:40<6:29:38, 2.69s/it] 29%|██▉ | 3630/12313 [2:42:43<6:36:23, 2.74s/it] {'loss': 0.6233, 'grad_norm': 5.982588574242516, 'learning_rate': 4.135750755317481e-06, 'epoch': 0.29} 29%|██▉ | 3630/12313 [2:42:43<6:36:23, 2.74s/it] 29%|██▉ | 3631/12313 [2:42:45<6:30:27, 2.70s/it] {'loss': 0.6496, 'grad_norm': 4.717722006131577, 'learning_rate': 4.135253381938492e-06, 'epoch': 0.29} 29%|██▉ | 3631/12313 [2:42:45<6:30:27, 2.70s/it] 29%|██▉ | 3632/12313 [2:42:49<6:53:52, 2.86s/it] {'loss': 0.6076, 'grad_norm': 3.9506979794735337, 'learning_rate': 4.134755895408623e-06, 'epoch': 0.29} 29%|██▉ | 3632/12313 [2:42:49<6:53:52, 2.86s/it] 
30%|██▉ | 3633/12313 [2:42:51<6:38:18, 2.75s/it] {'loss': 0.5481, 'grad_norm': 6.244712141178371, 'learning_rate': 4.134258295762297e-06, 'epoch': 0.3} 30%|██▉ | 3633/12313 [2:42:51<6:38:18, 2.75s/it] 30%|██▉ | 3634/12313 [2:42:54<6:35:03, 2.73s/it] {'loss': 0.5417, 'grad_norm': 5.110391491927889, 'learning_rate': 4.1337605830339465e-06, 'epoch': 0.3} 30%|██▉ | 3634/12313 [2:42:54<6:35:03, 2.73s/it] 30%|██▉ | 3635/12313 [2:42:56<6:30:12, 2.70s/it] {'loss': 0.6065, 'grad_norm': 4.8675201483594615, 'learning_rate': 4.133262757258011e-06, 'epoch': 0.3} 30%|██▉ | 3635/12313 [2:42:56<6:30:12, 2.70s/it] 30%|██▉ | 3636/12313 [2:42:59<6:33:56, 2.72s/it] {'loss': 0.5392, 'grad_norm': 11.498154785191828, 'learning_rate': 4.132764818468936e-06, 'epoch': 0.3} 30%|██▉ | 3636/12313 [2:42:59<6:33:56, 2.72s/it] 30%|██▉ | 3637/12313 [2:43:02<6:29:39, 2.69s/it] {'loss': 0.6767, 'grad_norm': 6.065167654609438, 'learning_rate': 4.1322667667011774e-06, 'epoch': 0.3} 30%|██▉ | 3637/12313 [2:43:02<6:29:39, 2.69s/it] 30%|██▉ | 3638/12313 [2:43:05<6:26:43, 2.67s/it] {'loss': 0.5793, 'grad_norm': 9.853858235216926, 'learning_rate': 4.131768601989196e-06, 'epoch': 0.3} 30%|██▉ | 3638/12313 [2:43:05<6:26:43, 2.67s/it] 30%|██▉ | 3639/12313 [2:43:07<6:27:39, 2.68s/it] {'loss': 0.7266, 'grad_norm': 5.19821566952064, 'learning_rate': 4.131270324367464e-06, 'epoch': 0.3} 30%|██▉ | 3639/12313 [2:43:07<6:27:39, 2.68s/it] 30%|██▉ | 3640/12313 [2:43:10<6:21:23, 2.64s/it] {'loss': 0.6649, 'grad_norm': 6.474598040312163, 'learning_rate': 4.130771933870459e-06, 'epoch': 0.3} 30%|██▉ | 3640/12313 [2:43:10<6:21:23, 2.64s/it] 30%|██▉ | 3641/12313 [2:43:12<6:20:40, 2.63s/it] {'loss': 0.4317, 'grad_norm': 6.622824718236689, 'learning_rate': 4.130273430532667e-06, 'epoch': 0.3} 30%|██▉ | 3641/12313 [2:43:12<6:20:40, 2.63s/it] 30%|██▉ | 3642/12313 [2:43:15<6:10:52, 2.57s/it] {'loss': 0.4864, 'grad_norm': 8.781051249556342, 'learning_rate': 4.129774814388582e-06, 'epoch': 0.3} 30%|██▉ | 3642/12313 
[2:43:15<6:10:52, 2.57s/it] 30%|██▉ | 3643/12313 [2:43:17<6:14:55, 2.59s/it] {'loss': 0.5531, 'grad_norm': 4.776911195727594, 'learning_rate': 4.1292760854727045e-06, 'epoch': 0.3} 30%|██▉ | 3643/12313 [2:43:17<6:14:55, 2.59s/it] 30%|██▉ | 3644/12313 [2:43:20<6:15:09, 2.60s/it] {'loss': 0.5435, 'grad_norm': 4.828770110545079, 'learning_rate': 4.128777243819546e-06, 'epoch': 0.3} 30%|██▉ | 3644/12313 [2:43:20<6:15:09, 2.60s/it] 30%|██▉ | 3645/12313 [2:43:23<6:11:50, 2.57s/it] {'loss': 0.4474, 'grad_norm': 4.550741300328438, 'learning_rate': 4.128278289463621e-06, 'epoch': 0.3} 30%|██▉ | 3645/12313 [2:43:23<6:11:50, 2.57s/it] 30%|██▉ | 3646/12313 [2:43:25<6:17:07, 2.61s/it] {'loss': 0.5896, 'grad_norm': 3.03442154965047, 'learning_rate': 4.127779222439457e-06, 'epoch': 0.3} 30%|██▉ | 3646/12313 [2:43:25<6:17:07, 2.61s/it] 30%|██▉ | 3647/12313 [2:43:28<6:23:52, 2.66s/it] {'loss': 0.6183, 'grad_norm': 4.7725701865486085, 'learning_rate': 4.127280042781585e-06, 'epoch': 0.3} 30%|██▉ | 3647/12313 [2:43:28<6:23:52, 2.66s/it] 30%|██▉ | 3648/12313 [2:43:31<6:24:02, 2.66s/it] {'loss': 0.4919, 'grad_norm': 5.751607562129402, 'learning_rate': 4.126780750524546e-06, 'epoch': 0.3} 30%|██▉ | 3648/12313 [2:43:31<6:24:02, 2.66s/it] 30%|██▉ | 3649/12313 [2:43:34<6:31:02, 2.71s/it] {'loss': 0.5275, 'grad_norm': 3.5887135989563057, 'learning_rate': 4.126281345702889e-06, 'epoch': 0.3} 30%|██▉ | 3649/12313 [2:43:34<6:31:02, 2.71s/it] 30%|██▉ | 3650/12313 [2:43:36<6:31:41, 2.71s/it] {'loss': 0.8401, 'grad_norm': 6.464495222017677, 'learning_rate': 4.125781828351171e-06, 'epoch': 0.3} 30%|██▉ | 3650/12313 [2:43:36<6:31:41, 2.71s/it] 30%|██▉ | 3651/12313 [2:43:39<6:22:18, 2.65s/it] {'loss': 0.5954, 'grad_norm': 5.980573147403461, 'learning_rate': 4.125282198503953e-06, 'epoch': 0.3} 30%|██▉ | 3651/12313 [2:43:39<6:22:18, 2.65s/it] 30%|██▉ | 3652/12313 [2:43:41<6:24:29, 2.66s/it] {'loss': 0.5105, 'grad_norm': 5.228111544471104, 'learning_rate': 4.124782456195809e-06, 'epoch': 0.3} 30%|██▉ 
| 3652/12313 [2:43:41<6:24:29, 2.66s/it] 30%|██▉ | 3653/12313 [2:43:44<6:19:38, 2.63s/it] {'loss': 0.4924, 'grad_norm': 7.53149544733575, 'learning_rate': 4.124282601461319e-06, 'epoch': 0.3} 30%|██▉ | 3653/12313 [2:43:44<6:19:38, 2.63s/it] 30%|██▉ | 3654/12313 [2:43:46<6:11:26, 2.57s/it] {'loss': 0.4124, 'grad_norm': 5.556355856990694, 'learning_rate': 4.123782634335068e-06, 'epoch': 0.3} 30%|██▉ | 3654/12313 [2:43:46<6:11:26, 2.57s/it] 30%|██▉ | 3655/12313 [2:43:49<6:12:41, 2.58s/it] {'loss': 0.5824, 'grad_norm': 4.358048092574572, 'learning_rate': 4.123282554851654e-06, 'epoch': 0.3} 30%|██▉ | 3655/12313 [2:43:49<6:12:41, 2.58s/it] 30%|██▉ | 3656/12313 [2:43:52<6:17:23, 2.62s/it] {'loss': 0.4748, 'grad_norm': 3.5786979265887187, 'learning_rate': 4.122782363045677e-06, 'epoch': 0.3} 30%|██▉ | 3656/12313 [2:43:52<6:17:23, 2.62s/it] 30%|██▉ | 3657/12313 [2:43:54<6:21:56, 2.65s/it] {'loss': 0.4482, 'grad_norm': 4.524617536184045, 'learning_rate': 4.12228205895175e-06, 'epoch': 0.3} 30%|██▉ | 3657/12313 [2:43:54<6:21:56, 2.65s/it] 30%|██▉ | 3658/12313 [2:43:57<6:23:50, 2.66s/it] {'loss': 0.565, 'grad_norm': 6.2371920872247735, 'learning_rate': 4.12178164260449e-06, 'epoch': 0.3} 30%|██▉ | 3658/12313 [2:43:57<6:23:50, 2.66s/it] 30%|██▉ | 3659/12313 [2:44:00<6:48:17, 2.83s/it] {'loss': 0.4706, 'grad_norm': 3.54005929501434, 'learning_rate': 4.121281114038524e-06, 'epoch': 0.3} 30%|██▉ | 3659/12313 [2:44:00<6:48:17, 2.83s/it] 30%|██▉ | 3660/12313 [2:44:03<6:40:46, 2.78s/it] {'loss': 0.4807, 'grad_norm': 6.2010580244373905, 'learning_rate': 4.120780473288485e-06, 'epoch': 0.3} 30%|██▉ | 3660/12313 [2:44:03<6:40:46, 2.78s/it] 30%|██▉ | 3661/12313 [2:44:06<7:02:04, 2.93s/it] {'loss': 0.5279, 'grad_norm': 3.39391672989296, 'learning_rate': 4.120279720389015e-06, 'epoch': 0.3} 30%|██▉ | 3661/12313 [2:44:06<7:02:04, 2.93s/it] 30%|██▉ | 3662/12313 [2:44:09<6:51:52, 2.86s/it] {'loss': 0.6333, 'grad_norm': 7.485601794086229, 'learning_rate': 4.119778855374763e-06, 'epoch': 0.3} 
30%|██▉ | 3662/12313 [2:44:09<6:51:52, 2.86s/it] 30%|██▉ | 3663/12313 [2:44:12<6:44:33, 2.81s/it] {'loss': 0.5886, 'grad_norm': 4.283931201224972, 'learning_rate': 4.1192778782803875e-06, 'epoch': 0.3} 30%|██▉ | 3663/12313 [2:44:12<6:44:33, 2.81s/it] 30%|██▉ | 3664/12313 [2:44:14<6:29:31, 2.70s/it] {'loss': 0.4158, 'grad_norm': 6.09549459673354, 'learning_rate': 4.118776789140551e-06, 'epoch': 0.3} 30%|██▉ | 3664/12313 [2:44:14<6:29:31, 2.70s/it] 30%|██▉ | 3665/12313 [2:44:17<6:25:32, 2.67s/it] {'loss': 0.5857, 'grad_norm': 3.9183720909856645, 'learning_rate': 4.1182755879899305e-06, 'epoch': 0.3} 30%|██▉ | 3665/12313 [2:44:17<6:25:32, 2.67s/it] 30%|██▉ | 3666/12313 [2:44:19<6:18:56, 2.63s/it] {'loss': 0.5828, 'grad_norm': 4.554155011331954, 'learning_rate': 4.117774274863203e-06, 'epoch': 0.3} 30%|██▉ | 3666/12313 [2:44:19<6:18:56, 2.63s/it] 30%|██▉ | 3667/12313 [2:44:22<6:30:41, 2.71s/it] {'loss': 0.7132, 'grad_norm': 6.026037573474456, 'learning_rate': 4.117272849795057e-06, 'epoch': 0.3} 30%|██▉ | 3667/12313 [2:44:22<6:30:41, 2.71s/it] 30%|██▉ | 3668/12313 [2:44:25<6:17:19, 2.62s/it] {'loss': 0.6505, 'grad_norm': 5.369767546922633, 'learning_rate': 4.116771312820189e-06, 'epoch': 0.3} 30%|██▉ | 3668/12313 [2:44:25<6:17:19, 2.62s/it] 30%|██▉ | 3669/12313 [2:44:28<6:30:16, 2.71s/it] {'loss': 0.537, 'grad_norm': 3.49258490401417, 'learning_rate': 4.116269663973304e-06, 'epoch': 0.3} 30%|██▉ | 3669/12313 [2:44:28<6:30:16, 2.71s/it] 30%|██▉ | 3670/12313 [2:44:30<6:30:26, 2.71s/it] {'loss': 0.6225, 'grad_norm': 8.038084077662122, 'learning_rate': 4.115767903289112e-06, 'epoch': 0.3} 30%|██▉ | 3670/12313 [2:44:30<6:30:26, 2.71s/it] 30%|██▉ | 3671/12313 [2:44:33<6:27:28, 2.69s/it] {'loss': 0.4825, 'grad_norm': 10.562283989123358, 'learning_rate': 4.115266030802332e-06, 'epoch': 0.3} 30%|██▉ | 3671/12313 [2:44:33<6:27:28, 2.69s/it] 30%|██▉ | 3672/12313 [2:44:35<6:19:17, 2.63s/it] {'loss': 0.59, 'grad_norm': 5.676004986730943, 'learning_rate': 4.114764046547691e-06, 
'epoch': 0.3} 30%|██▉ | 3672/12313 [2:44:35<6:19:17, 2.63s/it] 30%|██▉ | 3673/12313 [2:44:38<6:19:02, 2.63s/it] {'loss': 0.5298, 'grad_norm': 8.432493876901468, 'learning_rate': 4.114261950559924e-06, 'epoch': 0.3} 30%|██▉ | 3673/12313 [2:44:38<6:19:02, 2.63s/it] 30%|██▉ | 3674/12313 [2:44:41<6:15:54, 2.61s/it] {'loss': 0.6273, 'grad_norm': 4.1760389034352015, 'learning_rate': 4.113759742873774e-06, 'epoch': 0.3} 30%|██▉ | 3674/12313 [2:44:41<6:15:54, 2.61s/it] 30%|██▉ | 3675/12313 [2:44:43<6:06:13, 2.54s/it] {'loss': 0.4842, 'grad_norm': 6.694938489409981, 'learning_rate': 4.11325742352399e-06, 'epoch': 0.3} 30%|██▉ | 3675/12313 [2:44:43<6:06:13, 2.54s/it] 30%|██▉ | 3676/12313 [2:44:46<6:12:22, 2.59s/it] {'loss': 0.5768, 'grad_norm': 4.346992428927781, 'learning_rate': 4.112754992545331e-06, 'epoch': 0.3} 30%|██▉ | 3676/12313 [2:44:46<6:12:22, 2.59s/it] 30%|██▉ | 3677/12313 [2:44:48<6:17:20, 2.62s/it] {'loss': 0.5956, 'grad_norm': 4.528058186506179, 'learning_rate': 4.112252449972562e-06, 'epoch': 0.3} 30%|██▉ | 3677/12313 [2:44:48<6:17:20, 2.62s/it] 30%|██▉ | 3678/12313 [2:44:51<6:15:50, 2.61s/it] {'loss': 0.5005, 'grad_norm': 4.764966099793393, 'learning_rate': 4.111749795840455e-06, 'epoch': 0.3} 30%|██▉ | 3678/12313 [2:44:51<6:15:50, 2.61s/it] 30%|██▉ | 3679/12313 [2:44:54<6:42:25, 2.80s/it] {'loss': 0.5522, 'grad_norm': 10.20025692058819, 'learning_rate': 4.111247030183793e-06, 'epoch': 0.3} 30%|██▉ | 3679/12313 [2:44:54<6:42:25, 2.80s/it] 30%|██▉ | 3680/12313 [2:44:57<6:44:08, 2.81s/it] {'loss': 0.505, 'grad_norm': 31.38171024066769, 'learning_rate': 4.110744153037363e-06, 'epoch': 0.3} 30%|██▉ | 3680/12313 [2:44:57<6:44:08, 2.81s/it] 30%|██▉ | 3681/12313 [2:45:00<6:37:52, 2.77s/it] {'loss': 0.4673, 'grad_norm': 4.523115084329832, 'learning_rate': 4.110241164435964e-06, 'epoch': 0.3} 30%|██▉ | 3681/12313 [2:45:00<6:37:52, 2.77s/it] 30%|██▉ | 3682/12313 [2:45:02<6:32:38, 2.73s/it] {'loss': 0.4812, 'grad_norm': 3.7531700099140393, 'learning_rate': 
4.109738064414397e-06, 'epoch': 0.3} 30%|██▉ | 3682/12313 [2:45:02<6:32:38, 2.73s/it] 30%|██▉ | 3683/12313 [2:45:05<6:31:09, 2.72s/it] {'loss': 0.5328, 'grad_norm': 5.06537300558853, 'learning_rate': 4.1092348530074764e-06, 'epoch': 0.3} 30%|██▉ | 3683/12313 [2:45:05<6:31:09, 2.72s/it] 30%|██▉ | 3684/12313 [2:45:07<6:17:24, 2.62s/it] {'loss': 0.4383, 'grad_norm': 6.176828372115656, 'learning_rate': 4.10873153025002e-06, 'epoch': 0.3} 30%|██▉ | 3684/12313 [2:45:07<6:17:24, 2.62s/it] 30%|██▉ | 3685/12313 [2:45:10<6:16:17, 2.62s/it] {'loss': 0.6148, 'grad_norm': 5.037654343199056, 'learning_rate': 4.108228096176856e-06, 'epoch': 0.3} 30%|██▉ | 3685/12313 [2:45:10<6:16:17, 2.62s/it] 30%|██▉ | 3686/12313 [2:45:13<6:16:54, 2.62s/it] {'loss': 0.558, 'grad_norm': 6.942767647531231, 'learning_rate': 4.10772455082282e-06, 'epoch': 0.3} 30%|██▉ | 3686/12313 [2:45:13<6:16:54, 2.62s/it] 30%|██▉ | 3687/12313 [2:45:15<6:13:08, 2.60s/it] {'loss': 0.7493, 'grad_norm': 4.277595045669092, 'learning_rate': 4.107220894222753e-06, 'epoch': 0.3} 30%|██▉ | 3687/12313 [2:45:15<6:13:08, 2.60s/it] 30%|██▉ | 3688/12313 [2:45:18<6:37:04, 2.76s/it] {'loss': 0.4093, 'grad_norm': 6.128933496939369, 'learning_rate': 4.106717126411506e-06, 'epoch': 0.3} 30%|██▉ | 3688/12313 [2:45:18<6:37:04, 2.76s/it] 30%|██▉ | 3689/12313 [2:45:21<6:39:54, 2.78s/it] {'loss': 0.5386, 'grad_norm': 3.4094618806546713, 'learning_rate': 4.106213247423938e-06, 'epoch': 0.3} 30%|██▉ | 3689/12313 [2:45:21<6:39:54, 2.78s/it] 30%|██▉ | 3690/12313 [2:45:24<6:34:19, 2.74s/it] {'loss': 0.5606, 'grad_norm': 5.429499060264869, 'learning_rate': 4.105709257294914e-06, 'epoch': 0.3} 30%|██▉ | 3690/12313 [2:45:24<6:34:19, 2.74s/it] 30%|██▉ | 3691/12313 [2:45:27<6:43:22, 2.81s/it] {'loss': 0.4964, 'grad_norm': 8.126911051862221, 'learning_rate': 4.105205156059307e-06, 'epoch': 0.3} 30%|██▉ | 3691/12313 [2:45:27<6:43:22, 2.81s/it] 30%|██▉ | 3692/12313 [2:45:30<6:44:21, 2.81s/it] {'loss': 0.5032, 'grad_norm': 5.576298258657682, 
'learning_rate': 4.104700943751999e-06, 'epoch': 0.3} 30%|██▉ | 3692/12313 [2:45:30<6:44:21, 2.81s/it] 30%|██▉ | 3693/12313 [2:45:32<6:29:47, 2.71s/it] {'loss': 0.5164, 'grad_norm': 5.428256954230044, 'learning_rate': 4.104196620407878e-06, 'epoch': 0.3} 30%|██▉ | 3693/12313 [2:45:32<6:29:47, 2.71s/it] 30%|███ | 3694/12313 [2:45:35<6:20:34, 2.65s/it] {'loss': 0.4619, 'grad_norm': 5.490233849085779, 'learning_rate': 4.1036921860618415e-06, 'epoch': 0.3} 30%|███ | 3694/12313 [2:45:35<6:20:34, 2.65s/it] 30%|███ | 3695/12313 [2:45:37<6:23:00, 2.67s/it] {'loss': 0.4709, 'grad_norm': 3.6578438271498683, 'learning_rate': 4.103187640748792e-06, 'epoch': 0.3} 30%|███ | 3695/12313 [2:45:37<6:23:00, 2.67s/it] 30%|███ | 3696/12313 [2:45:40<6:36:05, 2.76s/it] {'loss': 0.5943, 'grad_norm': 4.119105065976182, 'learning_rate': 4.102682984503644e-06, 'epoch': 0.3} 30%|███ | 3696/12313 [2:45:40<6:36:05, 2.76s/it] 30%|███ | 3697/12313 [2:45:43<6:33:08, 2.74s/it] {'loss': 0.5706, 'grad_norm': 4.816227234767905, 'learning_rate': 4.102178217361315e-06, 'epoch': 0.3} 30%|███ | 3697/12313 [2:45:43<6:33:08, 2.74s/it] 30%|███ | 3698/12313 [2:45:46<6:29:06, 2.71s/it] {'loss': 0.4774, 'grad_norm': 3.586911969028518, 'learning_rate': 4.101673339356733e-06, 'epoch': 0.3} 30%|███ | 3698/12313 [2:45:46<6:29:06, 2.71s/it] 30%|███ | 3699/12313 [2:45:48<6:27:58, 2.70s/it] {'loss': 0.4695, 'grad_norm': 9.66099144536593, 'learning_rate': 4.101168350524832e-06, 'epoch': 0.3} 30%|███ | 3699/12313 [2:45:48<6:27:58, 2.70s/it] 30%|███ | 3700/12313 [2:45:51<6:37:32, 2.77s/it] {'loss': 0.4937, 'grad_norm': 4.619743032013704, 'learning_rate': 4.100663250900556e-06, 'epoch': 0.3} 30%|███ | 3700/12313 [2:45:51<6:37:32, 2.77s/it] 30%|███ | 3701/12313 [2:45:54<6:35:06, 2.75s/it] {'loss': 0.4896, 'grad_norm': 5.0528198312806705, 'learning_rate': 4.100158040518854e-06, 'epoch': 0.3} 30%|███ | 3701/12313 [2:45:54<6:35:06, 2.75s/it] 30%|███ | 3702/12313 [2:45:57<6:38:14, 2.77s/it] {'loss': 0.4393, 'grad_norm': 
6.192798320269057, 'learning_rate': 4.099652719414684e-06, 'epoch': 0.3} 30%|███ | 3702/12313 [2:45:57<6:38:14, 2.77s/it] 30%|███ | 3703/12313 [2:45:59<6:23:32, 2.67s/it] {'loss': 0.4435, 'grad_norm': 4.544818358602807, 'learning_rate': 4.099147287623012e-06, 'epoch': 0.3} 30%|███ | 3703/12313 [2:45:59<6:23:32, 2.67s/it] 30%|███ | 3704/12313 [2:46:02<6:19:08, 2.64s/it] {'loss': 0.5758, 'grad_norm': 3.843568388195506, 'learning_rate': 4.098641745178812e-06, 'epoch': 0.3} 30%|███ | 3704/12313 [2:46:02<6:19:08, 2.64s/it] 30%|███ | 3705/12313 [2:46:04<6:22:19, 2.66s/it] {'loss': 0.5837, 'grad_norm': 7.032873685349521, 'learning_rate': 4.098136092117063e-06, 'epoch': 0.3} 30%|███ | 3705/12313 [2:46:04<6:22:19, 2.66s/it] 30%|███ | 3706/12313 [2:46:07<6:14:44, 2.61s/it] {'loss': 0.4516, 'grad_norm': 4.039586791406998, 'learning_rate': 4.097630328472755e-06, 'epoch': 0.3} 30%|███ | 3706/12313 [2:46:07<6:14:44, 2.61s/it] 30%|███ | 3707/12313 [2:46:09<6:06:39, 2.56s/it] {'loss': 0.4981, 'grad_norm': 5.893186542446594, 'learning_rate': 4.097124454280883e-06, 'epoch': 0.3} 30%|███ | 3707/12313 [2:46:09<6:06:39, 2.56s/it] 30%|███ | 3708/12313 [2:46:12<6:08:41, 2.57s/it] {'loss': 0.721, 'grad_norm': 4.234572464495813, 'learning_rate': 4.096618469576451e-06, 'epoch': 0.3} 30%|███ | 3708/12313 [2:46:12<6:08:41, 2.57s/it] 30%|███ | 3709/12313 [2:46:15<6:20:15, 2.65s/it] {'loss': 0.4969, 'grad_norm': 4.138584167113376, 'learning_rate': 4.0961123743944715e-06, 'epoch': 0.3} 30%|███ | 3709/12313 [2:46:15<6:20:15, 2.65s/it] 30%|███ | 3710/12313 [2:46:18<6:32:52, 2.74s/it] {'loss': 0.65, 'grad_norm': 4.733715325450099, 'learning_rate': 4.095606168769964e-06, 'epoch': 0.3} 30%|███ | 3710/12313 [2:46:18<6:32:52, 2.74s/it] 30%|███ | 3711/12313 [2:46:20<6:20:03, 2.65s/it] {'loss': 0.6026, 'grad_norm': 5.217622232509772, 'learning_rate': 4.095099852737953e-06, 'epoch': 0.3} 30%|███ | 3711/12313 [2:46:20<6:20:03, 2.65s/it] 30%|███ | 3712/12313 [2:46:23<6:28:48, 2.71s/it] {'loss': 0.6201, 
'grad_norm': 6.573735753701009, 'learning_rate': 4.094593426333474e-06, 'epoch': 0.3} 30%|███ | 3712/12313 [2:46:23<6:28:48, 2.71s/it] 30%|███ | 3713/12313 [2:46:26<6:20:34, 2.66s/it] {'loss': 0.6581, 'grad_norm': 7.983639715927653, 'learning_rate': 4.09408688959157e-06, 'epoch': 0.3} 30%|███ | 3713/12313 [2:46:26<6:20:34, 2.66s/it] 30%|███ | 3714/12313 [2:46:28<6:15:14, 2.62s/it] {'loss': 0.7463, 'grad_norm': 5.652732617432702, 'learning_rate': 4.093580242547289e-06, 'epoch': 0.3} 30%|███ | 3714/12313 [2:46:28<6:15:14, 2.62s/it] 30%|███ | 3715/12313 [2:46:31<6:07:42, 2.57s/it] {'loss': 0.4833, 'grad_norm': 5.775875513388475, 'learning_rate': 4.09307348523569e-06, 'epoch': 0.3} 30%|███ | 3715/12313 [2:46:31<6:07:42, 2.57s/it] 30%|███ | 3716/12313 [2:46:34<6:24:34, 2.68s/it] {'loss': 0.4648, 'grad_norm': 5.373772908608717, 'learning_rate': 4.092566617691837e-06, 'epoch': 0.3} 30%|███ | 3716/12313 [2:46:34<6:24:34, 2.68s/it] 30%|███ | 3717/12313 [2:46:37<6:50:19, 2.86s/it] {'loss': 0.543, 'grad_norm': 4.681484734835625, 'learning_rate': 4.092059639950802e-06, 'epoch': 0.3} 30%|███ | 3717/12313 [2:46:37<6:50:19, 2.86s/it] 30%|███ | 3718/12313 [2:46:40<6:52:55, 2.88s/it] {'loss': 0.7871, 'grad_norm': 3.293957563526287, 'learning_rate': 4.0915525520476665e-06, 'epoch': 0.3} 30%|███ | 3718/12313 [2:46:40<6:52:55, 2.88s/it] 30%|███ | 3719/12313 [2:46:43<6:48:44, 2.85s/it] {'loss': 0.54, 'grad_norm': 9.736689094555402, 'learning_rate': 4.091045354017517e-06, 'epoch': 0.3} 30%|███ | 3719/12313 [2:46:43<6:48:44, 2.85s/it] 30%|███ | 3720/12313 [2:46:45<6:44:07, 2.82s/it] {'loss': 0.4686, 'grad_norm': 7.004070985864832, 'learning_rate': 4.090538045895449e-06, 'epoch': 0.3} 30%|███ | 3720/12313 [2:46:45<6:44:07, 2.82s/it] 30%|███ | 3721/12313 [2:46:48<6:50:07, 2.86s/it] {'loss': 0.6621, 'grad_norm': 6.304287102750301, 'learning_rate': 4.090030627716567e-06, 'epoch': 0.3} 30%|███ | 3721/12313 [2:46:48<6:50:07, 2.86s/it] 30%|███ | 3722/12313 [2:46:51<6:30:04, 2.72s/it] {'loss': 
0.6359, 'grad_norm': 5.678988794221106, 'learning_rate': 4.08952309951598e-06, 'epoch': 0.3} 30%|███ | 3722/12313 [2:46:51<6:30:04, 2.72s/it] 30%|███ | 3723/12313 [2:46:53<6:23:10, 2.68s/it] {'loss': 0.6547, 'grad_norm': 7.331654562862624, 'learning_rate': 4.0890154613288066e-06, 'epoch': 0.3} 30%|███ | 3723/12313 [2:46:53<6:23:10, 2.68s/it] 30%|███ | 3724/12313 [2:46:56<6:22:26, 2.67s/it] {'loss': 0.5675, 'grad_norm': 3.7966525875405277, 'learning_rate': 4.088507713190174e-06, 'epoch': 0.3} 30%|███ | 3724/12313 [2:46:56<6:22:26, 2.67s/it] 30%|███ | 3725/12313 [2:46:58<6:15:01, 2.62s/it] {'loss': 0.6325, 'grad_norm': 4.688025131548344, 'learning_rate': 4.087999855135215e-06, 'epoch': 0.3} 30%|███ | 3725/12313 [2:46:58<6:15:01, 2.62s/it] 30%|███ | 3726/12313 [2:47:01<6:17:45, 2.64s/it] {'loss': 0.5405, 'grad_norm': 4.823758976025031, 'learning_rate': 4.087491887199069e-06, 'epoch': 0.3} 30%|███ | 3726/12313 [2:47:01<6:17:45, 2.64s/it] 30%|███ | 3727/12313 [2:47:04<6:13:24, 2.61s/it] {'loss': 0.4993, 'grad_norm': 3.9620946246180884, 'learning_rate': 4.086983809416887e-06, 'epoch': 0.3} 30%|███ | 3727/12313 [2:47:04<6:13:24, 2.61s/it] 30%|███ | 3728/12313 [2:47:06<6:16:29, 2.63s/it] {'loss': 0.4079, 'grad_norm': 4.5032403641906305, 'learning_rate': 4.086475621823824e-06, 'epoch': 0.3} 30%|███ | 3728/12313 [2:47:06<6:16:29, 2.63s/it] 30%|███ | 3729/12313 [2:47:09<6:24:18, 2.69s/it] {'loss': 0.5201, 'grad_norm': 20.477342042531873, 'learning_rate': 4.085967324455045e-06, 'epoch': 0.3} 30%|███ | 3729/12313 [2:47:09<6:24:18, 2.69s/it] 30%|███ | 3730/12313 [2:47:12<6:22:04, 2.67s/it] {'loss': 0.7878, 'grad_norm': 6.231510482092364, 'learning_rate': 4.085458917345721e-06, 'epoch': 0.3} 30%|███ | 3730/12313 [2:47:12<6:22:04, 2.67s/it] 30%|███ | 3731/12313 [2:47:14<6:23:37, 2.68s/it] {'loss': 0.5936, 'grad_norm': 4.154775199567728, 'learning_rate': 4.084950400531029e-06, 'epoch': 0.3} 30%|███ | 3731/12313 [2:47:14<6:23:37, 2.68s/it] 30%|███ | 3732/12313 [2:47:17<6:17:48, 
2.64s/it] {'loss': 0.4481, 'grad_norm': 4.694151372288833, 'learning_rate': 4.0844417740461586e-06, 'epoch': 0.3} 30%|███ | 3732/12313 [2:47:17<6:17:48, 2.64s/it] 30%|███ | 3733/12313 [2:47:20<6:25:44, 2.70s/it] {'loss': 0.6051, 'grad_norm': 4.709650246764357, 'learning_rate': 4.083933037926303e-06, 'epoch': 0.3} 30%|███ | 3733/12313 [2:47:20<6:25:44, 2.70s/it] 30%|███ | 3734/12313 [2:47:22<6:19:55, 2.66s/it] {'loss': 0.5312, 'grad_norm': 5.029331738383321, 'learning_rate': 4.0834241922066644e-06, 'epoch': 0.3} 30%|███ | 3734/12313 [2:47:22<6:19:55, 2.66s/it] 30%|███ | 3735/12313 [2:47:25<6:16:21, 2.63s/it] {'loss': 0.5698, 'grad_norm': 4.293546250505703, 'learning_rate': 4.082915236922451e-06, 'epoch': 0.3} 30%|███ | 3735/12313 [2:47:25<6:16:21, 2.63s/it] 30%|███ | 3736/12313 [2:47:28<6:48:09, 2.86s/it] {'loss': 0.6138, 'grad_norm': 3.9815078810069218, 'learning_rate': 4.082406172108882e-06, 'epoch': 0.3} 30%|███ | 3736/12313 [2:47:28<6:48:09, 2.86s/it] 30%|███ | 3737/12313 [2:47:31<6:28:59, 2.72s/it] {'loss': 0.5962, 'grad_norm': 8.25343748834109, 'learning_rate': 4.0818969978011795e-06, 'epoch': 0.3} 30%|███ | 3737/12313 [2:47:31<6:28:59, 2.72s/it] 30%|███ | 3738/12313 [2:47:33<6:29:13, 2.72s/it] {'loss': 0.5964, 'grad_norm': 8.293232016506174, 'learning_rate': 4.081387714034577e-06, 'epoch': 0.3} 30%|███ | 3738/12313 [2:47:33<6:29:13, 2.72s/it] 30%|███ | 3739/12313 [2:47:36<6:28:48, 2.72s/it] {'loss': 0.5288, 'grad_norm': 9.876486919775553, 'learning_rate': 4.080878320844315e-06, 'epoch': 0.3} 30%|███ | 3739/12313 [2:47:36<6:28:48, 2.72s/it] 30%|███ | 3740/12313 [2:47:39<6:18:42, 2.65s/it] {'loss': 0.542, 'grad_norm': 4.90237518330183, 'learning_rate': 4.080368818265639e-06, 'epoch': 0.3} 30%|███ | 3740/12313 [2:47:39<6:18:42, 2.65s/it] 30%|███ | 3741/12313 [2:47:41<6:19:37, 2.66s/it] {'loss': 0.5346, 'grad_norm': 5.433619699450851, 'learning_rate': 4.079859206333805e-06, 'epoch': 0.3} 30%|███ | 3741/12313 [2:47:41<6:19:37, 2.66s/it] 30%|███ | 3742/12313 
[2:47:44<6:20:27, 2.66s/it] {'loss': 0.5068, 'grad_norm': 6.417127497404236, 'learning_rate': 4.079349485084074e-06, 'epoch': 0.3} 30%|███ | 3742/12313 [2:47:44<6:20:27, 2.66s/it] 30%|███ | 3743/12313 [2:47:47<6:20:42, 2.67s/it] {'loss': 0.5981, 'grad_norm': 9.395212073819222, 'learning_rate': 4.078839654551718e-06, 'epoch': 0.3} 30%|███ | 3743/12313 [2:47:47<6:20:42, 2.67s/it] 30%|███ | 3744/12313 [2:47:49<6:22:09, 2.68s/it] {'loss': 0.5716, 'grad_norm': 6.080896733160445, 'learning_rate': 4.078329714772015e-06, 'epoch': 0.3} 30%|███ | 3744/12313 [2:47:49<6:22:09, 2.68s/it] 30%|███ | 3745/12313 [2:47:52<6:23:40, 2.69s/it] {'loss': 0.589, 'grad_norm': 3.9795202208766463, 'learning_rate': 4.0778196657802484e-06, 'epoch': 0.3} 30%|███ | 3745/12313 [2:47:52<6:23:40, 2.69s/it] 30%|███ | 3746/12313 [2:47:55<6:24:19, 2.69s/it] {'loss': 0.559, 'grad_norm': 5.978058004337624, 'learning_rate': 4.077309507611711e-06, 'epoch': 0.3} 30%|███ | 3746/12313 [2:47:55<6:24:19, 2.69s/it] 30%|███ | 3747/12313 [2:47:58<6:37:49, 2.79s/it] {'loss': 0.5419, 'grad_norm': 10.73728162343595, 'learning_rate': 4.076799240301703e-06, 'epoch': 0.3} 30%|███ | 3747/12313 [2:47:58<6:37:49, 2.79s/it] 30%|███ | 3748/12313 [2:48:01<6:42:00, 2.82s/it] {'loss': 0.5277, 'grad_norm': 5.705981837585083, 'learning_rate': 4.076288863885533e-06, 'epoch': 0.3} 30%|███ | 3748/12313 [2:48:01<6:42:00, 2.82s/it] 30%|███ | 3749/12313 [2:48:03<6:38:23, 2.79s/it] {'loss': 0.6104, 'grad_norm': 4.148490626037343, 'learning_rate': 4.0757783783985164e-06, 'epoch': 0.3} 30%|███ | 3749/12313 [2:48:03<6:38:23, 2.79s/it] 30%|███ | 3750/12313 [2:48:06<6:28:50, 2.72s/it] {'loss': 0.6195, 'grad_norm': 7.3071974055301006, 'learning_rate': 4.0752677838759755e-06, 'epoch': 0.3} 30%|███ | 3750/12313 [2:48:06<6:28:50, 2.72s/it] 30%|███ | 3751/12313 [2:48:09<6:34:59, 2.77s/it] {'loss': 0.6436, 'grad_norm': 3.536531811654719, 'learning_rate': 4.074757080353241e-06, 'epoch': 0.3} 30%|███ | 3751/12313 [2:48:09<6:34:59, 2.77s/it] 30%|███ 
| 3752/12313 [2:48:11<6:16:30, 2.64s/it] {'loss': 0.6274, 'grad_norm': 4.25870092662668, 'learning_rate': 4.074246267865652e-06, 'epoch': 0.3} 30%|███ | 3752/12313 [2:48:11<6:16:30, 2.64s/it] 30%|███ | 3753/12313 [2:48:14<6:30:21, 2.74s/it] {'loss': 0.6034, 'grad_norm': 3.6079741565893895, 'learning_rate': 4.073735346448551e-06, 'epoch': 0.3} 30%|███ | 3753/12313 [2:48:14<6:30:21, 2.74s/it] 30%|███ | 3754/12313 [2:48:17<6:24:36, 2.70s/it] {'loss': 0.5586, 'grad_norm': 7.935117748318247, 'learning_rate': 4.073224316137293e-06, 'epoch': 0.3} 30%|███ | 3754/12313 [2:48:17<6:24:36, 2.70s/it] 30%|███ | 3755/12313 [2:48:20<6:28:14, 2.72s/it] {'loss': 0.5641, 'grad_norm': 6.813447251757785, 'learning_rate': 4.072713176967239e-06, 'epoch': 0.3} 30%|███ | 3755/12313 [2:48:20<6:28:14, 2.72s/it] 31%|███ | 3756/12313 [2:48:22<6:27:49, 2.72s/it] {'loss': 0.5836, 'grad_norm': 6.715619033709069, 'learning_rate': 4.072201928973757e-06, 'epoch': 0.31} 31%|███ | 3756/12313 [2:48:22<6:27:49, 2.72s/it] 31%|███ | 3757/12313 [2:48:25<6:20:24, 2.67s/it] {'loss': 0.6304, 'grad_norm': 3.5300282565737944, 'learning_rate': 4.071690572192222e-06, 'epoch': 0.31} 31%|███ | 3757/12313 [2:48:25<6:20:24, 2.67s/it] 31%|███ | 3758/12313 [2:48:27<6:17:59, 2.65s/it] {'loss': 0.5776, 'grad_norm': 9.284044398768938, 'learning_rate': 4.071179106658017e-06, 'epoch': 0.31} 31%|███ | 3758/12313 [2:48:27<6:17:59, 2.65s/it] 31%|███ | 3759/12313 [2:48:30<6:12:00, 2.61s/it] {'loss': 0.518, 'grad_norm': 6.469344755338421, 'learning_rate': 4.070667532406534e-06, 'epoch': 0.31} 31%|███ | 3759/12313 [2:48:30<6:12:00, 2.61s/it] 31%|███ | 3760/12313 [2:48:33<6:18:45, 2.66s/it] {'loss': 0.4228, 'grad_norm': 9.204050268328828, 'learning_rate': 4.070155849473169e-06, 'epoch': 0.31} 31%|███ | 3760/12313 [2:48:33<6:18:45, 2.66s/it] 31%|███ | 3761/12313 [2:48:35<6:07:49, 2.58s/it] {'loss': 0.5276, 'grad_norm': 7.33932480094826, 'learning_rate': 4.06964405789333e-06, 'epoch': 0.31} 31%|███ | 3761/12313 [2:48:35<6:07:49, 
2.58s/it] 31%|███ | 3762/12313 [2:48:38<6:16:12, 2.64s/it] {'loss': 0.5275, 'grad_norm': 4.5551115854633695, 'learning_rate': 4.06913215770243e-06, 'epoch': 0.31} 31%|███ | 3762/12313 [2:48:38<6:16:12, 2.64s/it] 31%|███ | 3763/12313 [2:48:40<6:15:46, 2.64s/it] {'loss': 0.5706, 'grad_norm': 7.628505640253093, 'learning_rate': 4.068620148935889e-06, 'epoch': 0.31} 31%|███ | 3763/12313 [2:48:40<6:15:46, 2.64s/it] 31%|███ | 3764/12313 [2:48:43<6:18:28, 2.66s/it] {'loss': 0.4332, 'grad_norm': 7.296776756806712, 'learning_rate': 4.0681080316291355e-06, 'epoch': 0.31} 31%|███ | 3764/12313 [2:48:43<6:18:28, 2.66s/it] 31%|███ | 3765/12313 [2:48:46<6:18:46, 2.66s/it] {'loss': 0.5385, 'grad_norm': 6.211933336592377, 'learning_rate': 4.067595805817604e-06, 'epoch': 0.31} 31%|███ | 3765/12313 [2:48:46<6:18:46, 2.66s/it] 31%|███ | 3766/12313 [2:48:49<6:20:45, 2.67s/it] {'loss': 0.6172, 'grad_norm': 5.2682390368881125, 'learning_rate': 4.0670834715367405e-06, 'epoch': 0.31} 31%|███ | 3766/12313 [2:48:49<6:20:45, 2.67s/it] 31%|███ | 3767/12313 [2:48:51<6:19:06, 2.66s/it] {'loss': 0.663, 'grad_norm': 5.7266015459873465, 'learning_rate': 4.066571028821994e-06, 'epoch': 0.31} 31%|███ | 3767/12313 [2:48:51<6:19:06, 2.66s/it] 31%|███ | 3768/12313 [2:48:54<6:12:27, 2.62s/it] {'loss': 0.6204, 'grad_norm': 4.566744030336992, 'learning_rate': 4.066058477708824e-06, 'epoch': 0.31} 31%|███ | 3768/12313 [2:48:54<6:12:27, 2.62s/it] 31%|███ | 3769/12313 [2:48:56<6:17:30, 2.65s/it] {'loss': 0.5374, 'grad_norm': 5.241528003933075, 'learning_rate': 4.065545818232695e-06, 'epoch': 0.31} 31%|███ | 3769/12313 [2:48:56<6:17:30, 2.65s/it] 31%|███ | 3770/12313 [2:48:59<6:19:15, 2.66s/it] {'loss': 0.5627, 'grad_norm': 4.210043089768735, 'learning_rate': 4.06503305042908e-06, 'epoch': 0.31} 31%|███ | 3770/12313 [2:48:59<6:19:15, 2.66s/it] 31%|███ | 3771/12313 [2:49:02<6:20:47, 2.67s/it] {'loss': 0.4803, 'grad_norm': 24.358962625710497, 'learning_rate': 4.064520174333462e-06, 'epoch': 0.31} 31%|███ | 
3771/12313 [2:49:02<6:20:47, 2.67s/it] 31%|███ | 3772/12313 [2:49:04<6:19:17, 2.66s/it] {'loss': 0.5884, 'grad_norm': 5.001904028877196, 'learning_rate': 4.0640071899813284e-06, 'epoch': 0.31} 31%|███ | 3772/12313 [2:49:04<6:19:17, 2.66s/it] 31%|███ | 3773/12313 [2:49:07<6:12:58, 2.62s/it] {'loss': 0.5976, 'grad_norm': 3.685895651696086, 'learning_rate': 4.0634940974081735e-06, 'epoch': 0.31} 31%|███ | 3773/12313 [2:49:07<6:12:58, 2.62s/it] 31%|███ | 3774/12313 [2:49:10<6:16:38, 2.65s/it] {'loss': 0.4432, 'grad_norm': 4.690707174624639, 'learning_rate': 4.062980896649502e-06, 'epoch': 0.31} 31%|███ | 3774/12313 [2:49:10<6:16:38, 2.65s/it] 31%|███ | 3775/12313 [2:49:12<6:18:18, 2.66s/it] {'loss': 0.5297, 'grad_norm': 7.66057017392417, 'learning_rate': 4.062467587740825e-06, 'epoch': 0.31} 31%|███ | 3775/12313 [2:49:12<6:18:18, 2.66s/it] 31%|███ | 3776/12313 [2:49:15<6:20:38, 2.68s/it] {'loss': 0.5717, 'grad_norm': 4.80151641086556, 'learning_rate': 4.0619541707176595e-06, 'epoch': 0.31} 31%|███ | 3776/12313 [2:49:15<6:20:38, 2.68s/it] 31%|███ | 3777/12313 [2:49:18<6:15:39, 2.64s/it] {'loss': 0.4853, 'grad_norm': 5.073279099762993, 'learning_rate': 4.061440645615532e-06, 'epoch': 0.31} 31%|███ | 3777/12313 [2:49:18<6:15:39, 2.64s/it] 31%|███ | 3778/12313 [2:49:20<6:07:33, 2.58s/it] {'loss': 0.5093, 'grad_norm': 6.406852301630108, 'learning_rate': 4.060927012469976e-06, 'epoch': 0.31} 31%|███ | 3778/12313 [2:49:20<6:07:33, 2.58s/it] 31%|███ | 3779/12313 [2:49:23<6:22:09, 2.69s/it] {'loss': 0.4647, 'grad_norm': 3.919683009512748, 'learning_rate': 4.060413271316531e-06, 'epoch': 0.31} 31%|███ | 3779/12313 [2:49:23<6:22:09, 2.69s/it] 31%|███ | 3780/12313 [2:49:26<6:25:40, 2.71s/it] {'loss': 0.3816, 'grad_norm': 4.810874854512679, 'learning_rate': 4.059899422190747e-06, 'epoch': 0.31} 31%|███ | 3780/12313 [2:49:26<6:25:40, 2.71s/it] 31%|███ | 3781/12313 [2:49:28<6:14:28, 2.63s/it] {'loss': 0.4818, 'grad_norm': 7.470318516960077, 'learning_rate': 4.059385465128179e-06, 
'epoch': 0.31} 31%|███ | 3781/12313 [2:49:28<6:14:28, 2.63s/it] 31%|███ | 3782/12313 [2:49:31<6:07:31, 2.58s/it] {'loss': 0.6359, 'grad_norm': 4.142497626949441, 'learning_rate': 4.058871400164388e-06, 'epoch': 0.31} 31%|███ | 3782/12313 [2:49:31<6:07:31, 2.58s/it] 31%|███ | 3783/12313 [2:49:33<6:06:54, 2.58s/it] {'loss': 0.5755, 'grad_norm': 4.107423918701493, 'learning_rate': 4.058357227334947e-06, 'epoch': 0.31} 31%|███ | 3783/12313 [2:49:33<6:06:54, 2.58s/it] 31%|███ | 3784/12313 [2:49:36<6:17:33, 2.66s/it] {'loss': 0.4618, 'grad_norm': 7.190290906025693, 'learning_rate': 4.057842946675434e-06, 'epoch': 0.31} 31%|███ | 3784/12313 [2:49:36<6:17:33, 2.66s/it] 31%|███ | 3785/12313 [2:49:39<6:15:12, 2.64s/it] {'loss': 0.4605, 'grad_norm': 6.779887515845196, 'learning_rate': 4.057328558221434e-06, 'epoch': 0.31} 31%|███ | 3785/12313 [2:49:39<6:15:12, 2.64s/it] 31%|███ | 3786/12313 [2:49:41<6:18:36, 2.66s/it] {'loss': 0.4735, 'grad_norm': 6.8626494300279495, 'learning_rate': 4.056814062008539e-06, 'epoch': 0.31} 31%|███ | 3786/12313 [2:49:41<6:18:36, 2.66s/it] 31%|███ | 3787/12313 [2:49:44<6:15:19, 2.64s/it] {'loss': 0.5549, 'grad_norm': 4.43663040407582, 'learning_rate': 4.056299458072351e-06, 'epoch': 0.31} 31%|███ | 3787/12313 [2:49:44<6:15:19, 2.64s/it] 31%|███ | 3788/12313 [2:49:47<6:16:25, 2.65s/it] {'loss': 0.6012, 'grad_norm': 9.927091358251289, 'learning_rate': 4.0557847464484766e-06, 'epoch': 0.31} 31%|███ | 3788/12313 [2:49:47<6:16:25, 2.65s/it] 31%|███ | 3789/12313 [2:49:49<6:15:55, 2.65s/it] {'loss': 0.5754, 'grad_norm': 3.3047332458402874, 'learning_rate': 4.055269927172532e-06, 'epoch': 0.31} 31%|███ | 3789/12313 [2:49:49<6:15:55, 2.65s/it] 31%|███ | 3790/12313 [2:49:52<6:36:16, 2.79s/it] {'loss': 0.6482, 'grad_norm': 4.086984576275844, 'learning_rate': 4.054755000280139e-06, 'epoch': 0.31} 31%|███ | 3790/12313 [2:49:52<6:36:16, 2.79s/it] 31%|███ | 3791/12313 [2:49:55<6:35:59, 2.79s/it] {'loss': 0.6541, 'grad_norm': 10.124343594059361, 'learning_rate': 
4.054239965806929e-06, 'epoch': 0.31} 31%|███ | 3791/12313 [2:49:55<6:35:59, 2.79s/it] 31%|███ | 3792/12313 [2:49:58<6:38:23, 2.81s/it] {'loss': 0.5313, 'grad_norm': 9.088285597560201, 'learning_rate': 4.053724823788538e-06, 'epoch': 0.31} 31%|███ | 3792/12313 [2:49:58<6:38:23, 2.81s/it] 31%|███ | 3793/12313 [2:50:01<6:45:44, 2.86s/it] {'loss': 0.5672, 'grad_norm': 10.64301926225871, 'learning_rate': 4.053209574260614e-06, 'epoch': 0.31} 31%|███ | 3793/12313 [2:50:01<6:45:44, 2.86s/it] 31%|███ | 3794/12313 [2:50:04<6:44:08, 2.85s/it] {'loss': 0.4908, 'grad_norm': 4.299438598485095, 'learning_rate': 4.052694217258806e-06, 'epoch': 0.31} 31%|███ | 3794/12313 [2:50:04<6:44:08, 2.85s/it] 31%|███ | 3795/12313 [2:50:07<6:47:05, 2.87s/it] {'loss': 0.4808, 'grad_norm': 50.64834820262253, 'learning_rate': 4.052178752818776e-06, 'epoch': 0.31} 31%|███ | 3795/12313 [2:50:07<6:47:05, 2.87s/it] 31%|███ | 3796/12313 [2:50:10<6:51:20, 2.90s/it] {'loss': 0.6804, 'grad_norm': 3.318247558859262, 'learning_rate': 4.051663180976192e-06, 'epoch': 0.31} 31%|███ | 3796/12313 [2:50:10<6:51:20, 2.90s/it] 31%|███ | 3797/12313 [2:50:12<6:40:19, 2.82s/it] {'loss': 0.5227, 'grad_norm': 15.92488834517386, 'learning_rate': 4.051147501766727e-06, 'epoch': 0.31} 31%|███ | 3797/12313 [2:50:12<6:40:19, 2.82s/it] 31%|███ | 3798/12313 [2:50:16<6:51:50, 2.90s/it] {'loss': 0.5733, 'grad_norm': 3.980666622212895, 'learning_rate': 4.050631715226064e-06, 'epoch': 0.31} 31%|███ | 3798/12313 [2:50:16<6:51:50, 2.90s/it] 31%|███ | 3799/12313 [2:50:18<6:43:24, 2.84s/it] {'loss': 0.4722, 'grad_norm': 5.924342358250464, 'learning_rate': 4.050115821389894e-06, 'epoch': 0.31} 31%|███ | 3799/12313 [2:50:18<6:43:24, 2.84s/it] 31%|███ | 3800/12313 [2:50:21<6:38:26, 2.81s/it] {'loss': 0.5608, 'grad_norm': 6.826343924107241, 'learning_rate': 4.049599820293913e-06, 'epoch': 0.31} 31%|███ | 3800/12313 [2:50:21<6:38:26, 2.81s/it] 31%|███ | 3801/12313 [2:50:25<7:17:07, 3.08s/it] {'loss': 0.482, 'grad_norm': 
5.798254498370531, 'learning_rate': 4.049083711973824e-06, 'epoch': 0.31} 31%|███ | 3801/12313 [2:50:25<7:17:07, 3.08s/it] 31%|███ | 3802/12313 [2:50:27<6:53:49, 2.92s/it] {'loss': 0.4834, 'grad_norm': 4.3433768010776905, 'learning_rate': 4.0485674964653424e-06, 'epoch': 0.31} 31%|███ | 3802/12313 [2:50:27<6:53:49, 2.92s/it] 31%|███ | 3803/12313 [2:50:30<6:28:29, 2.74s/it] {'loss': 0.5556, 'grad_norm': 4.326553027616668, 'learning_rate': 4.048051173804185e-06, 'epoch': 0.31} 31%|███ | 3803/12313 [2:50:30<6:28:29, 2.74s/it] 31%|███ | 3804/12313 [2:50:32<6:26:18, 2.72s/it] {'loss': 0.6, 'grad_norm': 6.800606044283711, 'learning_rate': 4.047534744026079e-06, 'epoch': 0.31} 31%|███ | 3804/12313 [2:50:32<6:26:18, 2.72s/it] 31%|███ | 3805/12313 [2:50:35<6:20:06, 2.68s/it] {'loss': 0.4796, 'grad_norm': 10.590880966190726, 'learning_rate': 4.04701820716676e-06, 'epoch': 0.31} 31%|███ | 3805/12313 [2:50:35<6:20:06, 2.68s/it] 31%|███ | 3806/12313 [2:50:38<6:22:12, 2.70s/it] {'loss': 0.5129, 'grad_norm': 5.6795091144428635, 'learning_rate': 4.046501563261968e-06, 'epoch': 0.31} 31%|███ | 3806/12313 [2:50:38<6:22:12, 2.70s/it] 31%|███ | 3807/12313 [2:50:40<6:17:44, 2.66s/it] {'loss': 0.5332, 'grad_norm': 3.210645271484872, 'learning_rate': 4.045984812347452e-06, 'epoch': 0.31} 31%|███ | 3807/12313 [2:50:40<6:17:44, 2.66s/it] 31%|███ | 3808/12313 [2:50:43<6:08:20, 2.60s/it] {'loss': 0.6181, 'grad_norm': 6.294410231720947, 'learning_rate': 4.045467954458969e-06, 'epoch': 0.31} 31%|███ | 3808/12313 [2:50:43<6:08:20, 2.60s/it] 31%|███ | 3809/12313 [2:50:45<6:07:09, 2.59s/it] {'loss': 0.4792, 'grad_norm': 4.474575703083307, 'learning_rate': 4.044950989632283e-06, 'epoch': 0.31} 31%|███ | 3809/12313 [2:50:45<6:07:09, 2.59s/it] 31%|███ | 3810/12313 [2:50:48<6:05:47, 2.58s/it] {'loss': 0.5667, 'grad_norm': 4.779056570525405, 'learning_rate': 4.044433917903166e-06, 'epoch': 0.31} 31%|███ | 3810/12313 [2:50:48<6:05:47, 2.58s/it] 31%|███ | 3811/12313 [2:50:50<6:08:08, 2.60s/it] {'loss': 
0.7153, 'grad_norm': 4.782139569666095, 'learning_rate': 4.043916739307394e-06, 'epoch': 0.31} 31%|███ | 3811/12313 [2:50:50<6:08:08, 2.60s/it] 31%|███ | 3812/12313 [2:50:53<6:06:05, 2.58s/it] {'loss': 0.5078, 'grad_norm': 6.550010939029318, 'learning_rate': 4.0433994538807564e-06, 'epoch': 0.31} 31%|███ | 3812/12313 [2:50:53<6:06:05, 2.58s/it] 31%|███ | 3813/12313 [2:50:55<6:05:33, 2.58s/it] {'loss': 0.5411, 'grad_norm': 7.70154315773568, 'learning_rate': 4.042882061659043e-06, 'epoch': 0.31} 31%|███ | 3813/12313 [2:50:55<6:05:33, 2.58s/it] 31%|███ | 3814/12313 [2:50:58<6:07:11, 2.59s/it] {'loss': 0.6321, 'grad_norm': 5.468435592136346, 'learning_rate': 4.042364562678059e-06, 'epoch': 0.31} 31%|███ | 3814/12313 [2:50:58<6:07:11, 2.59s/it] 31%|███ | 3815/12313 [2:51:01<6:19:25, 2.68s/it] {'loss': 0.46, 'grad_norm': 4.143944075441312, 'learning_rate': 4.041846956973608e-06, 'epoch': 0.31} 31%|███ | 3815/12313 [2:51:01<6:19:25, 2.68s/it] 31%|███ | 3816/12313 [2:51:04<6:15:31, 2.65s/it] {'loss': 0.4921, 'grad_norm': 4.798354632196031, 'learning_rate': 4.041329244581509e-06, 'epoch': 0.31} 31%|███ | 3816/12313 [2:51:04<6:15:31, 2.65s/it] 31%|███ | 3817/12313 [2:51:06<6:20:08, 2.68s/it] {'loss': 0.6239, 'grad_norm': 5.155312927681158, 'learning_rate': 4.040811425537583e-06, 'epoch': 0.31} 31%|███ | 3817/12313 [2:51:06<6:20:08, 2.68s/it] 31%|███ | 3818/12313 [2:51:09<6:13:45, 2.64s/it] {'loss': 0.6161, 'grad_norm': 3.701165442814568, 'learning_rate': 4.040293499877661e-06, 'epoch': 0.31} 31%|███ | 3818/12313 [2:51:09<6:13:45, 2.64s/it] 31%|███ | 3819/12313 [2:51:11<6:04:16, 2.57s/it] {'loss': 0.4338, 'grad_norm': 5.396166406448343, 'learning_rate': 4.039775467637581e-06, 'epoch': 0.31} 31%|███ | 3819/12313 [2:51:11<6:04:16, 2.57s/it] 31%|███ | 3820/12313 [2:51:14<6:05:13, 2.58s/it] {'loss': 0.7018, 'grad_norm': 4.72888620508123, 'learning_rate': 4.039257328853188e-06, 'epoch': 0.31} 31%|███ | 3820/12313 [2:51:14<6:05:13, 2.58s/it] 31%|███ | 3821/12313 [2:51:17<6:09:21, 
2.61s/it] {'loss': 0.526, 'grad_norm': 4.775493593228451, 'learning_rate': 4.038739083560334e-06, 'epoch': 0.31} 31%|███ | 3821/12313 [2:51:17<6:09:21, 2.61s/it] 31%|███ | 3822/12313 [2:51:19<6:02:56, 2.56s/it] {'loss': 0.6318, 'grad_norm': 4.103968235635027, 'learning_rate': 4.038220731794878e-06, 'epoch': 0.31} 31%|███ | 3822/12313 [2:51:19<6:02:56, 2.56s/it] 31%|███ | 3823/12313 [2:51:22<6:13:59, 2.64s/it] {'loss': 0.5303, 'grad_norm': 8.31477225841899, 'learning_rate': 4.03770227359269e-06, 'epoch': 0.31} 31%|███ | 3823/12313 [2:51:22<6:13:59, 2.64s/it] 31%|███ | 3824/12313 [2:51:25<6:56:34, 2.94s/it] {'loss': 0.4775, 'grad_norm': 3.8725355108338766, 'learning_rate': 4.037183708989642e-06, 'epoch': 0.31} 31%|███ | 3824/12313 [2:51:25<6:56:34, 2.94s/it] 31%|███ | 3825/12313 [2:51:28<6:50:02, 2.90s/it] {'loss': 0.4905, 'grad_norm': 7.614229033887325, 'learning_rate': 4.0366650380216165e-06, 'epoch': 0.31} 31%|███ | 3825/12313 [2:51:28<6:50:02, 2.90s/it] 31%|███ | 3826/12313 [2:51:31<6:44:53, 2.86s/it] {'loss': 0.5249, 'grad_norm': 5.153358446059254, 'learning_rate': 4.036146260724503e-06, 'epoch': 0.31} 31%|███ | 3826/12313 [2:51:31<6:44:53, 2.86s/it] 31%|███ | 3827/12313 [2:51:33<6:25:58, 2.73s/it] {'loss': 0.6979, 'grad_norm': 8.578326851480215, 'learning_rate': 4.0356273771341984e-06, 'epoch': 0.31} 31%|███ | 3827/12313 [2:51:33<6:25:58, 2.73s/it] 31%|███ | 3828/12313 [2:51:37<6:47:19, 2.88s/it] {'loss': 0.4557, 'grad_norm': 5.785574362196001, 'learning_rate': 4.035108387286607e-06, 'epoch': 0.31} 31%|███ | 3828/12313 [2:51:37<6:47:19, 2.88s/it] 31%|███ | 3829/12313 [2:51:39<6:35:05, 2.79s/it] {'loss': 0.6176, 'grad_norm': 3.55379778886365, 'learning_rate': 4.03458929121764e-06, 'epoch': 0.31} 31%|███ | 3829/12313 [2:51:39<6:35:05, 2.79s/it] 31%|███ | 3830/12313 [2:51:42<6:31:52, 2.77s/it] {'loss': 0.7673, 'grad_norm': 7.612980594268243, 'learning_rate': 4.0340700889632145e-06, 'epoch': 0.31} 31%|███ | 3830/12313 [2:51:42<6:31:52, 2.77s/it] 31%|███ | 
3831/12313 [2:51:45<6:31:31, 2.77s/it] {'loss': 0.5036, 'grad_norm': 9.252507294094109, 'learning_rate': 4.033550780559258e-06, 'epoch': 0.31} 31%|███ | 3831/12313 [2:51:45<6:31:31, 2.77s/it] 31%|███ | 3832/12313 [2:51:47<6:23:52, 2.72s/it] {'loss': 0.4821, 'grad_norm': 3.594686390622405, 'learning_rate': 4.033031366041704e-06, 'epoch': 0.31} 31%|███ | 3832/12313 [2:51:47<6:23:52, 2.72s/it] 31%|███ | 3833/12313 [2:51:50<6:12:31, 2.64s/it] {'loss': 0.5007, 'grad_norm': 6.285694736961024, 'learning_rate': 4.0325118454464935e-06, 'epoch': 0.31} 31%|███ | 3833/12313 [2:51:50<6:12:31, 2.64s/it] 31%|███ | 3834/12313 [2:51:53<6:25:19, 2.73s/it] {'loss': 0.5703, 'grad_norm': 3.160762643660874, 'learning_rate': 4.031992218809573e-06, 'epoch': 0.31} 31%|███ | 3834/12313 [2:51:53<6:25:19, 2.73s/it] 31%|███ | 3835/12313 [2:51:55<6:23:02, 2.71s/it] {'loss': 0.5602, 'grad_norm': 8.890289793696613, 'learning_rate': 4.0314724861669e-06, 'epoch': 0.31} 31%|███ | 3835/12313 [2:51:55<6:23:02, 2.71s/it] 31%|███ | 3836/12313 [2:51:58<6:19:26, 2.69s/it] {'loss': 0.6061, 'grad_norm': 4.321571181434268, 'learning_rate': 4.0309526475544355e-06, 'epoch': 0.31} 31%|███ | 3836/12313 [2:51:58<6:19:26, 2.69s/it] 31%|███ | 3837/12313 [2:52:01<6:16:11, 2.66s/it] {'loss': 0.5242, 'grad_norm': 7.365273193938057, 'learning_rate': 4.03043270300815e-06, 'epoch': 0.31} 31%|███ | 3837/12313 [2:52:01<6:16:11, 2.66s/it] 31%|███ | 3838/12313 [2:52:03<6:14:40, 2.65s/it] {'loss': 0.527, 'grad_norm': 6.152202896973093, 'learning_rate': 4.029912652564022e-06, 'epoch': 0.31} 31%|███ | 3838/12313 [2:52:03<6:14:40, 2.65s/it] 31%|███ | 3839/12313 [2:52:06<6:10:08, 2.62s/it] {'loss': 0.6952, 'grad_norm': 9.939281394272458, 'learning_rate': 4.029392496258035e-06, 'epoch': 0.31} 31%|███ | 3839/12313 [2:52:06<6:10:08, 2.62s/it] 31%|███ | 3840/12313 [2:52:09<6:14:39, 2.65s/it] {'loss': 0.4901, 'grad_norm': 7.9876078180108685, 'learning_rate': 4.028872234126181e-06, 'epoch': 0.31} 31%|███ | 3840/12313 [2:52:09<6:14:39, 
2.65s/it] 31%|███ | 3841/12313 [2:52:11<6:17:00, 2.67s/it] {'loss': 0.4127, 'grad_norm': 4.522557141556952, 'learning_rate': 4.02835186620446e-06, 'epoch': 0.31} 31%|███ | 3841/12313 [2:52:11<6:17:00, 2.67s/it] 31%|███ | 3842/12313 [2:52:14<6:21:49, 2.70s/it] {'loss': 0.5045, 'grad_norm': 5.5314558917789505, 'learning_rate': 4.027831392528879e-06, 'epoch': 0.31} 31%|███ | 3842/12313 [2:52:14<6:21:49, 2.70s/it] 31%|███ | 3843/12313 [2:52:17<6:15:12, 2.66s/it] {'loss': 0.5654, 'grad_norm': 6.372253503387104, 'learning_rate': 4.027310813135451e-06, 'epoch': 0.31} 31%|███ | 3843/12313 [2:52:17<6:15:12, 2.66s/it] 31%|███ | 3844/12313 [2:52:19<6:14:24, 2.65s/it] {'loss': 0.4954, 'grad_norm': 3.5189797244740446, 'learning_rate': 4.0267901280601985e-06, 'epoch': 0.31} 31%|███ | 3844/12313 [2:52:19<6:14:24, 2.65s/it] 31%|███ | 3845/12313 [2:52:22<6:10:54, 2.63s/it] {'loss': 0.5535, 'grad_norm': 6.001913797204444, 'learning_rate': 4.026269337339149e-06, 'epoch': 0.31} 31%|███ | 3845/12313 [2:52:22<6:10:54, 2.63s/it] 31%|███ | 3846/12313 [2:52:24<6:11:52, 2.64s/it] {'loss': 0.5175, 'grad_norm': 6.805227975140487, 'learning_rate': 4.025748441008339e-06, 'epoch': 0.31} 31%|███ | 3846/12313 [2:52:24<6:11:52, 2.64s/it] 31%|███ | 3847/12313 [2:52:27<6:19:09, 2.69s/it] {'loss': 0.6538, 'grad_norm': 3.6933697953285423, 'learning_rate': 4.0252274391038125e-06, 'epoch': 0.31} 31%|███ | 3847/12313 [2:52:27<6:19:09, 2.69s/it] 31%|███▏ | 3848/12313 [2:52:30<6:14:15, 2.65s/it] {'loss': 0.6194, 'grad_norm': 4.975606249432986, 'learning_rate': 4.024706331661618e-06, 'epoch': 0.31} 31%|███▏ | 3848/12313 [2:52:30<6:14:15, 2.65s/it] 31%|███▏ | 3849/12313 [2:52:33<6:17:20, 2.67s/it] {'loss': 0.5108, 'grad_norm': 21.413001819693548, 'learning_rate': 4.024185118717816e-06, 'epoch': 0.31} 31%|███▏ | 3849/12313 [2:52:33<6:17:20, 2.67s/it] 31%|███▏ | 3850/12313 [2:52:35<6:13:37, 2.65s/it] {'loss': 0.5959, 'grad_norm': 4.688146949121658, 'learning_rate': 4.023663800308471e-06, 'epoch': 0.31} 31%|███▏ 
| 3850/12313 [2:52:35<6:13:37, 2.65s/it] 31%|███▏ | 3851/12313 [2:52:38<6:13:47, 2.65s/it] {'loss': 0.4974, 'grad_norm': 4.929413723785714, 'learning_rate': 4.023142376469653e-06, 'epoch': 0.31} 31%|███▏ | 3851/12313 [2:52:38<6:13:47, 2.65s/it] 31%|███▏ | 3852/12313 [2:52:41<6:16:52, 2.67s/it] {'loss': 0.5106, 'grad_norm': 7.165568666050521, 'learning_rate': 4.022620847237445e-06, 'epoch': 0.31} 31%|███▏ | 3852/12313 [2:52:41<6:16:52, 2.67s/it] 31%|███▏ | 3853/12313 [2:52:43<6:17:38, 2.68s/it] {'loss': 0.6104, 'grad_norm': 5.314358756089364, 'learning_rate': 4.022099212647933e-06, 'epoch': 0.31} 31%|███▏ | 3853/12313 [2:52:43<6:17:38, 2.68s/it] 31%|███▏ | 3854/12313 [2:52:46<6:06:49, 2.60s/it] {'loss': 0.587, 'grad_norm': 4.330125714880257, 'learning_rate': 4.021577472737209e-06, 'epoch': 0.31} 31%|███▏ | 3854/12313 [2:52:46<6:06:49, 2.60s/it] 31%|███▏ | 3855/12313 [2:52:48<5:59:11, 2.55s/it] {'loss': 0.4938, 'grad_norm': 4.654860553507325, 'learning_rate': 4.021055627541379e-06, 'epoch': 0.31} 31%|███▏ | 3855/12313 [2:52:48<5:59:11, 2.55s/it] 31%|███▏ | 3856/12313 [2:52:51<5:57:14, 2.53s/it] {'loss': 0.5747, 'grad_norm': 5.006258634790825, 'learning_rate': 4.020533677096549e-06, 'epoch': 0.31} 31%|███▏ | 3856/12313 [2:52:51<5:57:14, 2.53s/it] 31%|███▏ | 3857/12313 [2:52:54<6:15:24, 2.66s/it] {'loss': 0.5764, 'grad_norm': 6.0280062807614545, 'learning_rate': 4.020011621438836e-06, 'epoch': 0.31} 31%|███▏ | 3857/12313 [2:52:54<6:15:24, 2.66s/it] 31%|███▏ | 3858/12313 [2:52:56<6:12:25, 2.64s/it] {'loss': 0.5299, 'grad_norm': 6.574325175703227, 'learning_rate': 4.019489460604364e-06, 'epoch': 0.31} 31%|███▏ | 3858/12313 [2:52:56<6:12:25, 2.64s/it] 31%|███▏ | 3859/12313 [2:52:59<6:09:20, 2.62s/it] {'loss': 0.7868, 'grad_norm': 4.080793981103117, 'learning_rate': 4.018967194629261e-06, 'epoch': 0.31} 31%|███▏ | 3859/12313 [2:52:59<6:09:20, 2.62s/it] 31%|███▏ | 3860/12313 [2:53:01<6:05:20, 2.59s/it] {'loss': 0.6058, 'grad_norm': 3.847772634998914, 'learning_rate': 
4.0184448235496685e-06, 'epoch': 0.31} 31%|███▏ | 3860/12313 [2:53:01<6:05:20, 2.59s/it] 31%|███▏ | 3861/12313 [2:53:04<6:10:24, 2.63s/it] {'loss': 0.523, 'grad_norm': 4.404444873697644, 'learning_rate': 4.017922347401731e-06, 'epoch': 0.31} 31%|███▏ | 3861/12313 [2:53:04<6:10:24, 2.63s/it] 31%|███▏ | 3862/12313 [2:53:07<6:30:02, 2.77s/it] {'loss': 0.4319, 'grad_norm': 9.45105154996585, 'learning_rate': 4.017399766221599e-06, 'epoch': 0.31} 31%|███▏ | 3862/12313 [2:53:07<6:30:02, 2.77s/it][2024-12-05 15:26:35,531] [WARNING] [stage3.py:1949:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time 31%|███▏ | 3863/12313 [2:53:11<7:26:03, 3.17s/it] {'loss': 0.4388, 'grad_norm': 4.265430944389896, 'learning_rate': 4.016877080045435e-06, 'epoch': 0.31} 31%|███▏ | 3863/12313 [2:53:11<7:26:03, 3.17s/it] 31%|███▏ | 3864/12313 [2:53:15<7:35:41, 3.24s/it] {'loss': 0.4834, 'grad_norm': 4.626292776222082, 'learning_rate': 4.016354288909405e-06, 'epoch': 0.31} 31%|███▏ | 3864/12313 [2:53:15<7:35:41, 3.24s/it] 31%|███▏ | 3865/12313 [2:53:17<7:06:25, 3.03s/it] {'loss': 0.5888, 'grad_norm': 3.893386508353787, 'learning_rate': 4.0158313928496826e-06, 'epoch': 0.31} 31%|███▏ | 3865/12313 [2:53:17<7:06:25, 3.03s/it] 31%|███▏ | 3866/12313 [2:53:20<6:59:23, 2.98s/it] {'loss': 0.5323, 'grad_norm': 5.310652622885137, 'learning_rate': 4.015308391902452e-06, 'epoch': 0.31} 31%|███▏ | 3866/12313 [2:53:20<6:59:23, 2.98s/it] 31%|███▏ | 3867/12313 [2:53:22<6:40:59, 2.85s/it] {'loss': 0.5397, 'grad_norm': 6.885983808946358, 'learning_rate': 4.014785286103898e-06, 'epoch': 0.31} 31%|███▏ | 3867/12313 [2:53:22<6:40:59, 2.85s/it] 31%|███▏ | 
3868/12313 [2:53:25<6:30:16, 2.77s/it] {'loss': 0.4684, 'grad_norm': 7.793035367661995, 'learning_rate': 4.014262075490221e-06, 'epoch': 0.31} 31%|███▏ | 3868/12313 [2:53:25<6:30:16, 2.77s/it] 31%|███▏ | 3869/12313 [2:53:28<6:24:37, 2.73s/it] {'loss': 0.5751, 'grad_norm': 14.532888308919345, 'learning_rate': 4.013738760097622e-06, 'epoch': 0.31} 31%|███▏ | 3869/12313 [2:53:28<6:24:37, 2.73s/it] 31%|███▏ | 3870/12313 [2:53:30<6:18:53, 2.69s/it] {'loss': 0.5754, 'grad_norm': 6.316239281714061, 'learning_rate': 4.0132153399623106e-06, 'epoch': 0.31} 31%|███▏ | 3870/12313 [2:53:30<6:18:53, 2.69s/it] 31%|███▏ | 3871/12313 [2:53:33<6:31:29, 2.78s/it] {'loss': 0.5718, 'grad_norm': 5.218130390396817, 'learning_rate': 4.012691815120508e-06, 'epoch': 0.31} 31%|███▏ | 3871/12313 [2:53:33<6:31:29, 2.78s/it] 31%|███▏ | 3872/12313 [2:53:36<6:25:50, 2.74s/it] {'loss': 0.4341, 'grad_norm': 10.486211867182627, 'learning_rate': 4.012168185608437e-06, 'epoch': 0.31} 31%|███▏ | 3872/12313 [2:53:36<6:25:50, 2.74s/it] 31%|███▏ | 3873/12313 [2:53:39<6:22:33, 2.72s/it] {'loss': 0.6967, 'grad_norm': 13.04476716032136, 'learning_rate': 4.011644451462331e-06, 'epoch': 0.31} 31%|███▏ | 3873/12313 [2:53:39<6:22:33, 2.72s/it] 31%|███▏ | 3874/12313 [2:53:41<6:16:34, 2.68s/it] {'loss': 0.5778, 'grad_norm': 4.374789125818755, 'learning_rate': 4.011120612718429e-06, 'epoch': 0.31} 31%|███▏ | 3874/12313 [2:53:41<6:16:34, 2.68s/it] 31%|███▏ | 3875/12313 [2:53:44<6:12:48, 2.65s/it] {'loss': 0.5147, 'grad_norm': 4.427126765436375, 'learning_rate': 4.010596669412978e-06, 'epoch': 0.31} 31%|███▏ | 3875/12313 [2:53:44<6:12:48, 2.65s/it] 31%|███▏ | 3876/12313 [2:53:47<6:16:55, 2.68s/it] {'loss': 0.4541, 'grad_norm': 6.121898739230059, 'learning_rate': 4.010072621582233e-06, 'epoch': 0.31} 31%|███▏ | 3876/12313 [2:53:47<6:16:55, 2.68s/it] 31%|███▏ | 3877/12313 [2:53:49<6:20:31, 2.71s/it] {'loss': 0.7081, 'grad_norm': 8.191495131008242, 'learning_rate': 4.009548469262453e-06, 'epoch': 0.31} 31%|███▏ | 
3877/12313 [2:53:49<6:20:31, 2.71s/it] 31%|███▏ | 3878/12313 [2:53:52<6:24:18, 2.73s/it] {'loss': 0.588, 'grad_norm': 5.939103893735796, 'learning_rate': 4.009024212489909e-06, 'epoch': 0.31} 31%|███▏ | 3878/12313 [2:53:52<6:24:18, 2.73s/it] 32%|███▏ | 3879/12313 [2:53:55<6:20:11, 2.70s/it] {'loss': 0.5442, 'grad_norm': 5.468809556850152, 'learning_rate': 4.0084998513008765e-06, 'epoch': 0.32} 32%|███▏ | 3879/12313 [2:53:55<6:20:11, 2.70s/it] 32%|███▏ | 3880/12313 [2:53:57<6:15:40, 2.67s/it] {'loss': 0.771, 'grad_norm': 3.760788030695487, 'learning_rate': 4.007975385731637e-06, 'epoch': 0.32} 32%|███▏ | 3880/12313 [2:53:57<6:15:40, 2.67s/it] 32%|███▏ | 3881/12313 [2:54:00<6:20:24, 2.71s/it] {'loss': 0.5091, 'grad_norm': 4.150500328342237, 'learning_rate': 4.007450815818481e-06, 'epoch': 0.32} 32%|███▏ | 3881/12313 [2:54:00<6:20:24, 2.71s/it] 32%|███▏ | 3882/12313 [2:54:03<6:14:03, 2.66s/it] {'loss': 0.5819, 'grad_norm': 4.7154167246286285, 'learning_rate': 4.0069261415977075e-06, 'epoch': 0.32} 32%|███▏ | 3882/12313 [2:54:03<6:14:03, 2.66s/it] 32%|███▏ | 3883/12313 [2:54:05<6:12:36, 2.65s/it] {'loss': 0.5721, 'grad_norm': 4.032822924231925, 'learning_rate': 4.006401363105621e-06, 'epoch': 0.32} 32%|███▏ | 3883/12313 [2:54:05<6:12:36, 2.65s/it] 32%|███▏ | 3884/12313 [2:54:08<6:12:03, 2.65s/it] {'loss': 0.6929, 'grad_norm': 4.862866759718993, 'learning_rate': 4.0058764803785325e-06, 'epoch': 0.32} 32%|███▏ | 3884/12313 [2:54:08<6:12:03, 2.65s/it] 32%|███▏ | 3885/12313 [2:54:11<6:18:53, 2.70s/it] {'loss': 0.5521, 'grad_norm': 3.918915992085951, 'learning_rate': 4.00535149345276e-06, 'epoch': 0.32} 32%|███▏ | 3885/12313 [2:54:11<6:18:53, 2.70s/it] 32%|███▏ | 3886/12313 [2:54:14<6:24:24, 2.74s/it] {'loss': 0.5472, 'grad_norm': 5.105255048171445, 'learning_rate': 4.0048264023646325e-06, 'epoch': 0.32} 32%|███▏ | 3886/12313 [2:54:14<6:24:24, 2.74s/it] 32%|███▏ | 3887/12313 [2:54:16<6:16:59, 2.68s/it] {'loss': 0.4525, 'grad_norm': 8.077151245590299, 'learning_rate': 
4.004301207150482e-06, 'epoch': 0.32} 32%|███▏ | 3887/12313 [2:54:16<6:16:59, 2.68s/it] 32%|███▏ | 3888/12313 [2:54:19<6:14:59, 2.67s/it] {'loss': 0.566, 'grad_norm': 8.364488519427972, 'learning_rate': 4.003775907846648e-06, 'epoch': 0.32} 32%|███▏ | 3888/12313 [2:54:19<6:14:59, 2.67s/it] 32%|███▏ | 3889/12313 [2:54:22<6:18:03, 2.69s/it] {'loss': 0.4639, 'grad_norm': 4.65775942487372, 'learning_rate': 4.003250504489481e-06, 'epoch': 0.32} 32%|███▏ | 3889/12313 [2:54:22<6:18:03, 2.69s/it] 32%|███▏ | 3890/12313 [2:54:24<6:21:33, 2.72s/it] {'loss': 0.3836, 'grad_norm': 10.681842547013567, 'learning_rate': 4.002724997115335e-06, 'epoch': 0.32} 32%|███▏ | 3890/12313 [2:54:24<6:21:33, 2.72s/it] 32%|███▏ | 3891/12313 [2:54:27<6:12:44, 2.66s/it] {'loss': 0.5612, 'grad_norm': 4.910721299022231, 'learning_rate': 4.002199385760571e-06, 'epoch': 0.32} 32%|███▏ | 3891/12313 [2:54:27<6:12:44, 2.66s/it] 32%|███▏ | 3892/12313 [2:54:29<6:09:58, 2.64s/it] {'loss': 0.6232, 'grad_norm': 5.963214780659344, 'learning_rate': 4.001673670461561e-06, 'epoch': 0.32} 32%|███▏ | 3892/12313 [2:54:29<6:09:58, 2.64s/it] 32%|███▏ | 3893/12313 [2:54:32<6:13:27, 2.66s/it] {'loss': 0.5664, 'grad_norm': 6.652157701535889, 'learning_rate': 4.0011478512546805e-06, 'epoch': 0.32} 32%|███▏ | 3893/12313 [2:54:32<6:13:27, 2.66s/it] 32%|███▏ | 3894/12313 [2:54:35<6:08:47, 2.63s/it] {'loss': 0.5419, 'grad_norm': 3.8712273283479233, 'learning_rate': 4.000621928176313e-06, 'epoch': 0.32} 32%|███▏ | 3894/12313 [2:54:35<6:08:47, 2.63s/it] 32%|███▏ | 3895/12313 [2:54:37<6:06:26, 2.61s/it] {'loss': 0.5002, 'grad_norm': 5.721147211228416, 'learning_rate': 4.000095901262851e-06, 'epoch': 0.32} 32%|███▏ | 3895/12313 [2:54:37<6:06:26, 2.61s/it] 32%|███▏ | 3896/12313 [2:54:40<6:00:58, 2.57s/it] {'loss': 0.5785, 'grad_norm': 5.522015478399547, 'learning_rate': 3.99956977055069e-06, 'epoch': 0.32} 32%|███▏ | 3896/12313 [2:54:40<6:00:58, 2.57s/it] 32%|███▏ | 3897/12313 [2:54:42<6:00:14, 2.57s/it] {'loss': 0.5987, 
'grad_norm': 12.710524189514066, 'learning_rate': 3.999043536076238e-06, 'epoch': 0.32} 32%|███▏ | 3897/12313 [2:54:42<6:00:14, 2.57s/it] 32%|███▏ | 3898/12313 [2:54:45<6:07:11, 2.62s/it] {'loss': 0.5758, 'grad_norm': 15.432871133292979, 'learning_rate': 3.998517197875908e-06, 'epoch': 0.32} 32%|███▏ | 3898/12313 [2:54:45<6:07:11, 2.62s/it] 32%|███▏ | 3899/12313 [2:54:48<6:21:01, 2.72s/it] {'loss': 0.3758, 'grad_norm': 3.2538000933185622, 'learning_rate': 3.997990755986117e-06, 'epoch': 0.32} 32%|███▏ | 3899/12313 [2:54:48<6:21:01, 2.72s/it] 32%|███▏ | 3900/12313 [2:54:51<6:23:38, 2.74s/it] {'loss': 0.6192, 'grad_norm': 6.504572502780165, 'learning_rate': 3.9974642104432945e-06, 'epoch': 0.32} 32%|███▏ | 3900/12313 [2:54:51<6:23:38, 2.74s/it] 32%|███▏ | 3901/12313 [2:54:53<6:16:10, 2.68s/it] {'loss': 0.5603, 'grad_norm': 5.380343912168226, 'learning_rate': 3.996937561283874e-06, 'epoch': 0.32} 32%|███▏ | 3901/12313 [2:54:53<6:16:10, 2.68s/it] 32%|███▏ | 3902/12313 [2:54:56<6:06:02, 2.61s/it] {'loss': 0.5061, 'grad_norm': 8.424493108304915, 'learning_rate': 3.996410808544296e-06, 'epoch': 0.32} 32%|███▏ | 3902/12313 [2:54:56<6:06:02, 2.61s/it] 32%|███▏ | 3903/12313 [2:54:58<6:09:13, 2.63s/it] {'loss': 0.5234, 'grad_norm': 5.3477146060500225, 'learning_rate': 3.99588395226101e-06, 'epoch': 0.32} 32%|███▏ | 3903/12313 [2:54:58<6:09:13, 2.63s/it] 32%|███▏ | 3904/12313 [2:55:01<6:01:38, 2.58s/it] {'loss': 0.5047, 'grad_norm': 12.609875677912806, 'learning_rate': 3.9953569924704715e-06, 'epoch': 0.32} 32%|███▏ | 3904/12313 [2:55:01<6:01:38, 2.58s/it] 32%|███▏ | 3905/12313 [2:55:04<6:20:21, 2.71s/it] {'loss': 0.5393, 'grad_norm': 6.884048976659566, 'learning_rate': 3.994829929209143e-06, 'epoch': 0.32} 32%|███▏ | 3905/12313 [2:55:04<6:20:21, 2.71s/it] 32%|███▏ | 3906/12313 [2:55:07<6:18:30, 2.70s/it] {'loss': 0.5701, 'grad_norm': 5.4396250935337465, 'learning_rate': 3.994302762513496e-06, 'epoch': 0.32} 32%|███▏ | 3906/12313 [2:55:07<6:18:30, 2.70s/it] 32%|███▏ | 
3907/12313 [2:55:09<6:17:48, 2.70s/it] {'loss': 0.8038, 'grad_norm': 3.9789051349804905, 'learning_rate': 3.993775492420005e-06, 'epoch': 0.32} 32%|███▏ | 3907/12313 [2:55:09<6:17:48, 2.70s/it] 32%|███▏ | 3908/12313 [2:55:12<6:18:33, 2.70s/it] {'loss': 0.5355, 'grad_norm': 6.0730946208938335, 'learning_rate': 3.993248118965155e-06, 'epoch': 0.32} 32%|███▏ | 3908/12313 [2:55:12<6:18:33, 2.70s/it] 32%|███▏ | 3909/12313 [2:55:14<6:08:47, 2.63s/it] {'loss': 0.4949, 'grad_norm': 7.073767654590764, 'learning_rate': 3.992720642185439e-06, 'epoch': 0.32} 32%|███▏ | 3909/12313 [2:55:14<6:08:47, 2.63s/it] 32%|███▏ | 3910/12313 [2:55:17<6:15:06, 2.68s/it] {'loss': 0.5065, 'grad_norm': 6.028367836015118, 'learning_rate': 3.992193062117354e-06, 'epoch': 0.32} 32%|███▏ | 3910/12313 [2:55:17<6:15:06, 2.68s/it] 32%|███▏ | 3911/12313 [2:55:20<6:03:43, 2.60s/it] {'loss': 0.6328, 'grad_norm': 5.478458346134022, 'learning_rate': 3.991665378797408e-06, 'epoch': 0.32} 32%|███▏ | 3911/12313 [2:55:20<6:03:43, 2.60s/it] 32%|███▏ | 3912/12313 [2:55:22<6:06:38, 2.62s/it] {'loss': 0.3775, 'grad_norm': 5.80032080288049, 'learning_rate': 3.991137592262111e-06, 'epoch': 0.32} 32%|███▏ | 3912/12313 [2:55:22<6:06:38, 2.62s/it] 32%|███▏ | 3913/12313 [2:55:25<6:06:07, 2.62s/it] {'loss': 0.6169, 'grad_norm': 5.364477054479598, 'learning_rate': 3.990609702547985e-06, 'epoch': 0.32} 32%|███▏ | 3913/12313 [2:55:25<6:06:07, 2.62s/it] 32%|███▏ | 3914/12313 [2:55:27<6:02:49, 2.59s/it] {'loss': 0.5666, 'grad_norm': 6.357814340494215, 'learning_rate': 3.990081709691556e-06, 'epoch': 0.32} 32%|███▏ | 3914/12313 [2:55:27<6:02:49, 2.59s/it] 32%|███▏ | 3915/12313 [2:55:30<5:53:49, 2.53s/it] {'loss': 0.5291, 'grad_norm': 3.0964122690765055, 'learning_rate': 3.989553613729359e-06, 'epoch': 0.32} 32%|███▏ | 3915/12313 [2:55:30<5:53:49, 2.53s/it] 32%|███▏ | 3916/12313 [2:55:33<6:06:05, 2.62s/it] {'loss': 0.4764, 'grad_norm': 6.1633146191090615, 'learning_rate': 3.989025414697935e-06, 'epoch': 0.32} 32%|███▏ | 
3916/12313 [2:55:33<6:06:05, 2.62s/it] 32%|███▏ | 3917/12313 [2:55:35<6:04:16, 2.60s/it] {'loss': 0.4923, 'grad_norm': 5.380367903861603, 'learning_rate': 3.988497112633834e-06, 'epoch': 0.32} 32%|███▏ | 3917/12313 [2:55:35<6:04:16, 2.60s/it] 32%|███▏ | 3918/12313 [2:55:38<6:06:07, 2.62s/it] {'loss': 0.5211, 'grad_norm': 4.635335577517618, 'learning_rate': 3.98796870757361e-06, 'epoch': 0.32} 32%|███▏ | 3918/12313 [2:55:38<6:06:07, 2.62s/it] 32%|███▏ | 3919/12313 [2:55:40<6:03:48, 2.60s/it] {'loss': 0.5608, 'grad_norm': 4.837892560831567, 'learning_rate': 3.987440199553826e-06, 'epoch': 0.32} 32%|███▏ | 3919/12313 [2:55:40<6:03:48, 2.60s/it] 32%|███▏ | 3920/12313 [2:55:43<6:16:07, 2.69s/it] {'loss': 0.5652, 'grad_norm': 5.412511747936819, 'learning_rate': 3.986911588611052e-06, 'epoch': 0.32} 32%|███▏ | 3920/12313 [2:55:43<6:16:07, 2.69s/it] 32%|███▏ | 3921/12313 [2:55:46<6:11:23, 2.66s/it] {'loss': 0.6123, 'grad_norm': 6.754512301450377, 'learning_rate': 3.986382874781866e-06, 'epoch': 0.32} 32%|███▏ | 3921/12313 [2:55:46<6:11:23, 2.66s/it] 32%|███▏ | 3922/12313 [2:55:49<6:11:08, 2.65s/it] {'loss': 0.4627, 'grad_norm': 3.851560698751757, 'learning_rate': 3.985854058102851e-06, 'epoch': 0.32} 32%|███▏ | 3922/12313 [2:55:49<6:11:08, 2.65s/it] 32%|███▏ | 3923/12313 [2:55:51<6:09:36, 2.64s/it] {'loss': 0.5239, 'grad_norm': 9.180563910365933, 'learning_rate': 3.9853251386106e-06, 'epoch': 0.32} 32%|███▏ | 3923/12313 [2:55:51<6:09:36, 2.64s/it] 32%|███▏ | 3924/12313 [2:55:54<6:08:23, 2.63s/it] {'loss': 0.4989, 'grad_norm': 6.398627354244881, 'learning_rate': 3.9847961163417094e-06, 'epoch': 0.32} 32%|███▏ | 3924/12313 [2:55:54<6:08:23, 2.63s/it] 32%|███▏ | 3925/12313 [2:55:57<6:16:42, 2.69s/it] {'loss': 0.5573, 'grad_norm': 4.516597706332039, 'learning_rate': 3.984266991332787e-06, 'epoch': 0.32} 32%|███▏ | 3925/12313 [2:55:57<6:16:42, 2.69s/it] 32%|███▏ | 3926/12313 [2:55:59<6:17:07, 2.70s/it] {'loss': 0.3659, 'grad_norm': 6.97958059945626, 'learning_rate': 
3.9837377636204435e-06, 'epoch': 0.32} 32%|███▏ | 3926/12313 [2:55:59<6:17:07, 2.70s/it] 32%|███▏ | 3927/12313 [2:56:02<6:23:16, 2.74s/it] {'loss': 0.5665, 'grad_norm': 9.380173505814613, 'learning_rate': 3.983208433241298e-06, 'epoch': 0.32} 32%|███▏ | 3927/12313 [2:56:02<6:23:16, 2.74s/it] 32%|███▏ | 3928/12313 [2:56:05<6:12:36, 2.67s/it] {'loss': 0.6098, 'grad_norm': 5.179839950219316, 'learning_rate': 3.98267900023198e-06, 'epoch': 0.32} 32%|███▏ | 3928/12313 [2:56:05<6:12:36, 2.67s/it] 32%|███▏ | 3929/12313 [2:56:07<6:11:41, 2.66s/it] {'loss': 0.5793, 'grad_norm': 11.243873982204496, 'learning_rate': 3.982149464629123e-06, 'epoch': 0.32} 32%|███▏ | 3929/12313 [2:56:07<6:11:41, 2.66s/it] 32%|███▏ | 3930/12313 [2:56:11<6:49:05, 2.93s/it] {'loss': 0.5195, 'grad_norm': 4.1291560709585164, 'learning_rate': 3.981619826469366e-06, 'epoch': 0.32} 32%|███▏ | 3930/12313 [2:56:11<6:49:05, 2.93s/it] 32%|███▏ | 3931/12313 [2:56:13<6:31:09, 2.80s/it] {'loss': 0.5506, 'grad_norm': 4.373741253760804, 'learning_rate': 3.981090085789359e-06, 'epoch': 0.32} 32%|███▏ | 3931/12313 [2:56:13<6:31:09, 2.80s/it] 32%|███▏ | 3932/12313 [2:56:16<6:21:29, 2.73s/it] {'loss': 0.6606, 'grad_norm': 6.318431118742852, 'learning_rate': 3.980560242625756e-06, 'epoch': 0.32} 32%|███▏ | 3932/12313 [2:56:16<6:21:29, 2.73s/it] 32%|███▏ | 3933/12313 [2:56:19<6:19:46, 2.72s/it] {'loss': 0.6272, 'grad_norm': 4.240677812054353, 'learning_rate': 3.9800302970152205e-06, 'epoch': 0.32} 32%|███▏ | 3933/12313 [2:56:19<6:19:46, 2.72s/it] 32%|███▏ | 3934/12313 [2:56:21<6:08:51, 2.64s/it] {'loss': 0.6, 'grad_norm': 5.672962140852729, 'learning_rate': 3.9795002489944216e-06, 'epoch': 0.32} 32%|███▏ | 3934/12313 [2:56:21<6:08:51, 2.64s/it] 32%|███▏ | 3935/12313 [2:56:24<6:16:38, 2.70s/it] {'loss': 0.5626, 'grad_norm': 6.714289970142845, 'learning_rate': 3.978970098600035e-06, 'epoch': 0.32} 32%|███▏ | 3935/12313 [2:56:24<6:16:38, 2.70s/it] 32%|███▏ | 3936/12313 [2:56:26<6:06:39, 2.63s/it] {'loss': 0.4723, 
'grad_norm': 4.761339278563214, 'learning_rate': 3.978439845868745e-06, 'epoch': 0.32} 32%|███▏ | 3936/12313 [2:56:26<6:06:39, 2.63s/it] 32%|███▏ | 3937/12313 [2:56:29<6:21:41, 2.73s/it] {'loss': 0.5259, 'grad_norm': 13.677606899149971, 'learning_rate': 3.977909490837242e-06, 'epoch': 0.32} 32%|███▏ | 3937/12313 [2:56:29<6:21:41, 2.73s/it] 32%|███▏ | 3938/12313 [2:56:32<6:17:46, 2.71s/it] {'loss': 0.438, 'grad_norm': 6.128118023978932, 'learning_rate': 3.977379033542225e-06, 'epoch': 0.32} 32%|███▏ | 3938/12313 [2:56:32<6:17:46, 2.71s/it] 32%|███▏ | 3939/12313 [2:56:35<6:11:18, 2.66s/it] {'loss': 0.4292, 'grad_norm': 6.303253700795032, 'learning_rate': 3.976848474020397e-06, 'epoch': 0.32} 32%|███▏ | 3939/12313 [2:56:35<6:11:18, 2.66s/it] 32%|███▏ | 3940/12313 [2:56:37<6:08:37, 2.64s/it] {'loss': 0.635, 'grad_norm': 10.396013099121385, 'learning_rate': 3.97631781230847e-06, 'epoch': 0.32} 32%|███▏ | 3940/12313 [2:56:37<6:08:37, 2.64s/it] 32%|███▏ | 3941/12313 [2:56:40<6:22:54, 2.74s/it] {'loss': 0.7294, 'grad_norm': 3.2724038031097575, 'learning_rate': 3.975787048443165e-06, 'epoch': 0.32} 32%|███▏ | 3941/12313 [2:56:40<6:22:54, 2.74s/it] 32%|███▏ | 3942/12313 [2:56:43<6:18:56, 2.72s/it] {'loss': 0.5492, 'grad_norm': 5.0292671115469645, 'learning_rate': 3.975256182461206e-06, 'epoch': 0.32} 32%|███▏ | 3942/12313 [2:56:43<6:18:56, 2.72s/it] 32%|███▏ | 3943/12313 [2:56:45<6:13:56, 2.68s/it] {'loss': 0.5846, 'grad_norm': 15.222409035535888, 'learning_rate': 3.9747252143993265e-06, 'epoch': 0.32} 32%|███▏ | 3943/12313 [2:56:45<6:13:56, 2.68s/it] 32%|███▏ | 3944/12313 [2:56:49<6:44:48, 2.90s/it] {'loss': 0.7537, 'grad_norm': 3.721523920711308, 'learning_rate': 3.9741941442942685e-06, 'epoch': 0.32} 32%|███▏ | 3944/12313 [2:56:49<6:44:48, 2.90s/it] 32%|███▏ | 3945/12313 [2:56:52<6:49:46, 2.94s/it] {'loss': 0.4823, 'grad_norm': 9.490205761589818, 'learning_rate': 3.973662972182777e-06, 'epoch': 0.32} 32%|███▏ | 3945/12313 [2:56:52<6:49:46, 2.94s/it] 32%|███▏ | 3946/12313 
[2:56:54<6:33:20, 2.82s/it] {'loss': 0.6342, 'grad_norm': 5.92729992222592, 'learning_rate': 3.973131698101606e-06, 'epoch': 0.32} 32%|███▏ | 3946/12313 [2:56:54<6:33:20, 2.82s/it] 32%|███▏ | 3947/12313 [2:56:57<6:28:40, 2.79s/it] {'loss': 0.6895, 'grad_norm': 3.889265293501103, 'learning_rate': 3.97260032208752e-06, 'epoch': 0.32} 32%|███▏ | 3947/12313 [2:56:57<6:28:40, 2.79s/it] 32%|███▏ | 3948/12313 [2:57:00<6:27:47, 2.78s/it] {'loss': 0.5711, 'grad_norm': 5.718930345280851, 'learning_rate': 3.972068844177284e-06, 'epoch': 0.32} 32%|███▏ | 3948/12313 [2:57:00<6:27:47, 2.78s/it] 32%|███▏ | 3949/12313 [2:57:02<6:15:34, 2.69s/it] {'loss': 0.4434, 'grad_norm': 5.665338399507542, 'learning_rate': 3.971537264407674e-06, 'epoch': 0.32} 32%|███▏ | 3949/12313 [2:57:02<6:15:34, 2.69s/it] 32%|███▏ | 3950/12313 [2:57:05<6:12:56, 2.68s/it] {'loss': 0.5603, 'grad_norm': 5.195191497507738, 'learning_rate': 3.971005582815475e-06, 'epoch': 0.32} 32%|███▏ | 3950/12313 [2:57:05<6:12:56, 2.68s/it] 32%|███▏ | 3951/12313 [2:57:08<6:14:13, 2.69s/it] {'loss': 0.5229, 'grad_norm': 4.595813111552932, 'learning_rate': 3.970473799437475e-06, 'epoch': 0.32} 32%|███▏ | 3951/12313 [2:57:08<6:14:13, 2.69s/it] 32%|███▏ | 3952/12313 [2:57:10<6:09:40, 2.65s/it] {'loss': 0.5054, 'grad_norm': 7.011551829999244, 'learning_rate': 3.969941914310469e-06, 'epoch': 0.32} 32%|███▏ | 3952/12313 [2:57:10<6:09:40, 2.65s/it] 32%|███▏ | 3953/12313 [2:57:13<6:00:12, 2.59s/it] {'loss': 0.4704, 'grad_norm': 5.133064719413326, 'learning_rate': 3.969409927471263e-06, 'epoch': 0.32} 32%|███▏ | 3953/12313 [2:57:13<6:00:12, 2.59s/it] 32%|███▏ | 3954/12313 [2:57:15<6:01:09, 2.59s/it] {'loss': 0.525, 'grad_norm': 4.606505759316552, 'learning_rate': 3.968877838956667e-06, 'epoch': 0.32} 32%|███▏ | 3954/12313 [2:57:15<6:01:09, 2.59s/it] 32%|███▏ | 3955/12313 [2:57:18<6:10:29, 2.66s/it] {'loss': 0.4454, 'grad_norm': 5.298403703345239, 'learning_rate': 3.968345648803497e-06, 'epoch': 0.32} 32%|███▏ | 3955/12313 
[2:57:18<6:10:29, 2.66s/it] 32%|███▏ | 3956/12313 [2:57:21<6:17:18, 2.71s/it] {'loss': 0.5577, 'grad_norm': 3.561393835187397, 'learning_rate': 3.96781335704858e-06, 'epoch': 0.32} 32%|███▏ | 3956/12313 [2:57:21<6:17:18, 2.71s/it] 32%|███▏ | 3957/12313 [2:57:24<6:14:09, 2.69s/it] {'loss': 0.4494, 'grad_norm': 4.307683957878129, 'learning_rate': 3.967280963728748e-06, 'epoch': 0.32} 32%|███▏ | 3957/12313 [2:57:24<6:14:09, 2.69s/it] 32%|███▏ | 3958/12313 [2:57:26<6:05:50, 2.63s/it] {'loss': 0.6197, 'grad_norm': 6.747635355041429, 'learning_rate': 3.966748468880838e-06, 'epoch': 0.32} 32%|███▏ | 3958/12313 [2:57:26<6:05:50, 2.63s/it] 32%|███▏ | 3959/12313 [2:57:29<6:02:06, 2.60s/it] {'loss': 0.5736, 'grad_norm': 7.367076215445246, 'learning_rate': 3.9662158725416964e-06, 'epoch': 0.32} 32%|███▏ | 3959/12313 [2:57:29<6:02:06, 2.60s/it] 32%|███▏ | 3960/12313 [2:57:31<6:04:31, 2.62s/it] {'loss': 0.6222, 'grad_norm': 4.986542323666267, 'learning_rate': 3.965683174748176e-06, 'epoch': 0.32} 32%|███▏ | 3960/12313 [2:57:31<6:04:31, 2.62s/it] 32%|███▏ | 3961/12313 [2:57:34<6:20:12, 2.73s/it] {'loss': 0.4032, 'grad_norm': 4.343391538341711, 'learning_rate': 3.965150375537137e-06, 'epoch': 0.32} 32%|███▏ | 3961/12313 [2:57:34<6:20:12, 2.73s/it] 32%|███▏ | 3962/12313 [2:57:37<6:15:21, 2.70s/it] {'loss': 0.5128, 'grad_norm': 10.317677958429377, 'learning_rate': 3.964617474945447e-06, 'epoch': 0.32} 32%|███▏ | 3962/12313 [2:57:37<6:15:21, 2.70s/it] 32%|███▏ | 3963/12313 [2:57:40<6:12:10, 2.67s/it] {'loss': 0.5081, 'grad_norm': 5.753574432660448, 'learning_rate': 3.9640844730099795e-06, 'epoch': 0.32} 32%|███▏ | 3963/12313 [2:57:40<6:12:10, 2.67s/it] 32%|███▏ | 3964/12313 [2:57:42<6:10:20, 2.66s/it] {'loss': 0.5913, 'grad_norm': 3.9575067307483045, 'learning_rate': 3.963551369767613e-06, 'epoch': 0.32} 32%|███▏ | 3964/12313 [2:57:42<6:10:20, 2.66s/it] 32%|███▏ | 3965/12313 [2:57:45<6:14:36, 2.69s/it] {'loss': 0.5454, 'grad_norm': 4.7333217124955365, 'learning_rate': 
3.963018165255239e-06, 'epoch': 0.32} 32%|███▏ | 3965/12313 [2:57:45<6:14:36, 2.69s/it] 32%|███▏ | 3966/12313 [2:57:48<6:14:49, 2.69s/it] {'loss': 0.4283, 'grad_norm': 4.107636292267094, 'learning_rate': 3.962484859509751e-06, 'epoch': 0.32} 32%|███▏ | 3966/12313 [2:57:48<6:14:49, 2.69s/it] 32%|███▏ | 3967/12313 [2:57:50<6:20:53, 2.74s/it] {'loss': 0.5421, 'grad_norm': 3.710054258246495, 'learning_rate': 3.96195145256805e-06, 'epoch': 0.32} 32%|███▏ | 3967/12313 [2:57:50<6:20:53, 2.74s/it] 32%|███▏ | 3968/12313 [2:57:53<6:05:33, 2.63s/it] {'loss': 0.624, 'grad_norm': 4.914093323605704, 'learning_rate': 3.961417944467046e-06, 'epoch': 0.32} 32%|███▏ | 3968/12313 [2:57:53<6:05:33, 2.63s/it] 32%|███▏ | 3969/12313 [2:57:55<6:03:29, 2.61s/it] {'loss': 0.557, 'grad_norm': 6.367334808510856, 'learning_rate': 3.960884335243655e-06, 'epoch': 0.32} 32%|███▏ | 3969/12313 [2:57:55<6:03:29, 2.61s/it] 32%|███▏ | 3970/12313 [2:57:58<5:59:41, 2.59s/it] {'loss': 0.7381, 'grad_norm': 7.4083421441872455, 'learning_rate': 3.9603506249348e-06, 'epoch': 0.32} 32%|███▏ | 3970/12313 [2:57:58<5:59:41, 2.59s/it] 32%|███▏ | 3971/12313 [2:58:01<6:00:35, 2.59s/it] {'loss': 0.4419, 'grad_norm': 5.365590371657556, 'learning_rate': 3.959816813577409e-06, 'epoch': 0.32} 32%|███▏ | 3971/12313 [2:58:01<6:00:35, 2.59s/it] 32%|███▏ | 3972/12313 [2:58:03<6:12:47, 2.68s/it] {'loss': 0.5859, 'grad_norm': 4.032690708340649, 'learning_rate': 3.959282901208422e-06, 'epoch': 0.32} 32%|███▏ | 3972/12313 [2:58:03<6:12:47, 2.68s/it] 32%|███▏ | 3973/12313 [2:58:06<6:10:37, 2.67s/it] {'loss': 0.5464, 'grad_norm': 5.285352909806383, 'learning_rate': 3.9587488878647816e-06, 'epoch': 0.32} 32%|███▏ | 3973/12313 [2:58:06<6:10:37, 2.67s/it] 32%|███▏ | 3974/12313 [2:58:09<6:10:22, 2.66s/it] {'loss': 0.5481, 'grad_norm': 6.70864234598179, 'learning_rate': 3.958214773583437e-06, 'epoch': 0.32} 32%|███▏ | 3974/12313 [2:58:09<6:10:22, 2.66s/it] 32%|███▏ | 3975/12313 [2:58:12<6:18:25, 2.72s/it] {'loss': 0.5129, 'grad_norm': 
8.065418291262025, 'learning_rate': 3.957680558401348e-06, 'epoch': 0.32} 32%|███▏ | 3975/12313 [2:58:12<6:18:25, 2.72s/it] 32%|███▏ | 3976/12313 [2:58:15<6:27:04, 2.79s/it] {'loss': 0.5457, 'grad_norm': 5.8858306697747755, 'learning_rate': 3.95714624235548e-06, 'epoch': 0.32} 32%|███▏ | 3976/12313 [2:58:15<6:27:04, 2.79s/it] 32%|███▏ | 3977/12313 [2:58:17<6:24:44, 2.77s/it] {'loss': 0.5355, 'grad_norm': 4.006111281727507, 'learning_rate': 3.956611825482803e-06, 'epoch': 0.32} 32%|███▏ | 3977/12313 [2:58:17<6:24:44, 2.77s/it] 32%|███▏ | 3978/12313 [2:58:20<6:14:36, 2.70s/it] {'loss': 0.4682, 'grad_norm': 5.150201779401238, 'learning_rate': 3.956077307820296e-06, 'epoch': 0.32} 32%|███▏ | 3978/12313 [2:58:20<6:14:36, 2.70s/it] 32%|███▏ | 3979/12313 [2:58:22<6:07:01, 2.64s/it] {'loss': 0.5036, 'grad_norm': 5.941131977925157, 'learning_rate': 3.955542689404948e-06, 'epoch': 0.32} 32%|███▏ | 3979/12313 [2:58:22<6:07:01, 2.64s/it] 32%|███▏ | 3980/12313 [2:58:25<6:24:44, 2.77s/it] {'loss': 0.7358, 'grad_norm': 5.8735315327446, 'learning_rate': 3.955007970273747e-06, 'epoch': 0.32} 32%|███▏ | 3980/12313 [2:58:25<6:24:44, 2.77s/it] 32%|███▏ | 3981/12313 [2:58:28<6:13:39, 2.69s/it] {'loss': 0.4277, 'grad_norm': 6.004080782992699, 'learning_rate': 3.954473150463696e-06, 'epoch': 0.32} 32%|███▏ | 3981/12313 [2:58:28<6:13:39, 2.69s/it] 32%|███▏ | 3982/12313 [2:58:30<6:08:36, 2.65s/it] {'loss': 0.6674, 'grad_norm': 3.969562702014225, 'learning_rate': 3.9539382300117995e-06, 'epoch': 0.32} 32%|███▏ | 3982/12313 [2:58:30<6:08:36, 2.65s/it] 32%|███▏ | 3983/12313 [2:58:33<6:01:56, 2.61s/it] {'loss': 0.5466, 'grad_norm': 5.552544120013278, 'learning_rate': 3.953403208955074e-06, 'epoch': 0.32} 32%|███▏ | 3983/12313 [2:58:33<6:01:56, 2.61s/it] 32%|███▏ | 3984/12313 [2:58:36<6:04:51, 2.63s/it] {'loss': 0.5557, 'grad_norm': 6.198037037595219, 'learning_rate': 3.952868087330537e-06, 'epoch': 0.32} 32%|███▏ | 3984/12313 [2:58:36<6:04:51, 2.63s/it] 32%|███▏ | 3985/12313 [2:58:38<6:06:58, 
2.64s/it] {'loss': 0.622, 'grad_norm': 3.7865591645101278, 'learning_rate': 3.952332865175218e-06, 'epoch': 0.32} 32%|███▏ | 3985/12313 [2:58:38<6:06:58, 2.64s/it] 32%|███▏ | 3986/12313 [2:58:41<6:07:43, 2.65s/it] {'loss': 0.5681, 'grad_norm': 4.933123244556156, 'learning_rate': 3.951797542526151e-06, 'epoch': 0.32} 32%|███▏ | 3986/12313 [2:58:41<6:07:43, 2.65s/it] 32%|███▏ | 3987/12313 [2:58:44<6:21:33, 2.75s/it] {'loss': 0.4988, 'grad_norm': 5.397627483042807, 'learning_rate': 3.951262119420378e-06, 'epoch': 0.32} 32%|███▏ | 3987/12313 [2:58:44<6:21:33, 2.75s/it] 32%|███▏ | 3988/12313 [2:58:46<6:07:52, 2.65s/it] {'loss': 0.4111, 'grad_norm': 4.851399386478982, 'learning_rate': 3.950726595894947e-06, 'epoch': 0.32} 32%|███▏ | 3988/12313 [2:58:46<6:07:52, 2.65s/it] 32%|███▏ | 3989/12313 [2:58:49<6:04:38, 2.63s/it] {'loss': 0.6122, 'grad_norm': 3.1067739446054636, 'learning_rate': 3.950190971986913e-06, 'epoch': 0.32} 32%|███▏ | 3989/12313 [2:58:49<6:04:38, 2.63s/it] 32%|███▏ | 3990/12313 [2:58:51<5:59:58, 2.60s/it] {'loss': 0.5503, 'grad_norm': 5.367173504535532, 'learning_rate': 3.9496552477333396e-06, 'epoch': 0.32} 32%|███▏ | 3990/12313 [2:58:51<5:59:58, 2.60s/it] 32%|███▏ | 3991/12313 [2:58:54<6:03:27, 2.62s/it] {'loss': 0.4951, 'grad_norm': 6.546774201268415, 'learning_rate': 3.9491194231712945e-06, 'epoch': 0.32} 32%|███▏ | 3991/12313 [2:58:54<6:03:27, 2.62s/it] 32%|███▏ | 3992/12313 [2:58:57<6:11:26, 2.68s/it] {'loss': 0.4695, 'grad_norm': 3.3008359196847876, 'learning_rate': 3.948583498337854e-06, 'epoch': 0.32} 32%|███▏ | 3992/12313 [2:58:57<6:11:26, 2.68s/it] 32%|███▏ | 3993/12313 [2:59:00<6:06:47, 2.65s/it] {'loss': 0.6426, 'grad_norm': 6.2106989027955315, 'learning_rate': 3.9480474732701034e-06, 'epoch': 0.32} 32%|███▏ | 3993/12313 [2:59:00<6:06:47, 2.65s/it] 32%|███▏ | 3994/12313 [2:59:02<6:15:02, 2.70s/it] {'loss': 0.5088, 'grad_norm': 7.437097992706253, 'learning_rate': 3.9475113480051305e-06, 'epoch': 0.32} 32%|███▏ | 3994/12313 [2:59:02<6:15:02, 
2.70s/it] 32%|███▏ | 3995/12313 [2:59:05<6:04:47, 2.63s/it] {'loss': 0.3348, 'grad_norm': 4.649897071407217, 'learning_rate': 3.9469751225800344e-06, 'epoch': 0.32} 32%|███▏ | 3995/12313 [2:59:05<6:04:47, 2.63s/it] 32%|███▏ | 3996/12313 [2:59:07<6:04:54, 2.63s/it] {'loss': 0.5809, 'grad_norm': 7.807412990752628, 'learning_rate': 3.946438797031916e-06, 'epoch': 0.32} 32%|███▏ | 3996/12313 [2:59:07<6:04:54, 2.63s/it] 32%|███▏ | 3997/12313 [2:59:10<6:03:52, 2.63s/it] {'loss': 0.4846, 'grad_norm': 10.891441494223873, 'learning_rate': 3.9459023713978895e-06, 'epoch': 0.32} 32%|███▏ | 3997/12313 [2:59:10<6:03:52, 2.63s/it] 32%|███▏ | 3998/12313 [2:59:13<6:05:05, 2.63s/it] {'loss': 0.6143, 'grad_norm': 8.904826186725101, 'learning_rate': 3.94536584571507e-06, 'epoch': 0.32} 32%|███▏ | 3998/12313 [2:59:13<6:05:05, 2.63s/it] 32%|███▏ | 3999/12313 [2:59:16<6:12:44, 2.69s/it] {'loss': 0.5203, 'grad_norm': 5.6954977350735785, 'learning_rate': 3.944829220020584e-06, 'epoch': 0.32} 32%|███▏ | 3999/12313 [2:59:16<6:12:44, 2.69s/it] 32%|███▏ | 4000/12313 [2:59:18<6:06:58, 2.65s/it] {'loss': 0.6325, 'grad_norm': 4.93125171789273, 'learning_rate': 3.944292494351563e-06, 'epoch': 0.32} 32%|███▏ | 4000/12313 [2:59:18<6:06:58, 2.65s/it] 32%|███▏ | 4001/12313 [2:59:21<6:10:27, 2.67s/it] {'loss': 0.5805, 'grad_norm': 4.965862139362356, 'learning_rate': 3.943755668745145e-06, 'epoch': 0.32} 32%|███▏ | 4001/12313 [2:59:21<6:10:27, 2.67s/it] 33%|███▎ | 4002/12313 [2:59:24<6:13:11, 2.69s/it] {'loss': 0.5562, 'grad_norm': 5.16787159248045, 'learning_rate': 3.943218743238476e-06, 'epoch': 0.33} 33%|███▎ | 4002/12313 [2:59:24<6:13:11, 2.69s/it] 33%|███▎ | 4003/12313 [2:59:27<6:34:31, 2.85s/it] {'loss': 0.5688, 'grad_norm': 7.5362717832466055, 'learning_rate': 3.942681717868707e-06, 'epoch': 0.33} 33%|███▎ | 4003/12313 [2:59:27<6:34:31, 2.85s/it] 33%|███▎ | 4004/12313 [2:59:29<6:24:57, 2.78s/it] {'loss': 0.5032, 'grad_norm': 4.981243847271942, 'learning_rate': 3.942144592673e-06, 'epoch': 0.33} 
33%|███▎ | 4004/12313 [2:59:29<6:24:57, 2.78s/it] 33%|███▎ | 4005/12313 [2:59:32<6:22:23, 2.76s/it] {'loss': 0.6819, 'grad_norm': 5.040020101083127, 'learning_rate': 3.941607367688518e-06, 'epoch': 0.33} 33%|███▎ | 4005/12313 [2:59:32<6:22:23, 2.76s/it] 33%|███▎ | 4006/12313 [2:59:35<6:29:10, 2.81s/it] {'loss': 0.602, 'grad_norm': 5.368844738609336, 'learning_rate': 3.941070042952437e-06, 'epoch': 0.33} 33%|███▎ | 4006/12313 [2:59:35<6:29:10, 2.81s/it] 33%|███▎ | 4007/12313 [2:59:38<6:22:21, 2.76s/it] {'loss': 0.4266, 'grad_norm': 6.620173821128225, 'learning_rate': 3.940532618501935e-06, 'epoch': 0.33} 33%|███▎ | 4007/12313 [2:59:38<6:22:21, 2.76s/it] 33%|███▎ | 4008/12313 [2:59:40<6:22:35, 2.76s/it] {'loss': 0.6278, 'grad_norm': 3.763790781156802, 'learning_rate': 3.9399950943742e-06, 'epoch': 0.33} 33%|███▎ | 4008/12313 [2:59:40<6:22:35, 2.76s/it] 33%|███▎ | 4009/12313 [2:59:43<6:20:44, 2.75s/it] {'loss': 0.5235, 'grad_norm': 5.280317791344952, 'learning_rate': 3.939457470606426e-06, 'epoch': 0.33} 33%|███▎ | 4009/12313 [2:59:43<6:20:44, 2.75s/it] 33%|███▎ | 4010/12313 [2:59:46<6:13:35, 2.70s/it] {'loss': 0.498, 'grad_norm': 6.207070016608509, 'learning_rate': 3.938919747235812e-06, 'epoch': 0.33} 33%|███▎ | 4010/12313 [2:59:46<6:13:35, 2.70s/it] 33%|███▎ | 4011/12313 [2:59:48<6:11:30, 2.68s/it] {'loss': 0.6105, 'grad_norm': 3.9148661505895923, 'learning_rate': 3.938381924299568e-06, 'epoch': 0.33} 33%|███▎ | 4011/12313 [2:59:48<6:11:30, 2.68s/it] 33%|███▎ | 4012/12313 [2:59:51<6:08:14, 2.66s/it] {'loss': 0.4771, 'grad_norm': 11.982123261840117, 'learning_rate': 3.937844001834907e-06, 'epoch': 0.33} 33%|███▎ | 4012/12313 [2:59:51<6:08:14, 2.66s/it] 33%|███▎ | 4013/12313 [2:59:54<6:07:16, 2.66s/it] {'loss': 0.5762, 'grad_norm': 8.177076369022492, 'learning_rate': 3.93730597987905e-06, 'epoch': 0.33} 33%|███▎ | 4013/12313 [2:59:54<6:07:16, 2.66s/it] 33%|███▎ | 4014/12313 [2:59:56<6:07:48, 2.66s/it] {'loss': 0.4574, 'grad_norm': 4.802261707731817, 'learning_rate': 
3.936767858469228e-06, 'epoch': 0.33} 33%|███▎ | 4014/12313 [2:59:56<6:07:48, 2.66s/it] 33%|███▎ | 4015/12313 [2:59:59<6:12:42, 2.69s/it] {'loss': 0.4855, 'grad_norm': 4.140558850278409, 'learning_rate': 3.936229637642672e-06, 'epoch': 0.33} 33%|███▎ | 4015/12313 [2:59:59<6:12:42, 2.69s/it] 33%|███▎ | 4016/12313 [3:00:02<6:16:43, 2.72s/it] {'loss': 0.6508, 'grad_norm': 3.5682197931038644, 'learning_rate': 3.935691317436628e-06, 'epoch': 0.33} 33%|███▎ | 4016/12313 [3:00:02<6:16:43, 2.72s/it] 33%|███▎ | 4017/12313 [3:00:05<6:13:29, 2.70s/it] {'loss': 0.6438, 'grad_norm': 4.124877825005949, 'learning_rate': 3.9351528978883425e-06, 'epoch': 0.33} 33%|███▎ | 4017/12313 [3:00:05<6:13:29, 2.70s/it] 33%|███▎ | 4018/12313 [3:00:07<6:02:25, 2.62s/it] {'loss': 0.5822, 'grad_norm': 5.830213074804944, 'learning_rate': 3.934614379035071e-06, 'epoch': 0.33} 33%|███▎ | 4018/12313 [3:00:07<6:02:25, 2.62s/it] 33%|███▎ | 4019/12313 [3:00:10<6:03:32, 2.63s/it] {'loss': 0.5308, 'grad_norm': 8.315171390345546, 'learning_rate': 3.9340757609140785e-06, 'epoch': 0.33} 33%|███▎ | 4019/12313 [3:00:10<6:03:32, 2.63s/it] 33%|███▎ | 4020/12313 [3:00:12<6:06:48, 2.65s/it] {'loss': 0.4855, 'grad_norm': 3.4244913994169313, 'learning_rate': 3.933537043562632e-06, 'epoch': 0.33} 33%|███▎ | 4020/12313 [3:00:12<6:06:48, 2.65s/it] 33%|███▎ | 4021/12313 [3:00:15<6:22:44, 2.77s/it] {'loss': 0.6688, 'grad_norm': 4.156835180938438, 'learning_rate': 3.932998227018009e-06, 'epoch': 0.33} 33%|███▎ | 4021/12313 [3:00:15<6:22:44, 2.77s/it] 33%|███▎ | 4022/12313 [3:00:18<6:18:45, 2.74s/it] {'loss': 0.5877, 'grad_norm': 3.4822522397229956, 'learning_rate': 3.932459311317494e-06, 'epoch': 0.33} 33%|███▎ | 4022/12313 [3:00:18<6:18:45, 2.74s/it] 33%|███▎ | 4023/12313 [3:00:21<6:12:37, 2.70s/it] {'loss': 0.5496, 'grad_norm': 4.707979946257839, 'learning_rate': 3.931920296498374e-06, 'epoch': 0.33} 33%|███▎ | 4023/12313 [3:00:21<6:12:37, 2.70s/it] 33%|███▎ | 4024/12313 [3:00:23<6:11:06, 2.69s/it] {'loss': 0.4053, 
'grad_norm': 5.369501192774783, 'learning_rate': 3.931381182597949e-06, 'epoch': 0.33} 33%|███▎ | 4024/12313 [3:00:23<6:11:06, 2.69s/it] 33%|███▎ | 4025/12313 [3:00:26<6:07:26, 2.66s/it] {'loss': 0.5032, 'grad_norm': 7.0165291185775, 'learning_rate': 3.930841969653521e-06, 'epoch': 0.33} 33%|███▎ | 4025/12313 [3:00:26<6:07:26, 2.66s/it] 33%|███▎ | 4026/12313 [3:00:29<6:10:47, 2.68s/it] {'loss': 0.5277, 'grad_norm': 3.7285656291064844, 'learning_rate': 3.930302657702402e-06, 'epoch': 0.33} 33%|███▎ | 4026/12313 [3:00:29<6:10:47, 2.68s/it] 33%|███▎ | 4027/12313 [3:00:31<6:11:46, 2.69s/it] {'loss': 0.5248, 'grad_norm': 4.479547869133663, 'learning_rate': 3.929763246781909e-06, 'epoch': 0.33} 33%|███▎ | 4027/12313 [3:00:31<6:11:46, 2.69s/it] 33%|███▎ | 4028/12313 [3:00:34<6:08:17, 2.67s/it] {'loss': 0.4248, 'grad_norm': 5.115409985178063, 'learning_rate': 3.929223736929366e-06, 'epoch': 0.33} 33%|███▎ | 4028/12313 [3:00:34<6:08:17, 2.67s/it] 33%|███▎ | 4029/12313 [3:00:37<6:12:59, 2.70s/it] {'loss': 0.5976, 'grad_norm': 6.753042464948224, 'learning_rate': 3.928684128182104e-06, 'epoch': 0.33} 33%|███▎ | 4029/12313 [3:00:37<6:12:59, 2.70s/it] 33%|███▎ | 4030/12313 [3:00:39<6:12:12, 2.70s/it] {'loss': 0.5214, 'grad_norm': 6.503506149392067, 'learning_rate': 3.9281444205774625e-06, 'epoch': 0.33} 33%|███▎ | 4030/12313 [3:00:39<6:12:12, 2.70s/it] 33%|███▎ | 4031/12313 [3:00:42<6:10:56, 2.69s/it] {'loss': 0.6028, 'grad_norm': 3.8958786789276787, 'learning_rate': 3.927604614152784e-06, 'epoch': 0.33} 33%|███▎ | 4031/12313 [3:00:42<6:10:56, 2.69s/it] 33%|███▎ | 4032/12313 [3:00:45<6:11:22, 2.69s/it] {'loss': 0.5836, 'grad_norm': 4.602684285068746, 'learning_rate': 3.927064708945423e-06, 'epoch': 0.33} 33%|███▎ | 4032/12313 [3:00:45<6:11:22, 2.69s/it] 33%|███▎ | 4033/12313 [3:00:47<6:07:14, 2.66s/it] {'loss': 0.4976, 'grad_norm': 5.4356792062574515, 'learning_rate': 3.926524704992736e-06, 'epoch': 0.33} 33%|███▎ | 4033/12313 [3:00:47<6:07:14, 2.66s/it] 33%|███▎ | 4034/12313 
[3:00:50<5:56:31, 2.58s/it] {'loss': 0.6328, 'grad_norm': 5.1282337398644, 'learning_rate': 3.9259846023320895e-06, 'epoch': 0.33} 33%|███▎ | 4034/12313 [3:00:50<5:56:31, 2.58s/it] 33%|███▎ | 4035/12313 [3:00:52<6:00:51, 2.62s/it] {'loss': 0.5181, 'grad_norm': 5.574289088333828, 'learning_rate': 3.925444401000855e-06, 'epoch': 0.33} 33%|███▎ | 4035/12313 [3:00:52<6:00:51, 2.62s/it] 33%|███▎ | 4036/12313 [3:00:55<5:55:13, 2.58s/it] {'loss': 0.8373, 'grad_norm': 4.638010244640014, 'learning_rate': 3.924904101036413e-06, 'epoch': 0.33} 33%|███▎ | 4036/12313 [3:00:55<5:55:13, 2.58s/it] 33%|███▎ | 4037/12313 [3:00:58<6:00:03, 2.61s/it] {'loss': 0.5468, 'grad_norm': 3.1082588154104225, 'learning_rate': 3.924363702476147e-06, 'epoch': 0.33} 33%|███▎ | 4037/12313 [3:00:58<6:00:03, 2.61s/it] 33%|███▎ | 4038/12313 [3:01:00<6:03:20, 2.63s/it] {'loss': 0.555, 'grad_norm': 3.8088490874389427, 'learning_rate': 3.923823205357453e-06, 'epoch': 0.33} 33%|███▎ | 4038/12313 [3:01:00<6:03:20, 2.63s/it] 33%|███▎ | 4039/12313 [3:01:03<6:06:02, 2.65s/it] {'loss': 0.4554, 'grad_norm': 3.64966498327372, 'learning_rate': 3.923282609717727e-06, 'epoch': 0.33} 33%|███▎ | 4039/12313 [3:01:03<6:06:02, 2.65s/it] 33%|███▎ | 4040/12313 [3:01:06<6:08:51, 2.68s/it] {'loss': 0.5311, 'grad_norm': 5.086174759269899, 'learning_rate': 3.922741915594378e-06, 'epoch': 0.33} 33%|███▎ | 4040/12313 [3:01:06<6:08:51, 2.68s/it] 33%|███▎ | 4041/12313 [3:01:08<6:04:57, 2.65s/it] {'loss': 0.6889, 'grad_norm': 3.596353017219979, 'learning_rate': 3.9222011230248175e-06, 'epoch': 0.33} 33%|███▎ | 4041/12313 [3:01:08<6:04:57, 2.65s/it] 33%|███▎ | 4042/12313 [3:01:11<6:08:09, 2.67s/it] {'loss': 0.6185, 'grad_norm': 4.391572461946081, 'learning_rate': 3.9216602320464655e-06, 'epoch': 0.33} 33%|███▎ | 4042/12313 [3:01:11<6:08:09, 2.67s/it] 33%|███▎ | 4043/12313 [3:01:14<6:30:02, 2.83s/it] {'loss': 0.5438, 'grad_norm': 3.615775625258503, 'learning_rate': 3.921119242696751e-06, 'epoch': 0.33} 33%|███▎ | 4043/12313 
[3:01:14<6:30:02, 2.83s/it] 33%|███▎ | 4044/12313 [3:01:17<6:17:11, 2.74s/it] {'loss': 0.6468, 'grad_norm': 5.820794815401509, 'learning_rate': 3.920578155013106e-06, 'epoch': 0.33} 33%|███▎ | 4044/12313 [3:01:17<6:17:11, 2.74s/it] 33%|███▎ | 4045/12313 [3:01:19<6:15:33, 2.73s/it] {'loss': 0.4855, 'grad_norm': 5.467081465670503, 'learning_rate': 3.92003696903297e-06, 'epoch': 0.33} 33%|███▎ | 4045/12313 [3:01:19<6:15:33, 2.73s/it] 33%|███▎ | 4046/12313 [3:01:22<6:14:00, 2.71s/it] {'loss': 0.6468, 'grad_norm': 5.561814863652521, 'learning_rate': 3.919495684793792e-06, 'epoch': 0.33} 33%|███▎ | 4046/12313 [3:01:22<6:14:00, 2.71s/it] 33%|███▎ | 4047/12313 [3:01:25<6:17:12, 2.74s/it] {'loss': 0.5918, 'grad_norm': 3.807058673477291, 'learning_rate': 3.918954302333025e-06, 'epoch': 0.33} 33%|███▎ | 4047/12313 [3:01:25<6:17:12, 2.74s/it] 33%|███▎ | 4048/12313 [3:01:28<6:11:12, 2.69s/it] {'loss': 0.6432, 'grad_norm': 13.31883734675245, 'learning_rate': 3.91841282168813e-06, 'epoch': 0.33} 33%|███▎ | 4048/12313 [3:01:28<6:11:12, 2.69s/it] 33%|███▎ | 4049/12313 [3:01:30<6:12:09, 2.70s/it] {'loss': 0.4209, 'grad_norm': 7.056921038692695, 'learning_rate': 3.917871242896575e-06, 'epoch': 0.33} 33%|███▎ | 4049/12313 [3:01:30<6:12:09, 2.70s/it] 33%|███▎ | 4050/12313 [3:01:33<6:04:11, 2.64s/it] {'loss': 0.5187, 'grad_norm': 3.9195936437606025, 'learning_rate': 3.917329565995833e-06, 'epoch': 0.33} 33%|███▎ | 4050/12313 [3:01:33<6:04:11, 2.64s/it] 33%|███▎ | 4051/12313 [3:01:35<6:00:29, 2.62s/it] {'loss': 0.5355, 'grad_norm': 4.550380167725966, 'learning_rate': 3.916787791023386e-06, 'epoch': 0.33} 33%|███▎ | 4051/12313 [3:01:35<6:00:29, 2.62s/it] 33%|███▎ | 4052/12313 [3:01:38<6:01:27, 2.63s/it] {'loss': 0.5518, 'grad_norm': 3.6800755148359148, 'learning_rate': 3.916245918016724e-06, 'epoch': 0.33} 33%|███▎ | 4052/12313 [3:01:38<6:01:27, 2.63s/it] 33%|███▎ | 4053/12313 [3:01:40<5:51:34, 2.55s/it] {'loss': 0.7117, 'grad_norm': 3.478748983074201, 'learning_rate': 
3.915703947013338e-06, 'epoch': 0.33} 33%|███▎ | 4053/12313 [3:01:40<5:51:34, 2.55s/it] 33%|███▎ | 4054/12313 [3:01:43<6:02:38, 2.63s/it] {'loss': 0.5708, 'grad_norm': 3.7123489856745957, 'learning_rate': 3.9151618780507316e-06, 'epoch': 0.33} 33%|███▎ | 4054/12313 [3:01:43<6:02:38, 2.63s/it] 33%|███▎ | 4055/12313 [3:01:46<6:04:17, 2.65s/it] {'loss': 0.5249, 'grad_norm': 5.643512507779105, 'learning_rate': 3.914619711166413e-06, 'epoch': 0.33} 33%|███▎ | 4055/12313 [3:01:46<6:04:17, 2.65s/it] 33%|███▎ | 4056/12313 [3:01:49<6:14:07, 2.72s/it] {'loss': 0.493, 'grad_norm': 3.920365869559619, 'learning_rate': 3.914077446397897e-06, 'epoch': 0.33} 33%|███▎ | 4056/12313 [3:01:49<6:14:07, 2.72s/it] 33%|███▎ | 4057/12313 [3:01:52<6:15:17, 2.73s/it] {'loss': 0.4857, 'grad_norm': 5.253760200255425, 'learning_rate': 3.913535083782707e-06, 'epoch': 0.33} 33%|███▎ | 4057/12313 [3:01:52<6:15:17, 2.73s/it] 33%|███▎ | 4058/12313 [3:01:54<6:08:42, 2.68s/it] {'loss': 0.5694, 'grad_norm': 5.037318300624211, 'learning_rate': 3.912992623358368e-06, 'epoch': 0.33} 33%|███▎ | 4058/12313 [3:01:54<6:08:42, 2.68s/it] 33%|███▎ | 4059/12313 [3:01:57<6:07:18, 2.67s/it] {'loss': 0.407, 'grad_norm': 16.068629096649826, 'learning_rate': 3.91245006516242e-06, 'epoch': 0.33} 33%|███▎ | 4059/12313 [3:01:57<6:07:18, 2.67s/it] 33%|███▎ | 4060/12313 [3:01:59<6:07:33, 2.67s/it] {'loss': 0.5709, 'grad_norm': 8.438998004731252, 'learning_rate': 3.911907409232402e-06, 'epoch': 0.33} 33%|███▎ | 4060/12313 [3:01:59<6:07:33, 2.67s/it] 33%|███▎ | 4061/12313 [3:02:02<6:17:04, 2.74s/it] {'loss': 0.5698, 'grad_norm': 4.031413575021964, 'learning_rate': 3.911364655605863e-06, 'epoch': 0.33} 33%|███▎ | 4061/12313 [3:02:02<6:17:04, 2.74s/it] 33%|███▎ | 4062/12313 [3:02:05<6:10:28, 2.69s/it] {'loss': 0.5103, 'grad_norm': 6.129803113623354, 'learning_rate': 3.9108218043203595e-06, 'epoch': 0.33} 33%|███▎ | 4062/12313 [3:02:05<6:10:28, 2.69s/it] 33%|███▎ | 4063/12313 [3:02:08<6:13:01, 2.71s/it] {'loss': 0.6426, 
'grad_norm': 5.08608452932562, 'learning_rate': 3.910278855413454e-06, 'epoch': 0.33} 33%|███▎ | 4063/12313 [3:02:08<6:13:01, 2.71s/it] 33%|███▎ | 4064/12313 [3:02:10<6:17:13, 2.74s/it] {'loss': 0.5326, 'grad_norm': 4.487322847861987, 'learning_rate': 3.909735808922716e-06, 'epoch': 0.33} 33%|███▎ | 4064/12313 [3:02:10<6:17:13, 2.74s/it] 33%|███▎ | 4065/12313 [3:02:13<6:18:15, 2.75s/it] {'loss': 0.481, 'grad_norm': 5.951925940505768, 'learning_rate': 3.90919266488572e-06, 'epoch': 0.33} 33%|███▎ | 4065/12313 [3:02:13<6:18:15, 2.75s/it] 33%|███▎ | 4066/12313 [3:02:16<6:13:28, 2.72s/it] {'loss': 0.5089, 'grad_norm': 5.479126876701789, 'learning_rate': 3.908649423340049e-06, 'epoch': 0.33} 33%|███▎ | 4066/12313 [3:02:16<6:13:28, 2.72s/it] 33%|███▎ | 4067/12313 [3:02:18<6:03:34, 2.65s/it] {'loss': 0.3857, 'grad_norm': 4.922261403979778, 'learning_rate': 3.908106084323295e-06, 'epoch': 0.33} 33%|███▎ | 4067/12313 [3:02:18<6:03:34, 2.65s/it] 33%|███▎ | 4068/12313 [3:02:21<5:57:39, 2.60s/it] {'loss': 0.6416, 'grad_norm': 5.574237760697828, 'learning_rate': 3.9075626478730515e-06, 'epoch': 0.33} 33%|███▎ | 4068/12313 [3:02:21<5:57:39, 2.60s/it] 33%|███▎ | 4069/12313 [3:02:24<6:01:10, 2.63s/it] {'loss': 0.5985, 'grad_norm': 3.790844587172487, 'learning_rate': 3.907019114026922e-06, 'epoch': 0.33} 33%|███▎ | 4069/12313 [3:02:24<6:01:10, 2.63s/it] 33%|███▎ | 4070/12313 [3:02:26<6:04:38, 2.65s/it] {'loss': 0.5749, 'grad_norm': 6.112434755772752, 'learning_rate': 3.906475482822517e-06, 'epoch': 0.33} 33%|███▎ | 4070/12313 [3:02:26<6:04:38, 2.65s/it] 33%|███▎ | 4071/12313 [3:02:29<6:03:04, 2.64s/it] {'loss': 0.5349, 'grad_norm': 3.6083314382705836, 'learning_rate': 3.905931754297451e-06, 'epoch': 0.33} 33%|███▎ | 4071/12313 [3:02:29<6:03:04, 2.64s/it] 33%|███▎ | 4072/12313 [3:02:32<6:10:41, 2.70s/it] {'loss': 1.0363, 'grad_norm': 7.284570880294314, 'learning_rate': 3.905387928489349e-06, 'epoch': 0.33} 33%|███▎ | 4072/12313 [3:02:32<6:10:41, 2.70s/it] 33%|███▎ | 4073/12313 
[3:02:35<6:22:22, 2.78s/it] {'loss': 0.5585, 'grad_norm': 4.232160296762864, 'learning_rate': 3.904844005435841e-06, 'epoch': 0.33} 33%|███▎ | 4073/12313 [3:02:35<6:22:22, 2.78s/it] 33%|███▎ | 4074/12313 [3:02:37<6:14:29, 2.73s/it] {'loss': 0.6862, 'grad_norm': 6.814310707158021, 'learning_rate': 3.904299985174562e-06, 'epoch': 0.33} 33%|███▎ | 4074/12313 [3:02:37<6:14:29, 2.73s/it] 33%|███▎ | 4075/12313 [3:02:40<6:12:33, 2.71s/it] {'loss': 0.5983, 'grad_norm': 4.171877579913468, 'learning_rate': 3.903755867743156e-06, 'epoch': 0.33} 33%|███▎ | 4075/12313 [3:02:40<6:12:33, 2.71s/it] 33%|███▎ | 4076/12313 [3:02:43<6:24:02, 2.80s/it] {'loss': 0.5303, 'grad_norm': 7.101609370383942, 'learning_rate': 3.9032116531792745e-06, 'epoch': 0.33} 33%|███▎ | 4076/12313 [3:02:43<6:24:02, 2.80s/it] 33%|███▎ | 4077/12313 [3:02:46<6:19:53, 2.77s/it] {'loss': 0.5949, 'grad_norm': 5.802670937749591, 'learning_rate': 3.902667341520572e-06, 'epoch': 0.33} 33%|███▎ | 4077/12313 [3:02:46<6:19:53, 2.77s/it] 33%|███▎ | 4078/12313 [3:02:49<6:23:15, 2.79s/it] {'loss': 0.5337, 'grad_norm': 4.414654939089757, 'learning_rate': 3.902122932804713e-06, 'epoch': 0.33} 33%|███▎ | 4078/12313 [3:02:49<6:23:15, 2.79s/it] 33%|███▎ | 4079/12313 [3:02:51<6:10:42, 2.70s/it] {'loss': 0.5897, 'grad_norm': 5.085650564698827, 'learning_rate': 3.901578427069368e-06, 'epoch': 0.33} 33%|███▎ | 4079/12313 [3:02:51<6:10:42, 2.70s/it] 33%|███▎ | 4080/12313 [3:02:54<6:06:45, 2.67s/it] {'loss': 0.5486, 'grad_norm': 3.7722391508616036, 'learning_rate': 3.901033824352213e-06, 'epoch': 0.33} 33%|███▎ | 4080/12313 [3:02:54<6:06:45, 2.67s/it] 33%|███▎ | 4081/12313 [3:02:56<6:08:52, 2.69s/it] {'loss': 0.5186, 'grad_norm': 6.237526026700791, 'learning_rate': 3.9004891246909325e-06, 'epoch': 0.33} 33%|███▎ | 4081/12313 [3:02:56<6:08:52, 2.69s/it] 33%|███▎ | 4082/12313 [3:02:59<6:13:40, 2.72s/it] {'loss': 0.727, 'grad_norm': 4.2314785310516365, 'learning_rate': 3.8999443281232175e-06, 'epoch': 0.33} 33%|███▎ | 4082/12313 
[3:02:59<6:13:40, 2.72s/it] 33%|███▎ | 4083/12313 [3:03:02<6:12:14, 2.71s/it] {'loss': 0.4363, 'grad_norm': 4.575453746497402, 'learning_rate': 3.899399434686762e-06, 'epoch': 0.33} 33%|███▎ | 4083/12313 [3:03:02<6:12:14, 2.71s/it] 33%|███▎ | 4084/12313 [3:03:04<6:02:51, 2.65s/it] {'loss': 0.4231, 'grad_norm': 5.329498253898677, 'learning_rate': 3.898854444419274e-06, 'epoch': 0.33} 33%|███▎ | 4084/12313 [3:03:04<6:02:51, 2.65s/it] 33%|███▎ | 4085/12313 [3:03:07<6:05:49, 2.67s/it] {'loss': 0.4597, 'grad_norm': 7.175459151315008, 'learning_rate': 3.8983093573584605e-06, 'epoch': 0.33} 33%|███▎ | 4085/12313 [3:03:07<6:05:49, 2.67s/it] 33%|███▎ | 4086/12313 [3:03:10<6:02:00, 2.64s/it] {'loss': 0.4702, 'grad_norm': 4.3131878789803855, 'learning_rate': 3.89776417354204e-06, 'epoch': 0.33} 33%|███▎ | 4086/12313 [3:03:10<6:02:00, 2.64s/it] 33%|███▎ | 4087/12313 [3:03:12<6:06:01, 2.67s/it] {'loss': 0.5382, 'grad_norm': 4.921638293579504, 'learning_rate': 3.897218893007737e-06, 'epoch': 0.33} 33%|███▎ | 4087/12313 [3:03:12<6:06:01, 2.67s/it] 33%|███▎ | 4088/12313 [3:03:15<6:04:09, 2.66s/it] {'loss': 0.5162, 'grad_norm': 5.520197777335041, 'learning_rate': 3.896673515793281e-06, 'epoch': 0.33} 33%|███▎ | 4088/12313 [3:03:15<6:04:09, 2.66s/it] 33%|███▎ | 4089/12313 [3:03:18<6:07:15, 2.68s/it] {'loss': 0.4928, 'grad_norm': 4.2991891020242265, 'learning_rate': 3.89612804193641e-06, 'epoch': 0.33} 33%|███▎ | 4089/12313 [3:03:18<6:07:15, 2.68s/it] 33%|███▎ | 4090/12313 [3:03:21<6:14:04, 2.73s/it] {'loss': 0.5771, 'grad_norm': 5.160012794460195, 'learning_rate': 3.895582471474866e-06, 'epoch': 0.33} 33%|███▎ | 4090/12313 [3:03:21<6:14:04, 2.73s/it] 33%|███▎ | 4091/12313 [3:03:23<6:04:38, 2.66s/it] {'loss': 0.4006, 'grad_norm': 4.051259036428905, 'learning_rate': 3.895036804446402e-06, 'epoch': 0.33} 33%|███▎ | 4091/12313 [3:03:23<6:04:38, 2.66s/it] 33%|███▎ | 4092/12313 [3:03:26<6:06:37, 2.68s/it] {'loss': 0.7044, 'grad_norm': 8.065546594512844, 'learning_rate': 
3.894491040888774e-06, 'epoch': 0.33} 33%|███▎ | 4092/12313 [3:03:26<6:06:37, 2.68s/it] 33%|███▎ | 4093/12313 [3:03:28<6:04:31, 2.66s/it] {'loss': 0.6109, 'grad_norm': 5.703512084951918, 'learning_rate': 3.893945180839747e-06, 'epoch': 0.33} 33%|███▎ | 4093/12313 [3:03:28<6:04:31, 2.66s/it] 33%|███▎ | 4094/12313 [3:03:31<6:14:02, 2.73s/it] {'loss': 0.5347, 'grad_norm': 6.178693238141215, 'learning_rate': 3.893399224337089e-06, 'epoch': 0.33} 33%|███▎ | 4094/12313 [3:03:31<6:14:02, 2.73s/it] 33%|███▎ | 4095/12313 [3:03:34<6:17:22, 2.76s/it] {'loss': 0.7827, 'grad_norm': 3.9476533225077293, 'learning_rate': 3.892853171418581e-06, 'epoch': 0.33} 33%|███▎ | 4095/12313 [3:03:34<6:17:22, 2.76s/it] 33%|███▎ | 4096/12313 [3:03:37<6:04:58, 2.67s/it] {'loss': 0.5795, 'grad_norm': 6.5472391710424755, 'learning_rate': 3.8923070221220035e-06, 'epoch': 0.33} 33%|███▎ | 4096/12313 [3:03:37<6:04:58, 2.67s/it] 33%|███▎ | 4097/12313 [3:03:39<5:55:26, 2.60s/it] {'loss': 0.4096, 'grad_norm': 6.934804355699152, 'learning_rate': 3.891760776485151e-06, 'epoch': 0.33} 33%|███▎ | 4097/12313 [3:03:39<5:55:26, 2.60s/it] 33%|███▎ | 4098/12313 [3:03:42<6:11:06, 2.71s/it] {'loss': 0.5433, 'grad_norm': 4.704286066411406, 'learning_rate': 3.891214434545817e-06, 'epoch': 0.33} 33%|███▎ | 4098/12313 [3:03:42<6:11:06, 2.71s/it] 33%|███▎ | 4099/12313 [3:03:44<6:03:01, 2.65s/it] {'loss': 0.4911, 'grad_norm': 6.41984014774679, 'learning_rate': 3.890667996341806e-06, 'epoch': 0.33} 33%|███▎ | 4099/12313 [3:03:44<6:03:01, 2.65s/it] 33%|███▎ | 4100/12313 [3:03:47<6:16:04, 2.75s/it] {'loss': 0.6617, 'grad_norm': 5.021861875613623, 'learning_rate': 3.8901214619109315e-06, 'epoch': 0.33} 33%|███▎ | 4100/12313 [3:03:47<6:16:04, 2.75s/it] 33%|███▎ | 4101/12313 [3:03:50<6:17:36, 2.76s/it] {'loss': 0.5158, 'grad_norm': 5.862846079718947, 'learning_rate': 3.889574831291008e-06, 'epoch': 0.33} 33%|███▎ | 4101/12313 [3:03:50<6:17:36, 2.76s/it]
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
 33%|███▎ | 4102/12313 [3:04:28<30:09:46, 13.22s/it] {'loss': 0.6377, 'grad_norm': 3.942541898863967, 'learning_rate': 3.88902810451986e-06, 'epoch': 0.33} 33%|███▎ | 4102/12313 [3:04:28<30:09:46, 13.22s/it] 33%|███▎ | 4103/12313 [3:04:30<22:52:15, 10.03s/it] {'loss': 0.516, 'grad_norm': 4.915257424154396, 'learning_rate': 3.88848128163532e-06, 'epoch': 0.33} 33%|███▎ | 4103/12313 [3:04:30<22:52:15, 10.03s/it] 33%|███▎ | 4104/12313 [3:04:33<17:56:41, 7.87s/it] {'loss': 0.427, 'grad_norm': 4.848991134824205, 'learning_rate': 3.887934362675223e-06, 'epoch': 0.33} 33%|███▎ | 4104/12313 [3:04:33<17:56:41, 7.87s/it] 33%|███▎ | 4105/12313 [3:04:36<14:34:52, 6.40s/it] {'loss': 0.4525, 'grad_norm': 5.880446166851823, 'learning_rate': 3.887387347677413e-06, 'epoch': 0.33} 33%|███▎ | 4105/12313 [3:04:36<14:34:52, 6.40s/it] 33%|███▎ | 4106/12313 [3:04:39<12:02:38, 5.28s/it] {'loss': 0.4618, 'grad_norm': 7.1524682868349405, 'learning_rate': 3.886840236679742e-06, 'epoch': 0.33} 33%|███▎ | 4106/12313 [3:04:39<12:02:38, 5.28s/it] 33%|███▎ | 4107/12313 [3:04:42<10:14:02, 4.49s/it] {'loss': 0.4553, 'grad_norm': 4.096340585575771, 'learning_rate': 3.8862930297200665e-06, 'epoch': 0.33} 33%|███▎ | 4107/12313 [3:04:42<10:14:02, 4.49s/it] 33%|███▎ | 4108/12313 [3:04:44<8:55:09, 3.91s/it] {'loss': 0.5802, 'grad_norm': 3.963306450481558, 'learning_rate': 3.885745726836249e-06, 'epoch': 0.33} 33%|███▎ | 4108/12313 [3:04:44<8:55:09, 3.91s/it] 33%|███▎ | 4109/12313 [3:04:47<7:58:28, 3.50s/it] {'loss': 0.4519, 'grad_norm': 7.460142742604042, 'learning_rate': 3.885198328066163e-06, 'epoch': 0.33} 33%|███▎ | 4109/12313 
[3:04:47<7:58:28, 3.50s/it] 33%|███▎ | 4110/12313 [3:04:49<7:23:07, 3.24s/it] {'loss': 0.5484, 'grad_norm': 4.260196594904515, 'learning_rate': 3.8846508334476824e-06, 'epoch': 0.33} 33%|███▎ | 4110/12313 [3:04:49<7:23:07, 3.24s/it] 33%|███▎ | 4111/12313 [3:04:52<7:14:19, 3.18s/it] {'loss': 0.5141, 'grad_norm': 3.7783084323283074, 'learning_rate': 3.884103243018693e-06, 'epoch': 0.33} 33%|███▎ | 4111/12313 [3:04:52<7:14:19, 3.18s/it] 33%|███▎ | 4112/12313 [3:04:55<7:03:18, 3.10s/it] {'loss': 0.6465, 'grad_norm': 5.894477787381174, 'learning_rate': 3.883555556817083e-06, 'epoch': 0.33} 33%|███▎ | 4112/12313 [3:04:55<7:03:18, 3.10s/it] 33%|███▎ | 4113/12313 [3:04:58<6:37:59, 2.91s/it] {'loss': 0.4949, 'grad_norm': 3.0063514235175917, 'learning_rate': 3.883007774880753e-06, 'epoch': 0.33} 33%|███▎ | 4113/12313 [3:04:58<6:37:59, 2.91s/it] 33%|███▎ | 4114/12313 [3:05:00<6:25:17, 2.82s/it] {'loss': 0.3705, 'grad_norm': 5.033096732224493, 'learning_rate': 3.882459897247603e-06, 'epoch': 0.33} 33%|███▎ | 4114/12313 [3:05:00<6:25:17, 2.82s/it] 33%|███▎ | 4115/12313 [3:05:03<6:09:06, 2.70s/it] {'loss': 0.4903, 'grad_norm': 5.991095754571292, 'learning_rate': 3.881911923955545e-06, 'epoch': 0.33} 33%|███▎ | 4115/12313 [3:05:03<6:09:06, 2.70s/it] 33%|███▎ | 4116/12313 [3:05:06<6:11:53, 2.72s/it] {'loss': 0.5439, 'grad_norm': 6.237218731747614, 'learning_rate': 3.881363855042496e-06, 'epoch': 0.33} 33%|███▎ | 4116/12313 [3:05:06<6:11:53, 2.72s/it] 33%|███▎ | 4117/12313 [3:05:08<6:10:23, 2.71s/it] {'loss': 0.4756, 'grad_norm': 4.522440392622378, 'learning_rate': 3.880815690546378e-06, 'epoch': 0.33} 33%|███▎ | 4117/12313 [3:05:08<6:10:23, 2.71s/it] 33%|███▎ | 4118/12313 [3:05:11<6:08:27, 2.70s/it] {'loss': 0.5839, 'grad_norm': 12.342772678402822, 'learning_rate': 3.880267430505123e-06, 'epoch': 0.33} 33%|███▎ | 4118/12313 [3:05:11<6:08:27, 2.70s/it] 33%|███▎ | 4119/12313 [3:05:14<6:27:41, 2.84s/it] {'loss': 0.5127, 'grad_norm': 4.148599127985545, 'learning_rate': 
3.879719074956667e-06, 'epoch': 0.33} 33%|███▎ | 4119/12313 [3:05:14<6:27:41, 2.84s/it] 33%|███▎ | 4120/12313 [3:05:18<6:57:41, 3.06s/it] {'loss': 0.6514, 'grad_norm': 4.58831638518393, 'learning_rate': 3.879170623938951e-06, 'epoch': 0.33} 33%|███▎ | 4120/12313 [3:05:18<6:57:41, 3.06s/it] 33%|███▎ | 4121/12313 [3:05:20<6:42:10, 2.95s/it] {'loss': 0.5412, 'grad_norm': 4.795703097401847, 'learning_rate': 3.878622077489929e-06, 'epoch': 0.33} 33%|███▎ | 4121/12313 [3:05:20<6:42:10, 2.95s/it] 33%|███▎ | 4122/12313 [3:05:23<6:35:41, 2.90s/it] {'loss': 0.5741, 'grad_norm': 2.8973056633483325, 'learning_rate': 3.8780734356475555e-06, 'epoch': 0.33} 33%|███▎ | 4122/12313 [3:05:23<6:35:41, 2.90s/it] 33%|███▎ | 4123/12313 [3:05:26<6:50:36, 3.01s/it] {'loss': 0.5285, 'grad_norm': 3.5866605917438474, 'learning_rate': 3.8775246984497924e-06, 'epoch': 0.33} 33%|███▎ | 4123/12313 [3:05:26<6:50:36, 3.01s/it] 33%|███▎ | 4124/12313 [3:05:29<6:39:10, 2.92s/it] {'loss': 0.6005, 'grad_norm': 4.6085635915185, 'learning_rate': 3.876975865934612e-06, 'epoch': 0.33} 33%|███▎ | 4124/12313 [3:05:29<6:39:10, 2.92s/it] 34%|███▎ | 4125/12313 [3:05:32<6:29:03, 2.85s/it] {'loss': 0.3847, 'grad_norm': 8.852381490643886, 'learning_rate': 3.876426938139988e-06, 'epoch': 0.34} 34%|███▎ | 4125/12313 [3:05:32<6:29:03, 2.85s/it] 34%|███▎ | 4126/12313 [3:05:34<6:22:07, 2.80s/it] {'loss': 0.5997, 'grad_norm': 4.677167179306538, 'learning_rate': 3.875877915103905e-06, 'epoch': 0.34} 34%|███▎ | 4126/12313 [3:05:34<6:22:07, 2.80s/it] 34%|███▎ | 4127/12313 [3:05:37<6:16:52, 2.76s/it] {'loss': 0.4351, 'grad_norm': 5.192252334603382, 'learning_rate': 3.875328796864353e-06, 'epoch': 0.34} 34%|███▎ | 4127/12313 [3:05:37<6:16:52, 2.76s/it] 34%|███▎ | 4128/12313 [3:05:40<6:14:39, 2.75s/it] {'loss': 0.5296, 'grad_norm': 5.52994136456474, 'learning_rate': 3.8747795834593255e-06, 'epoch': 0.34} 34%|███▎ | 4128/12313 [3:05:40<6:14:39, 2.75s/it] 34%|███▎ | 4129/12313 [3:05:42<5:59:00, 2.63s/it] {'loss': 0.5424, 
'grad_norm': 9.716260977214207, 'learning_rate': 3.8742302749268264e-06, 'epoch': 0.34} 34%|███▎ | 4129/12313 [3:05:42<5:59:00, 2.63s/it] 34%|███▎ | 4130/12313 [3:05:45<6:01:36, 2.65s/it] {'loss': 0.6156, 'grad_norm': 5.297918507808388, 'learning_rate': 3.873680871304867e-06, 'epoch': 0.34} 34%|███▎ | 4130/12313 [3:05:45<6:01:36, 2.65s/it] 34%|███▎ | 4131/12313 [3:05:48<6:06:25, 2.69s/it] {'loss': 0.5344, 'grad_norm': 6.873079350204181, 'learning_rate': 3.8731313726314615e-06, 'epoch': 0.34} 34%|███▎ | 4131/12313 [3:05:48<6:06:25, 2.69s/it] 34%|███▎ | 4132/12313 [3:05:50<6:10:57, 2.72s/it] {'loss': 0.4929, 'grad_norm': 7.36681510636934, 'learning_rate': 3.87258177894463e-06, 'epoch': 0.34} 34%|███▎ | 4132/12313 [3:05:50<6:10:57, 2.72s/it] 34%|███▎ | 4133/12313 [3:05:53<6:13:26, 2.74s/it] {'loss': 0.5442, 'grad_norm': 5.602356345828153, 'learning_rate': 3.872032090282406e-06, 'epoch': 0.34} 34%|███▎ | 4133/12313 [3:05:53<6:13:26, 2.74s/it] 34%|███▎ | 4134/12313 [3:05:56<6:05:51, 2.68s/it] {'loss': 0.492, 'grad_norm': 7.3672095285990435, 'learning_rate': 3.871482306682821e-06, 'epoch': 0.34} 34%|███▎ | 4134/12313 [3:05:56<6:05:51, 2.68s/it] 34%|███▎ | 4135/12313 [3:05:59<6:07:10, 2.69s/it] {'loss': 0.5198, 'grad_norm': 3.5366167997788427, 'learning_rate': 3.8709324281839205e-06, 'epoch': 0.34} 34%|███▎ | 4135/12313 [3:05:59<6:07:10, 2.69s/it] 34%|███▎ | 4136/12313 [3:06:01<6:12:56, 2.74s/it] {'loss': 0.5951, 'grad_norm': 6.858040921971297, 'learning_rate': 3.87038245482375e-06, 'epoch': 0.34} 34%|███▎ | 4136/12313 [3:06:01<6:12:56, 2.74s/it] 34%|███▎ | 4137/12313 [3:06:04<6:13:34, 2.74s/it] {'loss': 0.5853, 'grad_norm': 6.617980225631782, 'learning_rate': 3.869832386640367e-06, 'epoch': 0.34} 34%|███▎ | 4137/12313 [3:06:04<6:13:34, 2.74s/it] 34%|███▎ | 4138/12313 [3:06:07<6:17:42, 2.77s/it] {'loss': 0.5662, 'grad_norm': 4.219229609437151, 'learning_rate': 3.8692822236718334e-06, 'epoch': 0.34} 34%|███▎ | 4138/12313 [3:06:07<6:17:42, 2.77s/it] 34%|███▎ | 4139/12313 
[3:06:10<6:13:23, 2.74s/it] {'loss': 0.4658, 'grad_norm': 7.439696385730913, 'learning_rate': 3.868731965956215e-06, 'epoch': 0.34} 34%|███▎ | 4139/12313 [3:06:10<6:13:23, 2.74s/it] 34%|███▎ | 4140/12313 [3:06:12<6:14:50, 2.75s/it] {'loss': 0.4537, 'grad_norm': 5.487826086727248, 'learning_rate': 3.86818161353159e-06, 'epoch': 0.34} 34%|███▎ | 4140/12313 [3:06:12<6:14:50, 2.75s/it] 34%|███▎ | 4141/12313 [3:06:15<6:11:40, 2.73s/it] {'loss': 0.5663, 'grad_norm': 4.200360102230179, 'learning_rate': 3.867631166436038e-06, 'epoch': 0.34} 34%|███▎ | 4141/12313 [3:06:15<6:11:40, 2.73s/it] 34%|███▎ | 4142/12313 [3:06:18<6:17:48, 2.77s/it] {'loss': 0.6134, 'grad_norm': 6.128604530592701, 'learning_rate': 3.867080624707647e-06, 'epoch': 0.34} 34%|███▎ | 4142/12313 [3:06:18<6:17:48, 2.77s/it] 34%|███▎ | 4143/12313 [3:06:21<6:10:11, 2.72s/it] {'loss': 0.5773, 'grad_norm': 8.81104997626726, 'learning_rate': 3.866529988384512e-06, 'epoch': 0.34} 34%|███▎ | 4143/12313 [3:06:21<6:10:11, 2.72s/it] 34%|███▎ | 4144/12313 [3:06:23<6:05:39, 2.69s/it] {'loss': 0.6132, 'grad_norm': 5.16122326363201, 'learning_rate': 3.865979257504734e-06, 'epoch': 0.34} 34%|███▎ | 4144/12313 [3:06:23<6:05:39, 2.69s/it] 34%|███▎ | 4145/12313 [3:06:26<6:00:04, 2.65s/it] {'loss': 0.6016, 'grad_norm': 5.870897322328888, 'learning_rate': 3.8654284321064205e-06, 'epoch': 0.34} 34%|███▎ | 4145/12313 [3:06:26<6:00:04, 2.65s/it] 34%|███▎ | 4146/12313 [3:06:29<6:12:21, 2.74s/it] {'loss': 0.5678, 'grad_norm': 5.314926965529579, 'learning_rate': 3.864877512227686e-06, 'epoch': 0.34} 34%|███▎ | 4146/12313 [3:06:29<6:12:21, 2.74s/it] 34%|███▎ | 4147/12313 [3:06:32<6:25:41, 2.83s/it] {'loss': 0.5558, 'grad_norm': 4.818645386450623, 'learning_rate': 3.864326497906652e-06, 'epoch': 0.34} 34%|███▎ | 4147/12313 [3:06:32<6:25:41, 2.83s/it] 34%|███▎ | 4148/12313 [3:06:34<6:14:17, 2.75s/it] {'loss': 0.5669, 'grad_norm': 14.41001497081986, 'learning_rate': 3.8637753891814435e-06, 'epoch': 0.34} 34%|███▎ | 4148/12313 
[3:06:34<6:14:17, 2.75s/it] 34%|███▎ | 4149/12313 [3:06:37<6:17:09, 2.77s/it] {'loss': 0.5596, 'grad_norm': 10.608448522114792, 'learning_rate': 3.863224186090197e-06, 'epoch': 0.34} 34%|███▎ | 4149/12313 [3:06:37<6:17:09, 2.77s/it] 34%|███▎ | 4150/12313 [3:06:40<6:24:50, 2.83s/it] {'loss': 0.5213, 'grad_norm': 6.7059941990937775, 'learning_rate': 3.862672888671051e-06, 'epoch': 0.34} 34%|███▎ | 4150/12313 [3:06:40<6:24:50, 2.83s/it] 34%|███▎ | 4151/12313 [3:06:43<6:37:08, 2.92s/it] {'loss': 0.7185, 'grad_norm': 3.7804743609104237, 'learning_rate': 3.862121496962153e-06, 'epoch': 0.34} 34%|███▎ | 4151/12313 [3:06:43<6:37:08, 2.92s/it] 34%|███▎ | 4152/12313 [3:06:46<6:31:53, 2.88s/it] {'loss': 0.5231, 'grad_norm': 5.24213546954736, 'learning_rate': 3.861570011001658e-06, 'epoch': 0.34} 34%|███▎ | 4152/12313 [3:06:46<6:31:53, 2.88s/it] 34%|███▎ | 4153/12313 [3:06:49<6:24:31, 2.83s/it] {'loss': 0.4785, 'grad_norm': 7.547577006376037, 'learning_rate': 3.8610184308277216e-06, 'epoch': 0.34} 34%|███▎ | 4153/12313 [3:06:49<6:24:31, 2.83s/it] 34%|███▎ | 4154/12313 [3:06:52<6:38:59, 2.93s/it] {'loss': 0.6946, 'grad_norm': 2.5300512242174387, 'learning_rate': 3.860466756478514e-06, 'epoch': 0.34} 34%|███▎ | 4154/12313 [3:06:52<6:38:59, 2.93s/it] 34%|███▎ | 4155/12313 [3:06:54<6:23:37, 2.82s/it] {'loss': 0.7683, 'grad_norm': 4.653767821488039, 'learning_rate': 3.859914987992207e-06, 'epoch': 0.34} 34%|███▎ | 4155/12313 [3:06:54<6:23:37, 2.82s/it] 34%|███▍ | 4156/12313 [3:06:57<6:15:10, 2.76s/it] {'loss': 0.5301, 'grad_norm': 4.930191892040406, 'learning_rate': 3.85936312540698e-06, 'epoch': 0.34} 34%|███▍ | 4156/12313 [3:06:57<6:15:10, 2.76s/it] 34%|███▍ | 4157/12313 [3:07:00<6:15:08, 2.76s/it] {'loss': 0.4413, 'grad_norm': 5.429582397608964, 'learning_rate': 3.858811168761019e-06, 'epoch': 0.34} 34%|███▍ | 4157/12313 [3:07:00<6:15:08, 2.76s/it] 34%|███▍ | 4158/12313 [3:07:02<6:09:32, 2.72s/it] {'loss': 0.4454, 'grad_norm': 4.863739044933661, 'learning_rate': 
3.8582591180925164e-06, 'epoch': 0.34} 34%|███▍ | 4158/12313 [3:07:02<6:09:32, 2.72s/it] 34%|███▍ | 4159/12313 [3:07:05<6:01:45, 2.66s/it] {'loss': 0.507, 'grad_norm': 5.522590385270159, 'learning_rate': 3.857706973439672e-06, 'epoch': 0.34} 34%|███▍ | 4159/12313 [3:07:05<6:01:45, 2.66s/it] 34%|███▍ | 4160/12313 [3:07:08<6:14:18, 2.75s/it] {'loss': 0.5373, 'grad_norm': 5.8411296020442, 'learning_rate': 3.85715473484069e-06, 'epoch': 0.34} 34%|███▍ | 4160/12313 [3:07:08<6:14:18, 2.75s/it] 34%|███▍ | 4161/12313 [3:07:11<6:09:44, 2.72s/it] {'loss': 0.5099, 'grad_norm': 6.215435578421796, 'learning_rate': 3.856602402333783e-06, 'epoch': 0.34} 34%|███▍ | 4161/12313 [3:07:11<6:09:44, 2.72s/it] 34%|███▍ | 4162/12313 [3:07:13<6:14:40, 2.76s/it] {'loss': 0.5364, 'grad_norm': 3.6082614721689215, 'learning_rate': 3.85604997595717e-06, 'epoch': 0.34} 34%|███▍ | 4162/12313 [3:07:13<6:14:40, 2.76s/it] 34%|███▍ | 4163/12313 [3:07:16<6:13:05, 2.75s/it] {'loss': 0.5525, 'grad_norm': 6.122674136049005, 'learning_rate': 3.855497455749076e-06, 'epoch': 0.34} 34%|███▍ | 4163/12313 [3:07:16<6:13:05, 2.75s/it] 34%|███▍ | 4164/12313 [3:07:19<5:59:13, 2.64s/it] {'loss': 0.4537, 'grad_norm': 7.470369967210209, 'learning_rate': 3.854944841747731e-06, 'epoch': 0.34} 34%|███▍ | 4164/12313 [3:07:19<5:59:13, 2.64s/it] 34%|███▍ | 4165/12313 [3:07:21<6:02:55, 2.67s/it] {'loss': 0.4119, 'grad_norm': 3.13970191041356, 'learning_rate': 3.854392133991373e-06, 'epoch': 0.34} 34%|███▍ | 4165/12313 [3:07:21<6:02:55, 2.67s/it] 34%|███▍ | 4166/12313 [3:07:24<6:00:56, 2.66s/it] {'loss': 0.6385, 'grad_norm': 5.817426875860977, 'learning_rate': 3.853839332518249e-06, 'epoch': 0.34} 34%|███▍ | 4166/12313 [3:07:24<6:00:56, 2.66s/it] 34%|███▍ | 4167/12313 [3:07:27<6:01:45, 2.66s/it] {'loss': 0.5794, 'grad_norm': 6.211091908912435, 'learning_rate': 3.8532864373666076e-06, 'epoch': 0.34} 34%|███▍ | 4167/12313 [3:07:27<6:01:45, 2.66s/it] 34%|███▍ | 4168/12313 [3:07:29<5:51:37, 2.59s/it] {'loss': 0.53, 'grad_norm': 
5.831264510593337, 'learning_rate': 3.852733448574707e-06, 'epoch': 0.34} 34%|███▍ | 4168/12313 [3:07:29<5:51:37, 2.59s/it] 34%|███▍ | 4169/12313 [3:07:32<5:56:09, 2.62s/it] {'loss': 0.4364, 'grad_norm': 6.324531527655222, 'learning_rate': 3.8521803661808105e-06, 'epoch': 0.34} 34%|███▍ | 4169/12313 [3:07:32<5:56:09, 2.62s/it] 34%|███▍ | 4170/12313 [3:07:34<5:52:40, 2.60s/it] {'loss': 0.6626, 'grad_norm': 4.057887664623083, 'learning_rate': 3.851627190223189e-06, 'epoch': 0.34} 34%|███▍ | 4170/12313 [3:07:34<5:52:40, 2.60s/it] 34%|███▍ | 4171/12313 [3:07:37<5:42:39, 2.53s/it] {'loss': 0.5495, 'grad_norm': 6.855870724748997, 'learning_rate': 3.85107392074012e-06, 'epoch': 0.34} 34%|███▍ | 4171/12313 [3:07:37<5:42:39, 2.53s/it] 34%|███▍ | 4172/12313 [3:07:39<5:50:34, 2.58s/it] {'loss': 0.5127, 'grad_norm': 4.761555193030634, 'learning_rate': 3.850520557769886e-06, 'epoch': 0.34} 34%|███▍ | 4172/12313 [3:07:39<5:50:34, 2.58s/it] 34%|███▍ | 4173/12313 [3:07:42<5:49:50, 2.58s/it] {'loss': 0.4324, 'grad_norm': 4.705148709464969, 'learning_rate': 3.849967101350777e-06, 'epoch': 0.34} 34%|███▍ | 4173/12313 [3:07:42<5:49:50, 2.58s/it] 34%|███▍ | 4174/12313 [3:07:44<5:48:50, 2.57s/it] {'loss': 0.4936, 'grad_norm': 6.188404278264953, 'learning_rate': 3.849413551521089e-06, 'epoch': 0.34} 34%|███▍ | 4174/12313 [3:07:44<5:48:50, 2.57s/it] 34%|███▍ | 4175/12313 [3:07:47<5:53:24, 2.61s/it] {'loss': 0.4387, 'grad_norm': 4.011266087707078, 'learning_rate': 3.848859908319124e-06, 'epoch': 0.34} 34%|███▍ | 4175/12313 [3:07:47<5:53:24, 2.61s/it] 34%|███▍ | 4176/12313 [3:07:50<6:01:55, 2.67s/it] {'loss': 0.6163, 'grad_norm': 3.339221489587599, 'learning_rate': 3.8483061717831935e-06, 'epoch': 0.34} 34%|███▍ | 4176/12313 [3:07:50<6:01:55, 2.67s/it] 34%|███▍ | 4177/12313 [3:07:52<5:55:27, 2.62s/it] {'loss': 0.5683, 'grad_norm': 5.751859436326058, 'learning_rate': 3.8477523419516115e-06, 'epoch': 0.34} 34%|███▍ | 4177/12313 [3:07:52<5:55:27, 2.62s/it] 34%|███▍ | 4178/12313 
[3:07:55<5:53:11, 2.60s/it] {'loss': 0.5431, 'grad_norm': 5.088627831398807, 'learning_rate': 3.8471984188627e-06, 'epoch': 0.34} 34%|███▍ | 4178/12313 [3:07:55<5:53:11, 2.60s/it] 34%|███▍ | 4179/12313 [3:07:58<6:01:09, 2.66s/it] {'loss': 0.7555, 'grad_norm': 5.37045736248948, 'learning_rate': 3.846644402554788e-06, 'epoch': 0.34} 34%|███▍ | 4179/12313 [3:07:58<6:01:09, 2.66s/it] 34%|███▍ | 4180/12313 [3:08:00<5:56:09, 2.63s/it] {'loss': 0.4728, 'grad_norm': 5.257725496690353, 'learning_rate': 3.84609029306621e-06, 'epoch': 0.34} 34%|███▍ | 4180/12313 [3:08:00<5:56:09, 2.63s/it] 34%|███▍ | 4181/12313 [3:08:03<5:57:39, 2.64s/it] {'loss': 0.5399, 'grad_norm': 4.575264421254666, 'learning_rate': 3.845536090435308e-06, 'epoch': 0.34} 34%|███▍ | 4181/12313 [3:08:03<5:57:39, 2.64s/it] 34%|███▍ | 4182/12313 [3:08:06<5:57:21, 2.64s/it] {'loss': 0.6235, 'grad_norm': 8.153018797116616, 'learning_rate': 3.84498179470043e-06, 'epoch': 0.34} 34%|███▍ | 4182/12313 [3:08:06<5:57:21, 2.64s/it] 34%|███▍ | 4183/12313 [3:08:08<6:00:21, 2.66s/it] {'loss': 0.4972, 'grad_norm': 6.682613939663991, 'learning_rate': 3.8444274058999295e-06, 'epoch': 0.34} 34%|███▍ | 4183/12313 [3:08:08<6:00:21, 2.66s/it] 34%|███▍ | 4184/12313 [3:08:11<6:02:28, 2.68s/it] {'loss': 0.4771, 'grad_norm': 4.184180026129862, 'learning_rate': 3.843872924072168e-06, 'epoch': 0.34} 34%|███▍ | 4184/12313 [3:08:11<6:02:28, 2.68s/it] 34%|███▍ | 4185/12313 [3:08:14<5:54:21, 2.62s/it] {'loss': 0.4656, 'grad_norm': 5.132706408054481, 'learning_rate': 3.843318349255512e-06, 'epoch': 0.34} 34%|███▍ | 4185/12313 [3:08:14<5:54:21, 2.62s/it] 34%|███▍ | 4186/12313 [3:08:16<6:07:09, 2.71s/it] {'loss': 0.5627, 'grad_norm': 4.503889789040779, 'learning_rate': 3.842763681488337e-06, 'epoch': 0.34} 34%|███▍ | 4186/12313 [3:08:16<6:07:09, 2.71s/it] 34%|███▍ | 4187/12313 [3:08:19<5:58:43, 2.65s/it] {'loss': 0.4399, 'grad_norm': 8.418497213540109, 'learning_rate': 3.84220892080902e-06, 'epoch': 0.34} 34%|███▍ | 4187/12313 
[3:08:19<5:58:43, 2.65s/it] 34%|███▍ | 4188/12313 [3:08:22<5:55:14, 2.62s/it] {'loss': 0.4365, 'grad_norm': 5.704270836674129, 'learning_rate': 3.841654067255951e-06, 'epoch': 0.34} 34%|███▍ | 4188/12313 [3:08:22<5:55:14, 2.62s/it] 34%|███▍ | 4189/12313 [3:08:25<6:10:26, 2.74s/it] {'loss': 0.6448, 'grad_norm': 3.7971463715319813, 'learning_rate': 3.84109912086752e-06, 'epoch': 0.34} 34%|███▍ | 4189/12313 [3:08:25<6:10:26, 2.74s/it] 34%|███▍ | 4190/12313 [3:08:27<6:01:56, 2.67s/it] {'loss': 0.7299, 'grad_norm': 5.144040812396076, 'learning_rate': 3.840544081682128e-06, 'epoch': 0.34} 34%|███▍ | 4190/12313 [3:08:27<6:01:56, 2.67s/it] 34%|███▍ | 4191/12313 [3:08:29<5:51:40, 2.60s/it] {'loss': 0.4847, 'grad_norm': 4.053682645967981, 'learning_rate': 3.839988949738179e-06, 'epoch': 0.34} 34%|███▍ | 4191/12313 [3:08:29<5:51:40, 2.60s/it] 34%|███▍ | 4192/12313 [3:08:32<5:44:04, 2.54s/it] {'loss': 0.4542, 'grad_norm': 4.2685472291584245, 'learning_rate': 3.8394337250740886e-06, 'epoch': 0.34} 34%|███▍ | 4192/12313 [3:08:32<5:44:04, 2.54s/it] 34%|███▍ | 4193/12313 [3:08:35<5:51:08, 2.59s/it] {'loss': 0.5573, 'grad_norm': 5.165758993343546, 'learning_rate': 3.838878407728272e-06, 'epoch': 0.34} 34%|███▍ | 4193/12313 [3:08:35<5:51:08, 2.59s/it] 34%|███▍ | 4194/12313 [3:08:37<5:50:08, 2.59s/it] {'loss': 0.6386, 'grad_norm': 4.470607814383757, 'learning_rate': 3.838322997739155e-06, 'epoch': 0.34} 34%|███▍ | 4194/12313 [3:08:37<5:50:08, 2.59s/it] 34%|███▍ | 4195/12313 [3:08:40<5:47:07, 2.57s/it] {'loss': 0.6893, 'grad_norm': 6.678206627035974, 'learning_rate': 3.837767495145171e-06, 'epoch': 0.34} 34%|███▍ | 4195/12313 [3:08:40<5:47:07, 2.57s/it] 34%|███▍ | 4196/12313 [3:08:42<5:47:38, 2.57s/it] {'loss': 0.6608, 'grad_norm': 3.54047842198611, 'learning_rate': 3.837211899984756e-06, 'epoch': 0.34} 34%|███▍ | 4196/12313 [3:08:42<5:47:38, 2.57s/it] 34%|███▍ | 4197/12313 [3:08:45<5:38:20, 2.50s/it] {'loss': 0.752, 'grad_norm': 2.4752964852325023, 'learning_rate': 
3.836656212296353e-06, 'epoch': 0.34} 34%|███▍ | 4197/12313 [3:08:45<5:38:20, 2.50s/it] 34%|███▍ | 4198/12313 [3:08:47<5:40:42, 2.52s/it] {'loss': 0.5224, 'grad_norm': 5.131230574367217, 'learning_rate': 3.836100432118416e-06, 'epoch': 0.34} 34%|███▍ | 4198/12313 [3:08:47<5:40:42, 2.52s/it] 34%|███▍ | 4199/12313 [3:08:50<5:47:54, 2.57s/it] {'loss': 0.5236, 'grad_norm': 7.654106670047192, 'learning_rate': 3.8355445594894e-06, 'epoch': 0.34} 34%|███▍ | 4199/12313 [3:08:50<5:47:54, 2.57s/it] 34%|███▍ | 4200/12313 [3:08:52<5:48:44, 2.58s/it] {'loss': 0.3241, 'grad_norm': 5.475141337183228, 'learning_rate': 3.834988594447768e-06, 'epoch': 0.34} 34%|███▍ | 4200/12313 [3:08:52<5:48:44, 2.58s/it] 34%|███▍ | 4201/12313 [3:08:55<5:56:10, 2.63s/it] {'loss': 0.5689, 'grad_norm': 4.9511971467274964, 'learning_rate': 3.8344325370319914e-06, 'epoch': 0.34} 34%|███▍ | 4201/12313 [3:08:55<5:56:10, 2.63s/it] 34%|███▍ | 4202/12313 [3:08:58<5:53:04, 2.61s/it] {'loss': 0.6184, 'grad_norm': 4.8632486482468, 'learning_rate': 3.833876387280546e-06, 'epoch': 0.34} 34%|███▍ | 4202/12313 [3:08:58<5:53:04, 2.61s/it] 34%|███▍ | 4203/12313 [3:09:00<5:53:16, 2.61s/it] {'loss': 0.6777, 'grad_norm': 8.274806032942742, 'learning_rate': 3.833320145231913e-06, 'epoch': 0.34} 34%|███▍ | 4203/12313 [3:09:00<5:53:16, 2.61s/it] 34%|███▍ | 4204/12313 [3:09:03<5:55:33, 2.63s/it] {'loss': 0.5455, 'grad_norm': 3.411012348367635, 'learning_rate': 3.832763810924583e-06, 'epoch': 0.34} 34%|███▍ | 4204/12313 [3:09:03<5:55:33, 2.63s/it] 34%|███▍ | 4205/12313 [3:09:06<5:54:49, 2.63s/it] {'loss': 0.5742, 'grad_norm': 4.294223710510589, 'learning_rate': 3.832207384397051e-06, 'epoch': 0.34} 34%|███▍ | 4205/12313 [3:09:06<5:54:49, 2.63s/it] 34%|███▍ | 4206/12313 [3:09:08<5:58:23, 2.65s/it] {'loss': 0.8163, 'grad_norm': 7.056992590176889, 'learning_rate': 3.831650865687818e-06, 'epoch': 0.34} 34%|███▍ | 4206/12313 [3:09:08<5:58:23, 2.65s/it] 34%|███▍ | 4207/12313 [3:09:11<5:55:03, 2.63s/it] {'loss': 0.4881, 
'grad_norm': 4.513215530320878, 'learning_rate': 3.831094254835393e-06, 'epoch': 0.34} 34%|███▍ | 4207/12313 [3:09:11<5:55:03, 2.63s/it] 34%|███▍ | 4208/12313 [3:09:14<5:54:05, 2.62s/it] {'loss': 0.5084, 'grad_norm': 7.046767735752562, 'learning_rate': 3.8305375518782905e-06, 'epoch': 0.34} 34%|███▍ | 4208/12313 [3:09:14<5:54:05, 2.62s/it] 34%|███▍ | 4209/12313 [3:09:16<6:00:21, 2.67s/it] {'loss': 0.4564, 'grad_norm': 4.65613704507043, 'learning_rate': 3.829980756855032e-06, 'epoch': 0.34} 34%|███▍ | 4209/12313 [3:09:16<6:00:21, 2.67s/it] 34%|███▍ | 4210/12313 [3:09:19<6:00:51, 2.67s/it] {'loss': 0.5426, 'grad_norm': 5.501379878695886, 'learning_rate': 3.829423869804143e-06, 'epoch': 0.34} 34%|███▍ | 4210/12313 [3:09:19<6:00:51, 2.67s/it] 34%|███▍ | 4211/12313 [3:09:22<5:58:13, 2.65s/it] {'loss': 0.4953, 'grad_norm': 7.787760682762794, 'learning_rate': 3.828866890764157e-06, 'epoch': 0.34} 34%|███▍ | 4211/12313 [3:09:22<5:58:13, 2.65s/it] 34%|███▍ | 4212/12313 [3:09:25<6:06:10, 2.71s/it] {'loss': 0.7606, 'grad_norm': 4.761244297351993, 'learning_rate': 3.828309819773617e-06, 'epoch': 0.34} 34%|███▍ | 4212/12313 [3:09:25<6:06:10, 2.71s/it] 34%|███▍ | 4213/12313 [3:09:27<6:02:14, 2.68s/it] {'loss': 0.4022, 'grad_norm': 6.816779994137413, 'learning_rate': 3.827752656871067e-06, 'epoch': 0.34} 34%|███▍ | 4213/12313 [3:09:27<6:02:14, 2.68s/it] 34%|███▍ | 4214/12313 [3:09:30<6:07:53, 2.73s/it] {'loss': 0.5904, 'grad_norm': 5.030056000377152, 'learning_rate': 3.827195402095059e-06, 'epoch': 0.34} 34%|███▍ | 4214/12313 [3:09:30<6:07:53, 2.73s/it] 34%|███▍ | 4215/12313 [3:09:33<6:03:00, 2.69s/it] {'loss': 0.423, 'grad_norm': 5.852140493007803, 'learning_rate': 3.826638055484154e-06, 'epoch': 0.34} 34%|███▍ | 4215/12313 [3:09:33<6:03:00, 2.69s/it] 34%|███▍ | 4216/12313 [3:09:35<6:02:50, 2.69s/it] {'loss': 0.6244, 'grad_norm': 4.727427200553123, 'learning_rate': 3.826080617076917e-06, 'epoch': 0.34} 34%|███▍ | 4216/12313 [3:09:35<6:02:50, 2.69s/it] 34%|███▍ | 4217/12313 
[3:09:38<6:14:51, 2.78s/it] {'loss': 0.4194, 'grad_norm': 8.080683087721372, 'learning_rate': 3.825523086911919e-06, 'epoch': 0.34} 34%|███▍ | 4217/12313 [3:09:38<6:14:51, 2.78s/it] 34%|███▍ | 4218/12313 [3:09:41<6:20:42, 2.82s/it] {'loss': 0.5493, 'grad_norm': 3.4664753816902834, 'learning_rate': 3.824965465027739e-06, 'epoch': 0.34} 34%|███▍ | 4218/12313 [3:09:41<6:20:42, 2.82s/it] 34%|███▍ | 4219/12313 [3:09:44<6:14:40, 2.78s/it] {'loss': 0.5962, 'grad_norm': 8.5527804211556, 'learning_rate': 3.824407751462962e-06, 'epoch': 0.34} 34%|███▍ | 4219/12313 [3:09:44<6:14:40, 2.78s/it] 34%|███▍ | 4220/12313 [3:09:46<6:01:35, 2.68s/it] {'loss': 0.5413, 'grad_norm': 9.91272395777568, 'learning_rate': 3.823849946256176e-06, 'epoch': 0.34} 34%|███▍ | 4220/12313 [3:09:46<6:01:35, 2.68s/it] 34%|███▍ | 4221/12313 [3:09:49<6:02:02, 2.68s/it] {'loss': 0.4414, 'grad_norm': 5.49826382214047, 'learning_rate': 3.82329204944598e-06, 'epoch': 0.34} 34%|███▍ | 4221/12313 [3:09:49<6:02:02, 2.68s/it] 34%|███▍ | 4222/12313 [3:09:52<6:04:57, 2.71s/it] {'loss': 0.653, 'grad_norm': 5.466358398271932, 'learning_rate': 3.822734061070979e-06, 'epoch': 0.34} 34%|███▍ | 4222/12313 [3:09:52<6:04:57, 2.71s/it] 34%|███▍ | 4223/12313 [3:09:54<6:05:33, 2.71s/it] {'loss': 0.6647, 'grad_norm': 11.56485219391147, 'learning_rate': 3.8221759811697814e-06, 'epoch': 0.34} 34%|███▍ | 4223/12313 [3:09:54<6:05:33, 2.71s/it] 34%|███▍ | 4224/12313 [3:09:57<6:02:04, 2.69s/it] {'loss': 0.5001, 'grad_norm': 17.302847263409163, 'learning_rate': 3.821617809781004e-06, 'epoch': 0.34} 34%|███▍ | 4224/12313 [3:09:57<6:02:04, 2.69s/it] 34%|███▍ | 4225/12313 [3:10:00<5:53:51, 2.63s/it] {'loss': 0.4429, 'grad_norm': 4.9393254820446, 'learning_rate': 3.821059546943268e-06, 'epoch': 0.34} 34%|███▍ | 4225/12313 [3:10:00<5:53:51, 2.63s/it] 34%|███▍ | 4226/12313 [3:10:02<5:57:11, 2.65s/it] {'loss': 0.6441, 'grad_norm': 4.737433746889213, 'learning_rate': 3.820501192695202e-06, 'epoch': 0.34} 34%|███▍ | 4226/12313 
[3:10:02<5:57:11, 2.65s/it] 34%|███▍ | 4227/12313 [3:10:05<5:56:53, 2.65s/it] {'loss': 0.4669, 'grad_norm': 5.321040703740516, 'learning_rate': 3.819942747075443e-06, 'epoch': 0.34} 34%|███▍ | 4227/12313 [3:10:05<5:56:53, 2.65s/it] 34%|███▍ | 4228/12313 [3:10:08<5:58:51, 2.66s/it] {'loss': 0.682, 'grad_norm': 5.28796113251718, 'learning_rate': 3.819384210122631e-06, 'epoch': 0.34} 34%|███▍ | 4228/12313 [3:10:08<5:58:51, 2.66s/it] 34%|███▍ | 4229/12313 [3:10:11<6:08:55, 2.74s/it] {'loss': 0.5883, 'grad_norm': 4.339078198914844, 'learning_rate': 3.818825581875415e-06, 'epoch': 0.34} 34%|███▍ | 4229/12313 [3:10:11<6:08:55, 2.74s/it] 34%|███▍ | 4230/12313 [3:10:13<5:56:54, 2.65s/it] {'loss': 0.5184, 'grad_norm': 4.128567458915762, 'learning_rate': 3.818266862372449e-06, 'epoch': 0.34} 34%|███▍ | 4230/12313 [3:10:13<5:56:54, 2.65s/it] 34%|███▍ | 4231/12313 [3:10:15<5:48:04, 2.58s/it] {'loss': 0.6334, 'grad_norm': 3.9206753012437146, 'learning_rate': 3.817708051652392e-06, 'epoch': 0.34} 34%|███▍ | 4231/12313 [3:10:15<5:48:04, 2.58s/it] 34%|███▍ | 4232/12313 [3:10:18<5:40:19, 2.53s/it] {'loss': 0.5891, 'grad_norm': 4.303153750803405, 'learning_rate': 3.817149149753912e-06, 'epoch': 0.34} 34%|███▍ | 4232/12313 [3:10:18<5:40:19, 2.53s/it] 34%|███▍ | 4233/12313 [3:10:20<5:45:59, 2.57s/it] {'loss': 0.5449, 'grad_norm': 4.474909831406013, 'learning_rate': 3.816590156715682e-06, 'epoch': 0.34} 34%|███▍ | 4233/12313 [3:10:20<5:45:59, 2.57s/it] 34%|███▍ | 4234/12313 [3:10:23<5:49:46, 2.60s/it] {'loss': 0.4061, 'grad_norm': 4.504556577904222, 'learning_rate': 3.81603107257638e-06, 'epoch': 0.34} 34%|███▍ | 4234/12313 [3:10:23<5:49:46, 2.60s/it] 34%|███▍ | 4235/12313 [3:10:26<5:52:13, 2.62s/it] {'loss': 0.5448, 'grad_norm': 2.939069265586592, 'learning_rate': 3.815471897374695e-06, 'epoch': 0.34} 34%|███▍ | 4235/12313 [3:10:26<5:52:13, 2.62s/it] 34%|███▍ | 4236/12313 [3:10:29<6:05:24, 2.71s/it] {'loss': 0.4506, 'grad_norm': 4.384216019391756, 'learning_rate': 
3.814912631149315e-06, 'epoch': 0.34} 34%|███▍ | 4236/12313 [3:10:29<6:05:24, 2.71s/it] 34%|███▍ | 4237/12313 [3:10:32<6:10:48, 2.75s/it] {'loss': 0.5923, 'grad_norm': 5.19610868271772, 'learning_rate': 3.8143532739389403e-06, 'epoch': 0.34} 34%|███▍ | 4237/12313 [3:10:32<6:10:48, 2.75s/it] 34%|███▍ | 4238/12313 [3:10:34<6:06:40, 2.72s/it] {'loss': 0.6411, 'grad_norm': 4.2187938416631185, 'learning_rate': 3.813793825782276e-06, 'epoch': 0.34} 34%|███▍ | 4238/12313 [3:10:34<6:06:40, 2.72s/it] 34%|███▍ | 4239/12313 [3:10:37<5:58:09, 2.66s/it] {'loss': 0.6972, 'grad_norm': 5.412344609770045, 'learning_rate': 3.8132342867180318e-06, 'epoch': 0.34} 34%|███▍ | 4239/12313 [3:10:37<5:58:09, 2.66s/it] 34%|███▍ | 4240/12313 [3:10:40<6:10:53, 2.76s/it] {'loss': 0.6241, 'grad_norm': 3.8719949393667887, 'learning_rate': 3.812674656784924e-06, 'epoch': 0.34} 34%|███▍ | 4240/12313 [3:10:40<6:10:53, 2.76s/it] 34%|███▍ | 4241/12313 [3:10:43<6:14:47, 2.79s/it] {'loss': 0.4416, 'grad_norm': 6.272686879652281, 'learning_rate': 3.812114936021678e-06, 'epoch': 0.34} 34%|███▍ | 4241/12313 [3:10:43<6:14:47, 2.79s/it] 34%|███▍ | 4242/12313 [3:10:45<6:07:41, 2.73s/it] {'loss': 0.6361, 'grad_norm': 4.522533410945012, 'learning_rate': 3.811555124467023e-06, 'epoch': 0.34} 34%|███▍ | 4242/12313 [3:10:45<6:07:41, 2.73s/it] 34%|███▍ | 4243/12313 [3:10:48<6:13:44, 2.78s/it] {'loss': 0.5521, 'grad_norm': 3.7359506733942554, 'learning_rate': 3.8109952221596948e-06, 'epoch': 0.34} 34%|███▍ | 4243/12313 [3:10:48<6:13:44, 2.78s/it] 34%|███▍ | 4244/12313 [3:10:51<6:05:24, 2.72s/it] {'loss': 0.6105, 'grad_norm': 5.599008701288348, 'learning_rate': 3.810435229138435e-06, 'epoch': 0.34} 34%|███▍ | 4244/12313 [3:10:51<6:05:24, 2.72s/it] 34%|███▍ | 4245/12313 [3:10:53<5:57:40, 2.66s/it] {'loss': 0.6491, 'grad_norm': 4.77660925406526, 'learning_rate': 3.8098751454419925e-06, 'epoch': 0.34} 34%|███▍ | 4245/12313 [3:10:53<5:57:40, 2.66s/it] 34%|███▍ | 4246/12313 [3:10:56<5:58:22, 2.67s/it] {'loss': 0.4824, 
'grad_norm': 5.070740770359748, 'learning_rate': 3.8093149711091227e-06, 'epoch': 0.34} 34%|███▍ | 4246/12313 [3:10:56<5:58:22, 2.67s/it] 34%|███▍ | 4247/12313 [3:10:59<5:59:14, 2.67s/it] {'loss': 0.591, 'grad_norm': 5.29618525895374, 'learning_rate': 3.8087547061785864e-06, 'epoch': 0.34} 34%|███▍ | 4247/12313 [3:10:59<5:59:14, 2.67s/it] 35%|███▍ | 4248/12313 [3:11:01<5:48:03, 2.59s/it] {'loss': 0.6667, 'grad_norm': 4.011238439114341, 'learning_rate': 3.8081943506891505e-06, 'epoch': 0.35} 35%|███▍ | 4248/12313 [3:11:01<5:48:03, 2.59s/it] 35%|███▍ | 4249/12313 [3:11:03<5:40:01, 2.53s/it] {'loss': 0.5231, 'grad_norm': 4.640224694054099, 'learning_rate': 3.8076339046795897e-06, 'epoch': 0.35} 35%|███▍ | 4249/12313 [3:11:03<5:40:01, 2.53s/it] 35%|███▍ | 4250/12313 [3:11:06<5:37:19, 2.51s/it] {'loss': 0.7279, 'grad_norm': 9.930057551543705, 'learning_rate': 3.807073368188683e-06, 'epoch': 0.35} 35%|███▍ | 4250/12313 [3:11:06<5:37:19, 2.51s/it] 35%|███▍ | 4251/12313 [3:11:08<5:44:31, 2.56s/it] {'loss': 0.7463, 'grad_norm': 10.280051980432813, 'learning_rate': 3.8065127412552172e-06, 'epoch': 0.35} 35%|███▍ | 4251/12313 [3:11:08<5:44:31, 2.56s/it] 35%|███▍ | 4252/12313 [3:11:11<5:40:29, 2.53s/it] {'loss': 0.7628, 'grad_norm': 4.260596124125055, 'learning_rate': 3.8059520239179836e-06, 'epoch': 0.35} 35%|███▍ | 4252/12313 [3:11:11<5:40:29, 2.53s/it] 35%|███▍ | 4253/12313 [3:11:14<5:57:11, 2.66s/it] {'loss': 0.4457, 'grad_norm': 6.511349904040633, 'learning_rate': 3.805391216215782e-06, 'epoch': 0.35} 35%|███▍ | 4253/12313 [3:11:14<5:57:11, 2.66s/it] 35%|███▍ | 4254/12313 [3:11:16<5:52:26, 2.62s/it] {'loss': 0.4474, 'grad_norm': 7.176157624874702, 'learning_rate': 3.8048303181874167e-06, 'epoch': 0.35} 35%|███▍ | 4254/12313 [3:11:16<5:52:26, 2.62s/it] 35%|███▍ | 4255/12313 [3:11:19<5:52:04, 2.62s/it] {'loss': 0.7248, 'grad_norm': 5.9798967177829345, 'learning_rate': 3.8042693298717e-06, 'epoch': 0.35} 35%|███▍ | 4255/12313 [3:11:19<5:52:04, 2.62s/it] 35%|███▍ | 4256/12313 
[3:11:22<5:57:07, 2.66s/it] {'loss': 0.4971, 'grad_norm': 5.3870467284981, 'learning_rate': 3.8037082513074468e-06, 'epoch': 0.35} 35%|███▍ | 4256/12313 [3:11:22<5:57:07, 2.66s/it] 35%|███▍ | 4257/12313 [3:11:24<5:50:33, 2.61s/it] {'loss': 0.5185, 'grad_norm': 6.0361440421296, 'learning_rate': 3.8031470825334838e-06, 'epoch': 0.35} 35%|███▍ | 4257/12313 [3:11:24<5:50:33, 2.61s/it] 35%|███▍ | 4258/12313 [3:11:27<5:49:11, 2.60s/it] {'loss': 0.7618, 'grad_norm': 3.7390306125476243, 'learning_rate': 3.8025858235886394e-06, 'epoch': 0.35} 35%|███▍ | 4258/12313 [3:11:27<5:49:11, 2.60s/it] 35%|███▍ | 4259/12313 [3:11:30<5:50:30, 2.61s/it] {'loss': 0.5033, 'grad_norm': 6.423197360909206, 'learning_rate': 3.802024474511749e-06, 'epoch': 0.35} 35%|███▍ | 4259/12313 [3:11:30<5:50:30, 2.61s/it] 35%|███▍ | 4260/12313 [3:11:32<5:53:20, 2.63s/it] {'loss': 0.5696, 'grad_norm': 4.523265015480748, 'learning_rate': 3.801463035341656e-06, 'epoch': 0.35} 35%|███▍ | 4260/12313 [3:11:32<5:53:20, 2.63s/it] 35%|███▍ | 4261/12313 [3:11:35<5:47:11, 2.59s/it] {'loss': 0.5074, 'grad_norm': 6.030313585209124, 'learning_rate': 3.8009015061172095e-06, 'epoch': 0.35} 35%|███▍ | 4261/12313 [3:11:35<5:47:11, 2.59s/it] 35%|███▍ | 4262/12313 [3:11:37<5:50:38, 2.61s/it] {'loss': 0.4557, 'grad_norm': 9.4302885564273, 'learning_rate': 3.8003398868772635e-06, 'epoch': 0.35} 35%|███▍ | 4262/12313 [3:11:37<5:50:38, 2.61s/it] 35%|███▍ | 4263/12313 [3:11:40<5:58:20, 2.67s/it] {'loss': 0.5377, 'grad_norm': 7.919711884223888, 'learning_rate': 3.799778177660679e-06, 'epoch': 0.35} 35%|███▍ | 4263/12313 [3:11:40<5:58:20, 2.67s/it] 35%|███▍ | 4264/12313 [3:11:43<5:58:43, 2.67s/it] {'loss': 0.5704, 'grad_norm': 4.176073302897445, 'learning_rate': 3.7992163785063236e-06, 'epoch': 0.35} 35%|███▍ | 4264/12313 [3:11:43<5:58:43, 2.67s/it] 35%|███▍ | 4265/12313 [3:11:45<5:55:00, 2.65s/it] {'loss': 0.554, 'grad_norm': 5.09213708548835, 'learning_rate': 3.798654489453071e-06, 'epoch': 0.35} 35%|███▍ | 4265/12313 
[3:11:45<5:55:00, 2.65s/it] 35%|███▍ | 4266/12313 [3:11:48<5:56:47, 2.66s/it] {'loss': 0.526, 'grad_norm': 6.0436953163637295, 'learning_rate': 3.7980925105398004e-06, 'epoch': 0.35} 35%|███▍ | 4266/12313 [3:11:48<5:56:47, 2.66s/it] 35%|███▍ | 4267/12313 [3:11:51<6:07:17, 2.74s/it] {'loss': 0.4978, 'grad_norm': 6.156030657781099, 'learning_rate': 3.7975304418053986e-06, 'epoch': 0.35} 35%|███▍ | 4267/12313 [3:11:51<6:07:17, 2.74s/it] 35%|███▍ | 4268/12313 [3:11:54<6:06:11, 2.73s/it] {'loss': 0.6419, 'grad_norm': 8.589190270382112, 'learning_rate': 3.796968283288758e-06, 'epoch': 0.35} 35%|███▍ | 4268/12313 [3:11:54<6:06:11, 2.73s/it] 35%|███▍ | 4269/12313 [3:11:56<6:03:53, 2.71s/it] {'loss': 0.5272, 'grad_norm': 3.828448877259962, 'learning_rate': 3.7964060350287747e-06, 'epoch': 0.35} 35%|███▍ | 4269/12313 [3:11:56<6:03:53, 2.71s/it] 35%|███▍ | 4270/12313 [3:11:59<6:03:45, 2.71s/it] {'loss': 0.5072, 'grad_norm': 4.335800260568463, 'learning_rate': 3.795843697064355e-06, 'epoch': 0.35} 35%|███▍ | 4270/12313 [3:11:59<6:03:45, 2.71s/it] 35%|███▍ | 4271/12313 [3:12:02<6:14:48, 2.80s/it] {'loss': 0.5839, 'grad_norm': 4.112493113260704, 'learning_rate': 3.795281269434411e-06, 'epoch': 0.35} 35%|███▍ | 4271/12313 [3:12:02<6:14:48, 2.80s/it] 35%|███▍ | 4272/12313 [3:12:05<6:03:59, 2.72s/it] {'loss': 0.7074, 'grad_norm': 7.2088319680381225, 'learning_rate': 3.794718752177857e-06, 'epoch': 0.35} 35%|███▍ | 4272/12313 [3:12:05<6:03:59, 2.72s/it] 35%|███▍ | 4273/12313 [3:12:08<6:11:27, 2.77s/it] {'loss': 0.5309, 'grad_norm': 3.766153162278056, 'learning_rate': 3.7941561453336184e-06, 'epoch': 0.35} 35%|███▍ | 4273/12313 [3:12:08<6:11:27, 2.77s/it] 35%|███▍ | 4274/12313 [3:12:10<6:01:16, 2.70s/it] {'loss': 0.3967, 'grad_norm': 5.845126706777332, 'learning_rate': 3.7935934489406232e-06, 'epoch': 0.35} 35%|███▍ | 4274/12313 [3:12:10<6:01:16, 2.70s/it] 35%|███▍ | 4275/12313 [3:12:13<5:59:46, 2.69s/it] {'loss': 0.7975, 'grad_norm': 6.9520337995936785, 'learning_rate': 
3.7930306630378085e-06, 'epoch': 0.35} 35%|███▍ | 4275/12313 [3:12:13<5:59:46, 2.69s/it] 35%|███▍ | 4276/12313 [3:12:15<6:02:27, 2.71s/it] {'loss': 0.4203, 'grad_norm': 4.890242996197673, 'learning_rate': 3.7924677876641147e-06, 'epoch': 0.35} 35%|███▍ | 4276/12313 [3:12:15<6:02:27, 2.71s/it] 35%|███▍ | 4277/12313 [3:12:18<6:08:39, 2.75s/it] {'loss': 0.7376, 'grad_norm': 3.8913871864084255, 'learning_rate': 3.79190482285849e-06, 'epoch': 0.35} 35%|███▍ | 4277/12313 [3:12:18<6:08:39, 2.75s/it] 35%|███▍ | 4278/12313 [3:12:21<5:54:11, 2.64s/it] {'loss': 0.6519, 'grad_norm': 3.544505934601897, 'learning_rate': 3.7913417686598886e-06, 'epoch': 0.35} 35%|███▍ | 4278/12313 [3:12:21<5:54:11, 2.64s/it] 35%|███▍ | 4279/12313 [3:12:23<5:50:08, 2.61s/it] {'loss': 0.5073, 'grad_norm': 9.771874636578227, 'learning_rate': 3.790778625107272e-06, 'epoch': 0.35} 35%|███▍ | 4279/12313 [3:12:23<5:50:08, 2.61s/it] 35%|███▍ | 4280/12313 [3:12:26<5:45:54, 2.58s/it] {'loss': 0.5759, 'grad_norm': 3.505229136041032, 'learning_rate': 3.790215392239606e-06, 'epoch': 0.35} 35%|███▍ | 4280/12313 [3:12:26<5:45:54, 2.58s/it] 35%|███▍ | 4281/12313 [3:12:28<5:39:05, 2.53s/it] {'loss': 0.5125, 'grad_norm': 5.243230166877009, 'learning_rate': 3.7896520700958616e-06, 'epoch': 0.35} 35%|███▍ | 4281/12313 [3:12:28<5:39:05, 2.53s/it] 35%|███▍ | 4282/12313 [3:12:31<5:57:59, 2.67s/it] {'loss': 0.5668, 'grad_norm': 5.947817581890729, 'learning_rate': 3.789088658715021e-06, 'epoch': 0.35} 35%|███▍ | 4282/12313 [3:12:31<5:57:59, 2.67s/it] 35%|███▍ | 4283/12313 [3:12:34<5:58:35, 2.68s/it] {'loss': 0.5129, 'grad_norm': 4.83590576883994, 'learning_rate': 3.788525158136067e-06, 'epoch': 0.35} 35%|███▍ | 4283/12313 [3:12:34<5:58:35, 2.68s/it] 35%|███▍ | 4284/12313 [3:12:37<5:57:06, 2.67s/it] {'loss': 0.45, 'grad_norm': 4.089901831736869, 'learning_rate': 3.787961568397992e-06, 'epoch': 0.35} 35%|███▍ | 4284/12313 [3:12:37<5:57:06, 2.67s/it] 35%|███▍ | 4285/12313 [3:12:40<6:15:46, 2.81s/it] {'loss': 0.699, 
'grad_norm': 4.292363577189182, 'learning_rate': 3.787397889539792e-06, 'epoch': 0.35} 35%|███▍ | 4285/12313 [3:12:40<6:15:46, 2.81s/it] 35%|███▍ | 4286/12313 [3:12:42<6:06:19, 2.74s/it] {'loss': 0.4771, 'grad_norm': 3.731168656658335, 'learning_rate': 3.786834121600472e-06, 'epoch': 0.35} 35%|███▍ | 4286/12313 [3:12:42<6:06:19, 2.74s/it] 35%|███▍ | 4287/12313 [3:12:45<6:03:02, 2.71s/it] {'loss': 0.6777, 'grad_norm': 6.663689172947077, 'learning_rate': 3.7862702646190415e-06, 'epoch': 0.35} 35%|███▍ | 4287/12313 [3:12:45<6:03:02, 2.71s/it] 35%|███▍ | 4288/12313 [3:12:47<5:50:01, 2.62s/it] {'loss': 0.62, 'grad_norm': 7.125149701358761, 'learning_rate': 3.7857063186345156e-06, 'epoch': 0.35} 35%|███▍ | 4288/12313 [3:12:47<5:50:01, 2.62s/it] 35%|███▍ | 4289/12313 [3:12:50<5:50:08, 2.62s/it] {'loss': 0.541, 'grad_norm': 3.677264775633149, 'learning_rate': 3.7851422836859177e-06, 'epoch': 0.35} 35%|███▍ | 4289/12313 [3:12:50<5:50:08, 2.62s/it] 35%|███▍ | 4290/12313 [3:12:53<5:54:07, 2.65s/it] {'loss': 0.4561, 'grad_norm': 7.094843934887101, 'learning_rate': 3.7845781598122743e-06, 'epoch': 0.35} 35%|███▍ | 4290/12313 [3:12:53<5:54:07, 2.65s/it] 35%|███▍ | 4291/12313 [3:12:55<5:45:53, 2.59s/it] {'loss': 0.4937, 'grad_norm': 4.886642342030592, 'learning_rate': 3.7840139470526215e-06, 'epoch': 0.35} 35%|███▍ | 4291/12313 [3:12:55<5:45:53, 2.59s/it] 35%|███▍ | 4292/12313 [3:12:58<5:44:59, 2.58s/it] {'loss': 0.513, 'grad_norm': 5.62838070355612, 'learning_rate': 3.783449645445999e-06, 'epoch': 0.35} 35%|███▍ | 4292/12313 [3:12:58<5:44:59, 2.58s/it] 35%|███▍ | 4293/12313 [3:13:00<5:43:07, 2.57s/it] {'loss': 0.5147, 'grad_norm': 4.6812249317048025, 'learning_rate': 3.782885255031453e-06, 'epoch': 0.35} 35%|███▍ | 4293/12313 [3:13:00<5:43:07, 2.57s/it] 35%|███▍ | 4294/12313 [3:13:03<5:38:01, 2.53s/it] {'loss': 0.3674, 'grad_norm': 6.938556921862774, 'learning_rate': 3.782320775848038e-06, 'epoch': 0.35} 35%|███▍ | 4294/12313 [3:13:03<5:38:01, 2.53s/it] 35%|███▍ | 4295/12313 
[3:13:06<5:53:04, 2.64s/it] {'loss': 0.4626, 'grad_norm': 6.723176214680427, 'learning_rate': 3.7817562079348114e-06, 'epoch': 0.35} 35%|███▍ | 4295/12313 [3:13:06<5:53:04, 2.64s/it] 35%|███▍ | 4296/12313 [3:13:08<5:51:57, 2.63s/it] {'loss': 0.4768, 'grad_norm': 5.553775061400944, 'learning_rate': 3.7811915513308382e-06, 'epoch': 0.35} 35%|███▍ | 4296/12313 [3:13:08<5:51:57, 2.63s/it] 35%|███▍ | 4297/12313 [3:13:11<5:41:58, 2.56s/it] {'loss': 0.4368, 'grad_norm': 5.884939622386394, 'learning_rate': 3.7806268060751914e-06, 'epoch': 0.35} 35%|███▍ | 4297/12313 [3:13:11<5:41:58, 2.56s/it] 35%|███▍ | 4298/12313 [3:13:13<5:40:30, 2.55s/it] {'loss': 0.444, 'grad_norm': 6.5667330861720234, 'learning_rate': 3.7800619722069464e-06, 'epoch': 0.35} 35%|███▍ | 4298/12313 [3:13:13<5:40:30, 2.55s/it] 35%|███▍ | 4299/12313 [3:13:16<5:37:17, 2.53s/it] {'loss': 0.4569, 'grad_norm': 6.393075158086913, 'learning_rate': 3.7794970497651877e-06, 'epoch': 0.35} 35%|███▍ | 4299/12313 [3:13:16<5:37:17, 2.53s/it] 35%|███▍ | 4300/12313 [3:13:18<5:47:53, 2.60s/it] {'loss': 0.6791, 'grad_norm': 7.173171177630671, 'learning_rate': 3.7789320387890056e-06, 'epoch': 0.35} 35%|███▍ | 4300/12313 [3:13:18<5:47:53, 2.60s/it] 35%|███▍ | 4301/12313 [3:13:21<5:47:39, 2.60s/it] {'loss': 0.5986, 'grad_norm': 4.165736108586291, 'learning_rate': 3.778366939317494e-06, 'epoch': 0.35} 35%|███▍ | 4301/12313 [3:13:21<5:47:39, 2.60s/it] 35%|███▍ | 4302/12313 [3:13:24<5:47:26, 2.60s/it] {'loss': 0.5417, 'grad_norm': 3.349799899482817, 'learning_rate': 3.777801751389757e-06, 'epoch': 0.35} 35%|███▍ | 4302/12313 [3:13:24<5:47:26, 2.60s/it] 35%|███▍ | 4303/12313 [3:13:26<5:47:51, 2.61s/it] {'loss': 0.7214, 'grad_norm': 5.017370500577692, 'learning_rate': 3.7772364750449002e-06, 'epoch': 0.35} 35%|███▍ | 4303/12313 [3:13:26<5:47:51, 2.61s/it] 35%|███▍ | 4304/12313 [3:13:29<5:58:17, 2.68s/it] {'loss': 0.4845, 'grad_norm': 6.384615890109909, 'learning_rate': 3.77667111032204e-06, 'epoch': 0.35} 35%|███▍ | 4304/12313 
[3:13:29<5:58:17, 2.68s/it] 35%|███▍ | 4305/12313 [3:13:32<5:55:57, 2.67s/it] {'loss': 0.5029, 'grad_norm': 5.139067733395603, 'learning_rate': 3.776105657260295e-06, 'epoch': 0.35} 35%|███▍ | 4305/12313 [3:13:32<5:55:57, 2.67s/it] 35%|███▍ | 4306/12313 [3:13:34<5:51:28, 2.63s/it] {'loss': 0.6012, 'grad_norm': 30.243450603041254, 'learning_rate': 3.7755401158987926e-06, 'epoch': 0.35} 35%|███▍ | 4306/12313 [3:13:34<5:51:28, 2.63s/it] 35%|███▍ | 4307/12313 [3:13:37<5:45:32, 2.59s/it] {'loss': 0.4776, 'grad_norm': 8.484420160765751, 'learning_rate': 3.774974486276664e-06, 'epoch': 0.35} 35%|███▍ | 4307/12313 [3:13:37<5:45:32, 2.59s/it] 35%|███▍ | 4308/12313 [3:13:39<5:46:37, 2.60s/it] {'loss': 0.4172, 'grad_norm': 6.935408912143071, 'learning_rate': 3.77440876843305e-06, 'epoch': 0.35} 35%|███▍ | 4308/12313 [3:13:39<5:46:37, 2.60s/it] 35%|███▍ | 4309/12313 [3:13:42<5:51:55, 2.64s/it] {'loss': 0.7109, 'grad_norm': 4.024333397155922, 'learning_rate': 3.773842962407093e-06, 'epoch': 0.35} 35%|███▍ | 4309/12313 [3:13:42<5:51:55, 2.64s/it] 35%|███▌ | 4310/12313 [3:13:45<5:50:13, 2.63s/it] {'loss': 0.5926, 'grad_norm': 4.11100179002336, 'learning_rate': 3.773277068237945e-06, 'epoch': 0.35} 35%|███▌ | 4310/12313 [3:13:45<5:50:13, 2.63s/it] 35%|███▌ | 4311/12313 [3:13:47<5:52:12, 2.64s/it] {'loss': 0.6069, 'grad_norm': 4.947318220787382, 'learning_rate': 3.7727110859647627e-06, 'epoch': 0.35} 35%|███▌ | 4311/12313 [3:13:47<5:52:12, 2.64s/it] 35%|███▌ | 4312/12313 [3:13:50<6:02:22, 2.72s/it] {'loss': 0.5277, 'grad_norm': 3.6871411628254696, 'learning_rate': 3.772145015626709e-06, 'epoch': 0.35} 35%|███▌ | 4312/12313 [3:13:50<6:02:22, 2.72s/it] 35%|███▌ | 4313/12313 [3:13:53<5:59:41, 2.70s/it] {'loss': 0.4547, 'grad_norm': 10.305081134224809, 'learning_rate': 3.771578857262953e-06, 'epoch': 0.35} 35%|███▌ | 4313/12313 [3:13:53<5:59:41, 2.70s/it] 35%|███▌ | 4314/12313 [3:13:56<6:00:10, 2.70s/it] {'loss': 0.6503, 'grad_norm': 4.429330825416072, 'learning_rate': 
3.771012610912669e-06, 'epoch': 0.35} 35%|███▌ | 4314/12313 [3:13:56<6:00:10, 2.70s/it] 35%|███▌ | 4315/12313 [3:13:58<5:58:46, 2.69s/it] {'loss': 0.5715, 'grad_norm': 8.456969721413778, 'learning_rate': 3.7704462766150396e-06, 'epoch': 0.35} 35%|███▌ | 4315/12313 [3:13:58<5:58:46, 2.69s/it] 35%|███▌ | 4316/12313 [3:14:01<5:52:06, 2.64s/it] {'loss': 0.3989, 'grad_norm': 5.543349480716538, 'learning_rate': 3.7698798544092525e-06, 'epoch': 0.35} 35%|███▌ | 4316/12313 [3:14:01<5:52:06, 2.64s/it] 35%|███▌ | 4317/12313 [3:14:03<5:48:35, 2.62s/it] {'loss': 0.7712, 'grad_norm': 4.0226607503884315, 'learning_rate': 3.7693133443344986e-06, 'epoch': 0.35} 35%|███▌ | 4317/12313 [3:14:03<5:48:35, 2.62s/it] 35%|███▌ | 4318/12313 [3:14:06<5:45:55, 2.60s/it] {'loss': 0.5677, 'grad_norm': 3.3414221408894798, 'learning_rate': 3.7687467464299797e-06, 'epoch': 0.35} 35%|███▌ | 4318/12313 [3:14:06<5:45:55, 2.60s/it] 35%|███▌ | 4319/12313 [3:14:09<5:58:07, 2.69s/it] {'loss': 0.5779, 'grad_norm': 3.3587069368493263, 'learning_rate': 3.7681800607349017e-06, 'epoch': 0.35} 35%|███▌ | 4319/12313 [3:14:09<5:58:07, 2.69s/it] 35%|███▌ | 4320/12313 [3:14:12<6:00:35, 2.71s/it] {'loss': 0.5286, 'grad_norm': 7.359687891817828, 'learning_rate': 3.767613287288474e-06, 'epoch': 0.35} 35%|███▌ | 4320/12313 [3:14:12<6:00:35, 2.71s/it] 35%|███▌ | 4321/12313 [3:14:14<5:55:29, 2.67s/it] {'loss': 0.6514, 'grad_norm': 4.5580897772433095, 'learning_rate': 3.767046426129917e-06, 'epoch': 0.35} 35%|███▌ | 4321/12313 [3:14:14<5:55:29, 2.67s/it] 35%|███▌ | 4322/12313 [3:14:17<6:03:56, 2.73s/it] {'loss': 0.4803, 'grad_norm': 6.868236388509833, 'learning_rate': 3.7664794772984515e-06, 'epoch': 0.35} 35%|███▌ | 4322/12313 [3:14:17<6:03:56, 2.73s/it] 35%|███▌ | 4323/12313 [3:14:20<6:04:46, 2.74s/it] {'loss': 0.4627, 'grad_norm': 4.0864068693665985, 'learning_rate': 3.7659124408333094e-06, 'epoch': 0.35} 35%|███▌ | 4323/12313 [3:14:20<6:04:46, 2.74s/it] 35%|███▌ | 4324/12313 [3:14:22<5:54:46, 2.66s/it] {'loss': 
0.6832, 'grad_norm': 4.220792384857057, 'learning_rate': 3.7653453167737263e-06, 'epoch': 0.35} 35%|███▌ | 4324/12313 [3:14:22<5:54:46, 2.66s/it] 35%|███▌ | 4325/12313 [3:14:25<5:52:07, 2.64s/it] {'loss': 0.7653, 'grad_norm': 6.391965671992134, 'learning_rate': 3.7647781051589436e-06, 'epoch': 0.35} 35%|███▌ | 4325/12313 [3:14:25<5:52:07, 2.64s/it] 35%|███▌ | 4326/12313 [3:14:28<5:53:45, 2.66s/it] {'loss': 0.4652, 'grad_norm': 6.3932460863379905, 'learning_rate': 3.76421080602821e-06, 'epoch': 0.35} 35%|███▌ | 4326/12313 [3:14:28<5:53:45, 2.66s/it] 35%|███▌ | 4327/12313 [3:14:30<5:56:42, 2.68s/it] {'loss': 0.6826, 'grad_norm': 5.223581522055953, 'learning_rate': 3.76364341942078e-06, 'epoch': 0.35} 35%|███▌ | 4327/12313 [3:14:30<5:56:42, 2.68s/it] 35%|███▌ | 4328/12313 [3:14:33<5:55:21, 2.67s/it] {'loss': 0.4624, 'grad_norm': 6.233483510000385, 'learning_rate': 3.7630759453759123e-06, 'epoch': 0.35} 35%|███▌ | 4328/12313 [3:14:33<5:55:21, 2.67s/it] 35%|███▌ | 4329/12313 [3:14:35<5:50:32, 2.63s/it] {'loss': 0.4736, 'grad_norm': 4.989931764108342, 'learning_rate': 3.7625083839328747e-06, 'epoch': 0.35} 35%|███▌ | 4329/12313 [3:14:35<5:50:32, 2.63s/it] 35%|███▌ | 4330/12313 [3:14:38<5:47:15, 2.61s/it] {'loss': 0.4673, 'grad_norm': 3.2608228815714755, 'learning_rate': 3.7619407351309377e-06, 'epoch': 0.35} 35%|███▌ | 4330/12313 [3:14:38<5:47:15, 2.61s/it] 35%|███▌ | 4331/12313 [3:14:41<5:49:38, 2.63s/it] {'loss': 0.5807, 'grad_norm': 4.925326222217201, 'learning_rate': 3.761372999009381e-06, 'epoch': 0.35} 35%|███▌ | 4331/12313 [3:14:41<5:49:38, 2.63s/it] 35%|███▌ | 4332/12313 [3:14:43<5:44:17, 2.59s/it] {'loss': 0.4816, 'grad_norm': 3.6058579203829844, 'learning_rate': 3.7608051756074894e-06, 'epoch': 0.35} 35%|███▌ | 4332/12313 [3:14:43<5:44:17, 2.59s/it] 35%|███▌ | 4333/12313 [3:14:46<5:43:19, 2.58s/it] {'loss': 0.6296, 'grad_norm': 5.412076104516127, 'learning_rate': 3.7602372649645512e-06, 'epoch': 0.35} 35%|███▌ | 4333/12313 [3:14:46<5:43:19, 2.58s/it] 35%|███▌ | 
4334/12313 [3:14:48<5:49:35, 2.63s/it] {'loss': 0.5238, 'grad_norm': 4.984953620322977, 'learning_rate': 3.759669267119864e-06, 'epoch': 0.35} 35%|███▌ | 4334/12313 [3:14:48<5:49:35, 2.63s/it] 35%|███▌ | 4335/12313 [3:14:51<5:40:49, 2.56s/it] {'loss': 0.6843, 'grad_norm': 3.348233179207022, 'learning_rate': 3.759101182112731e-06, 'epoch': 0.35} 35%|███▌ | 4335/12313 [3:14:51<5:40:49, 2.56s/it] 35%|███▌ | 4336/12313 [3:14:53<5:38:07, 2.54s/it] {'loss': 0.4943, 'grad_norm': 3.9680570346308204, 'learning_rate': 3.758533009982459e-06, 'epoch': 0.35} 35%|███▌ | 4336/12313 [3:14:53<5:38:07, 2.54s/it] 35%|███▌ | 4337/12313 [3:14:56<5:32:33, 2.50s/it] {'loss': 0.4964, 'grad_norm': 7.538460902230132, 'learning_rate': 3.7579647507683636e-06, 'epoch': 0.35} 35%|███▌ | 4337/12313 [3:14:56<5:32:33, 2.50s/it] 35%|███▌ | 4338/12313 [3:14:59<5:46:34, 2.61s/it] {'loss': 0.6186, 'grad_norm': 4.840273083850319, 'learning_rate': 3.7573964045097655e-06, 'epoch': 0.35} 35%|███▌ | 4338/12313 [3:14:59<5:46:34, 2.61s/it] 35%|███▌ | 4339/12313 [3:15:02<6:13:52, 2.81s/it] {'loss': 0.4687, 'grad_norm': 4.652079301299005, 'learning_rate': 3.7568279712459908e-06, 'epoch': 0.35} 35%|███▌ | 4339/12313 [3:15:02<6:13:52, 2.81s/it] 35%|███▌ | 4340/12313 [3:15:05<6:14:31, 2.82s/it] {'loss': 0.6544, 'grad_norm': 4.542830944823784, 'learning_rate': 3.7562594510163718e-06, 'epoch': 0.35} 35%|███▌ | 4340/12313 [3:15:05<6:14:31, 2.82s/it] 35%|███▌ | 4341/12313 [3:15:07<6:05:17, 2.75s/it] {'loss': 0.598, 'grad_norm': 5.562323687848427, 'learning_rate': 3.755690843860248e-06, 'epoch': 0.35} 35%|███▌ | 4341/12313 [3:15:07<6:05:17, 2.75s/it] 35%|███▌ | 4342/12313 [3:15:10<6:21:45, 2.87s/it] {'loss': 0.46, 'grad_norm': 3.056102850388366, 'learning_rate': 3.7551221498169633e-06, 'epoch': 0.35} 35%|███▌ | 4342/12313 [3:15:10<6:21:45, 2.87s/it] 35%|███▌ | 4343/12313 [3:15:13<6:04:51, 2.75s/it] {'loss': 0.5024, 'grad_norm': 6.370494566318182, 'learning_rate': 3.7545533689258683e-06, 'epoch': 0.35} 35%|███▌ | 
4343/12313 [3:15:13<6:04:51, 2.75s/it] 35%|███▌ | 4344/12313 [3:15:16<6:05:05, 2.75s/it] {'loss': 0.4952, 'grad_norm': 3.674196156661001, 'learning_rate': 3.75398450122632e-06, 'epoch': 0.35} 35%|███▌ | 4344/12313 [3:15:16<6:05:05, 2.75s/it] 35%|███▌ | 4345/12313 [3:15:19<6:10:17, 2.79s/it] {'loss': 0.545, 'grad_norm': 5.618292271167161, 'learning_rate': 3.7534155467576805e-06, 'epoch': 0.35} 35%|███▌ | 4345/12313 [3:15:19<6:10:17, 2.79s/it] 35%|███▌ | 4346/12313 [3:15:21<6:05:09, 2.75s/it] {'loss': 0.7136, 'grad_norm': 7.155057050102085, 'learning_rate': 3.7528465055593186e-06, 'epoch': 0.35} 35%|███▌ | 4346/12313 [3:15:21<6:05:09, 2.75s/it] 35%|███▌ | 4347/12313 [3:15:24<6:02:41, 2.73s/it] {'loss': 0.5152, 'grad_norm': 4.835019495981251, 'learning_rate': 3.75227737767061e-06, 'epoch': 0.35} 35%|███▌ | 4347/12313 [3:15:24<6:02:41, 2.73s/it] 35%|███▌ | 4348/12313 [3:15:27<6:08:12, 2.77s/it] {'loss': 0.5799, 'grad_norm': 6.80042426573597, 'learning_rate': 3.7517081631309336e-06, 'epoch': 0.35} 35%|███▌ | 4348/12313 [3:15:27<6:08:12, 2.77s/it] 35%|███▌ | 4349/12313 [3:15:29<5:58:02, 2.70s/it] {'loss': 0.5641, 'grad_norm': 7.706095450683844, 'learning_rate': 3.751138861979678e-06, 'epoch': 0.35} 35%|███▌ | 4349/12313 [3:15:29<5:58:02, 2.70s/it] 35%|███▌ | 4350/12313 [3:15:32<6:06:08, 2.76s/it] {'loss': 0.5249, 'grad_norm': 8.667289733551144, 'learning_rate': 3.750569474256233e-06, 'epoch': 0.35} 35%|███▌ | 4350/12313 [3:15:32<6:06:08, 2.76s/it] 35%|███▌ | 4351/12313 [3:15:35<5:59:54, 2.71s/it] {'loss': 0.4826, 'grad_norm': 4.311680279675531, 'learning_rate': 3.7500000000000005e-06, 'epoch': 0.35} 35%|███▌ | 4351/12313 [3:15:35<5:59:54, 2.71s/it] 35%|███▌ | 4352/12313 [3:15:38<6:14:20, 2.82s/it] {'loss': 0.5173, 'grad_norm': 5.148268090014955, 'learning_rate': 3.7494304392503826e-06, 'epoch': 0.35} 35%|███▌ | 4352/12313 [3:15:38<6:14:20, 2.82s/it] 35%|███▌ | 4353/12313 [3:15:41<6:10:39, 2.79s/it] {'loss': 0.7347, 'grad_norm': 5.2133873799522705, 'learning_rate': 
3.7488607920467912e-06, 'epoch': 0.35} 35%|███▌ | 4353/12313 [3:15:41<6:10:39, 2.79s/it] 35%|███▌ | 4354/12313 [3:15:43<5:55:27, 2.68s/it] {'loss': 0.4798, 'grad_norm': 5.251842412363714, 'learning_rate': 3.7482910584286424e-06, 'epoch': 0.35} 35%|███▌ | 4354/12313 [3:15:43<5:55:27, 2.68s/it] 35%|███▌ | 4355/12313 [3:15:46<5:56:57, 2.69s/it] {'loss': 0.6538, 'grad_norm': 4.4703093132843, 'learning_rate': 3.747721238435359e-06, 'epoch': 0.35} 35%|███▌ | 4355/12313 [3:15:46<5:56:57, 2.69s/it] 35%|███▌ | 4356/12313 [3:15:49<6:11:05, 2.80s/it] {'loss': 0.487, 'grad_norm': 4.814603055251707, 'learning_rate': 3.747151332106369e-06, 'epoch': 0.35} 35%|███▌ | 4356/12313 [3:15:49<6:11:05, 2.80s/it] 35%|███▌ | 4357/12313 [3:15:51<5:59:51, 2.71s/it] {'loss': 0.5932, 'grad_norm': 5.542109813139826, 'learning_rate': 3.746581339481108e-06, 'epoch': 0.35} 35%|███▌ | 4357/12313 [3:15:51<5:59:51, 2.71s/it] 35%|███▌ | 4358/12313 [3:15:54<6:00:54, 2.72s/it] {'loss': 0.605, 'grad_norm': 5.541416801300144, 'learning_rate': 3.746011260599015e-06, 'epoch': 0.35} 35%|███▌ | 4358/12313 [3:15:54<6:00:54, 2.72s/it] 35%|███▌ | 4359/12313 [3:15:57<5:59:45, 2.71s/it] {'loss': 0.4671, 'grad_norm': 4.197242421152716, 'learning_rate': 3.7454410954995375e-06, 'epoch': 0.35} 35%|███▌ | 4359/12313 [3:15:57<5:59:45, 2.71s/it] 35%|███▌ | 4360/12313 [3:15:59<5:58:26, 2.70s/it] {'loss': 0.5889, 'grad_norm': 6.220911708551648, 'learning_rate': 3.7448708442221277e-06, 'epoch': 0.35} 35%|███▌ | 4360/12313 [3:15:59<5:58:26, 2.70s/it] 35%|███▌ | 4361/12313 [3:16:02<6:03:20, 2.74s/it] {'loss': 0.5731, 'grad_norm': 4.456428745711773, 'learning_rate': 3.744300506806243e-06, 'epoch': 0.35} 35%|███▌ | 4361/12313 [3:16:02<6:03:20, 2.74s/it] 35%|███▌ | 4362/12313 [3:16:05<6:00:55, 2.72s/it] {'loss': 0.4561, 'grad_norm': 4.103784001652695, 'learning_rate': 3.7437300832913503e-06, 'epoch': 0.35} 35%|███▌ | 4362/12313 [3:16:05<6:00:55, 2.72s/it] 35%|███▌ | 4363/12313 [3:16:08<5:56:43, 2.69s/it] {'loss': 0.6277, 
'grad_norm': 16.445100012740834, 'learning_rate': 3.743159573716917e-06, 'epoch': 0.35} 35%|███▌ | 4363/12313 [3:16:08<5:56:43, 2.69s/it] 35%|███▌ | 4364/12313 [3:16:10<5:55:07, 2.68s/it] {'loss': 0.5266, 'grad_norm': 4.616837308526195, 'learning_rate': 3.7425889781224204e-06, 'epoch': 0.35} 35%|███▌ | 4364/12313 [3:16:10<5:55:07, 2.68s/it] 35%|███▌ | 4365/12313 [3:16:13<5:54:41, 2.68s/it] {'loss': 0.5105, 'grad_norm': 8.670573299062688, 'learning_rate': 3.742018296547344e-06, 'epoch': 0.35} 35%|███▌ | 4365/12313 [3:16:13<5:54:41, 2.68s/it] 35%|███▌ | 4366/12313 [3:16:16<6:08:13, 2.78s/it] {'loss': 0.7134, 'grad_norm': 6.449318189960221, 'learning_rate': 3.741447529031173e-06, 'epoch': 0.35} 35%|███▌ | 4366/12313 [3:16:16<6:08:13, 2.78s/it] 35%|███▌ | 4367/12313 [3:16:18<5:59:31, 2.71s/it] {'loss': 0.5255, 'grad_norm': 4.211623063096266, 'learning_rate': 3.7408766756134046e-06, 'epoch': 0.35} 35%|███▌ | 4367/12313 [3:16:18<5:59:31, 2.71s/it] 35%|███▌ | 4368/12313 [3:16:21<6:00:22, 2.72s/it] {'loss': 0.6893, 'grad_norm': 4.907991529043284, 'learning_rate': 3.740305736333537e-06, 'epoch': 0.35} 35%|███▌ | 4368/12313 [3:16:21<6:00:22, 2.72s/it] 35%|███▌ | 4369/12313 [3:16:24<5:49:50, 2.64s/it] {'loss': 0.6383, 'grad_norm': 4.845747717269982, 'learning_rate': 3.7397347112310767e-06, 'epoch': 0.35} 35%|███▌ | 4369/12313 [3:16:24<5:49:50, 2.64s/it] 35%|███▌ | 4370/12313 [3:16:26<5:50:10, 2.65s/it] {'loss': 0.4795, 'grad_norm': 4.148877926596722, 'learning_rate': 3.7391636003455355e-06, 'epoch': 0.35} 35%|███▌ | 4370/12313 [3:16:26<5:50:10, 2.65s/it] 35%|███▌ | 4371/12313 [3:16:29<5:51:56, 2.66s/it] {'loss': 0.5019, 'grad_norm': 5.568518592335486, 'learning_rate': 3.7385924037164316e-06, 'epoch': 0.35} 35%|███▌ | 4371/12313 [3:16:29<5:51:56, 2.66s/it] 36%|███▌ | 4372/12313 [3:16:32<5:52:31, 2.66s/it] {'loss': 0.4622, 'grad_norm': 11.380910354185877, 'learning_rate': 3.7380211213832882e-06, 'epoch': 0.36} 36%|███▌ | 4372/12313 [3:16:32<5:52:31, 2.66s/it] 36%|███▌ | 
4373/12313 [3:16:34<5:50:06, 2.65s/it] {'loss': 0.5913, 'grad_norm': 4.898875896209035, 'learning_rate': 3.737449753385636e-06, 'epoch': 0.36} 36%|███▌ | 4373/12313 [3:16:34<5:50:06, 2.65s/it] 36%|███▌ | 4374/12313 [3:16:37<5:56:31, 2.69s/it] {'loss': 0.7925, 'grad_norm': 3.7018730566328433, 'learning_rate': 3.7368782997630093e-06, 'epoch': 0.36} 36%|███▌ | 4374/12313 [3:16:37<5:56:31, 2.69s/it] 36%|███▌ | 4375/12313 [3:16:40<5:56:30, 2.69s/it] {'loss': 0.5056, 'grad_norm': 3.8066706472807565, 'learning_rate': 3.7363067605549515e-06, 'epoch': 0.36} 36%|███▌ | 4375/12313 [3:16:40<5:56:30, 2.69s/it] 36%|███▌ | 4376/12313 [3:16:43<5:59:09, 2.72s/it] {'loss': 0.5972, 'grad_norm': 2.709504723389777, 'learning_rate': 3.7357351358010075e-06, 'epoch': 0.36} 36%|███▌ | 4376/12313 [3:16:43<5:59:09, 2.72s/it] 36%|███▌ | 4377/12313 [3:16:45<5:49:36, 2.64s/it] {'loss': 0.6907, 'grad_norm': 4.824489245992463, 'learning_rate': 3.735163425540732e-06, 'epoch': 0.36} 36%|███▌ | 4377/12313 [3:16:45<5:49:36, 2.64s/it] 36%|███▌ | 4378/12313 [3:16:48<5:53:28, 2.67s/it] {'loss': 0.5539, 'grad_norm': 7.943391610774761, 'learning_rate': 3.734591629813686e-06, 'epoch': 0.36} 36%|███▌ | 4378/12313 [3:16:48<5:53:28, 2.67s/it] 36%|███▌ | 4379/12313 [3:16:50<5:50:08, 2.65s/it] {'loss': 0.5431, 'grad_norm': 4.381215594871206, 'learning_rate': 3.7340197486594315e-06, 'epoch': 0.36} 36%|███▌ | 4379/12313 [3:16:50<5:50:08, 2.65s/it] 36%|███▌ | 4380/12313 [3:16:53<5:56:31, 2.70s/it] {'loss': 0.6081, 'grad_norm': 4.946575395839248, 'learning_rate': 3.7334477821175424e-06, 'epoch': 0.36} 36%|███▌ | 4380/12313 [3:16:53<5:56:31, 2.70s/it] 36%|███▌ | 4381/12313 [3:16:56<5:51:20, 2.66s/it] {'loss': 0.4524, 'grad_norm': 4.52740218848989, 'learning_rate': 3.732875730227595e-06, 'epoch': 0.36} 36%|███▌ | 4381/12313 [3:16:56<5:51:20, 2.66s/it] 36%|███▌ | 4382/12313 [3:16:58<5:46:00, 2.62s/it] {'loss': 0.5258, 'grad_norm': 6.069550060126085, 'learning_rate': 3.7323035930291706e-06, 'epoch': 0.36} 36%|███▌ | 
4382/12313 [3:16:58<5:46:00, 2.62s/it] 36%|███▌ | 4383/12313 [3:17:01<5:57:12, 2.70s/it] {'loss': 0.8438, 'grad_norm': 5.631666918507212, 'learning_rate': 3.731731370561861e-06, 'epoch': 0.36} 36%|███▌ | 4383/12313 [3:17:01<5:57:12, 2.70s/it] 36%|███▌ | 4384/12313 [3:17:04<5:58:03, 2.71s/it] {'loss': 0.6436, 'grad_norm': 8.095060386123581, 'learning_rate': 3.7311590628652584e-06, 'epoch': 0.36} 36%|███▌ | 4384/12313 [3:17:04<5:58:03, 2.71s/it] 36%|███▌ | 4385/12313 [3:17:07<5:55:53, 2.69s/it] {'loss': 0.5972, 'grad_norm': 5.278732076516886, 'learning_rate': 3.730586669978965e-06, 'epoch': 0.36} 36%|███▌ | 4385/12313 [3:17:07<5:55:53, 2.69s/it] 36%|███▌ | 4386/12313 [3:17:10<6:07:21, 2.78s/it] {'loss': 0.4352, 'grad_norm': 6.566634009856689, 'learning_rate': 3.7300141919425865e-06, 'epoch': 0.36} 36%|███▌ | 4386/12313 [3:17:10<6:07:21, 2.78s/it] 36%|███▌ | 4387/12313 [3:17:12<6:04:37, 2.76s/it] {'loss': 0.4451, 'grad_norm': 4.301691821909096, 'learning_rate': 3.729441628795736e-06, 'epoch': 0.36} 36%|███▌ | 4387/12313 [3:17:12<6:04:37, 2.76s/it] 36%|███▌ | 4388/12313 [3:17:15<6:03:18, 2.75s/it] {'loss': 0.4408, 'grad_norm': 3.538764460520504, 'learning_rate': 3.728868980578031e-06, 'epoch': 0.36} 36%|███▌ | 4388/12313 [3:17:15<6:03:18, 2.75s/it] 36%|███▌ | 4389/12313 [3:17:17<5:47:54, 2.63s/it] {'loss': 0.6824, 'grad_norm': 8.009167633892602, 'learning_rate': 3.7282962473290964e-06, 'epoch': 0.36} 36%|███▌ | 4389/12313 [3:17:17<5:47:54, 2.63s/it] 36%|███▌ | 4390/12313 [3:17:20<5:47:34, 2.63s/it] {'loss': 0.6465, 'grad_norm': 3.6954739187392414, 'learning_rate': 3.727723429088562e-06, 'epoch': 0.36} 36%|███▌ | 4390/12313 [3:17:20<5:47:34, 2.63s/it] 36%|███▌ | 4391/12313 [3:17:22<5:42:41, 2.60s/it] {'loss': 0.3879, 'grad_norm': 5.835592705497657, 'learning_rate': 3.7271505258960644e-06, 'epoch': 0.36} 36%|███▌ | 4391/12313 [3:17:22<5:42:41, 2.60s/it] 36%|███▌ | 4392/12313 [3:17:25<5:35:32, 2.54s/it] {'loss': 0.6456, 'grad_norm': 4.621090146943028, 'learning_rate': 
3.726577537791245e-06, 'epoch': 0.36} 36%|███▌ | 4392/12313 [3:17:25<5:35:32, 2.54s/it] 36%|███▌ | 4393/12313 [3:17:27<5:34:27, 2.53s/it] {'loss': 0.5299, 'grad_norm': 7.345583344323817, 'learning_rate': 3.726004464813752e-06, 'epoch': 0.36} 36%|███▌ | 4393/12313 [3:17:27<5:34:27, 2.53s/it] 36%|███▌ | 4394/12313 [3:17:30<5:30:00, 2.50s/it] {'loss': 0.4683, 'grad_norm': 5.887510819023688, 'learning_rate': 3.725431307003238e-06, 'epoch': 0.36} 36%|███▌ | 4394/12313 [3:17:30<5:30:00, 2.50s/it] 36%|███▌ | 4395/12313 [3:17:33<5:42:29, 2.60s/it] {'loss': 0.5324, 'grad_norm': 4.4086817353864785, 'learning_rate': 3.7248580643993625e-06, 'epoch': 0.36} 36%|███▌ | 4395/12313 [3:17:33<5:42:29, 2.60s/it] 36%|███▌ | 4396/12313 [3:17:35<5:35:22, 2.54s/it] {'loss': 0.5235, 'grad_norm': 5.875362001143908, 'learning_rate': 3.724284737041792e-06, 'epoch': 0.36} 36%|███▌ | 4396/12313 [3:17:35<5:35:22, 2.54s/it] 36%|███▌ | 4397/12313 [3:17:38<5:42:20, 2.59s/it] {'loss': 0.4776, 'grad_norm': 4.84990196629755, 'learning_rate': 3.723711324970197e-06, 'epoch': 0.36} 36%|███▌ | 4397/12313 [3:17:38<5:42:20, 2.59s/it] 36%|███▌ | 4398/12313 [3:17:41<5:56:30, 2.70s/it] {'loss': 0.5653, 'grad_norm': 4.446080183771416, 'learning_rate': 3.723137828224255e-06, 'epoch': 0.36} 36%|███▌ | 4398/12313 [3:17:41<5:56:30, 2.70s/it] 36%|███▌ | 4399/12313 [3:17:43<5:47:02, 2.63s/it] {'loss': 0.6452, 'grad_norm': 11.963154592877663, 'learning_rate': 3.722564246843648e-06, 'epoch': 0.36} 36%|███▌ | 4399/12313 [3:17:43<5:47:02, 2.63s/it] 36%|███▌ | 4400/12313 [3:17:46<5:57:45, 2.71s/it] {'loss': 0.5563, 'grad_norm': 5.038523322391007, 'learning_rate': 3.7219905808680663e-06, 'epoch': 0.36} 36%|███▌ | 4400/12313 [3:17:46<5:57:45, 2.71s/it] 36%|███▌ | 4401/12313 [3:17:49<5:54:29, 2.69s/it] {'loss': 0.4416, 'grad_norm': 5.775055041451304, 'learning_rate': 3.7214168303372033e-06, 'epoch': 0.36} 36%|███▌ | 4401/12313 [3:17:49<5:54:29, 2.69s/it] 36%|███▌ | 4402/12313 [3:17:51<5:48:33, 2.64s/it] {'loss': 0.6903, 
'grad_norm': 13.159177334829804, 'learning_rate': 3.72084299529076e-06, 'epoch': 0.36} 36%|███▌ | 4402/12313 [3:17:51<5:48:33, 2.64s/it] 36%|███▌ | 4403/12313 [3:17:54<5:45:20, 2.62s/it] {'loss': 0.5353, 'grad_norm': 6.4873380102452325, 'learning_rate': 3.720269075768442e-06, 'epoch': 0.36} 36%|███▌ | 4403/12313 [3:17:54<5:45:20, 2.62s/it] 36%|███▌ | 4404/12313 [3:17:57<5:48:25, 2.64s/it] {'loss': 0.5224, 'grad_norm': 6.187953414055915, 'learning_rate': 3.7196950718099636e-06, 'epoch': 0.36} 36%|███▌ | 4404/12313 [3:17:57<5:48:25, 2.64s/it] 36%|███▌ | 4405/12313 [3:17:59<5:51:38, 2.67s/it] {'loss': 0.4981, 'grad_norm': 6.813848598601859, 'learning_rate': 3.71912098345504e-06, 'epoch': 0.36} 36%|███▌ | 4405/12313 [3:17:59<5:51:38, 2.67s/it] 36%|███▌ | 4406/12313 [3:18:02<5:48:47, 2.65s/it] {'loss': 0.6389, 'grad_norm': 5.500592742588713, 'learning_rate': 3.7185468107433966e-06, 'epoch': 0.36} 36%|███▌ | 4406/12313 [3:18:02<5:48:47, 2.65s/it] 36%|███▌ | 4407/12313 [3:18:05<5:51:51, 2.67s/it] {'loss': 0.6802, 'grad_norm': 4.764003153750818, 'learning_rate': 3.7179725537147638e-06, 'epoch': 0.36} 36%|███▌ | 4407/12313 [3:18:05<5:51:51, 2.67s/it] 36%|███▌ | 4408/12313 [3:18:07<5:45:14, 2.62s/it] {'loss': 0.6526, 'grad_norm': 23.629219188415377, 'learning_rate': 3.717398212408875e-06, 'epoch': 0.36} 36%|███▌ | 4408/12313 [3:18:07<5:45:14, 2.62s/it] 36%|███▌ | 4409/12313 [3:18:10<5:43:14, 2.61s/it] {'loss': 0.4644, 'grad_norm': 6.635582406761952, 'learning_rate': 3.716823786865474e-06, 'epoch': 0.36} 36%|███▌ | 4409/12313 [3:18:10<5:43:14, 2.61s/it] 36%|███▌ | 4410/12313 [3:18:12<5:52:05, 2.67s/it] {'loss': 0.5585, 'grad_norm': 5.158135221215257, 'learning_rate': 3.7162492771243068e-06, 'epoch': 0.36} 36%|███▌ | 4410/12313 [3:18:12<5:52:05, 2.67s/it] 36%|███▌ | 4411/12313 [3:18:15<5:55:52, 2.70s/it] {'loss': 0.6006, 'grad_norm': 3.8038224782647925, 'learning_rate': 3.7156746832251266e-06, 'epoch': 0.36} 36%|███▌ | 4411/12313 [3:18:15<5:55:52, 2.70s/it] 36%|███▌ | 
4412/12313 [3:18:18<5:58:25, 2.72s/it] {'loss': 0.5972, 'grad_norm': 3.935552912541937, 'learning_rate': 3.7151000052076913e-06, 'epoch': 0.36} 36%|███▌ | 4412/12313 [3:18:18<5:58:25, 2.72s/it] 36%|███▌ | 4413/12313 [3:18:21<5:56:51, 2.71s/it] {'loss': 0.5808, 'grad_norm': 5.2873448761373885, 'learning_rate': 3.7145252431117672e-06, 'epoch': 0.36} 36%|███▌ | 4413/12313 [3:18:21<5:56:51, 2.71s/it] 36%|███▌ | 4414/12313 [3:18:23<5:51:53, 2.67s/it] {'loss': 0.7058, 'grad_norm': 7.369629869130491, 'learning_rate': 3.713950396977124e-06, 'epoch': 0.36} 36%|███▌ | 4414/12313 [3:18:23<5:51:53, 2.67s/it] 36%|███▌ | 4415/12313 [3:18:26<5:50:12, 2.66s/it] {'loss': 0.8023, 'grad_norm': 5.201267760709762, 'learning_rate': 3.7133754668435377e-06, 'epoch': 0.36} 36%|███▌ | 4415/12313 [3:18:26<5:50:12, 2.66s/it] 36%|███▌ | 4416/12313 [3:18:28<5:44:04, 2.61s/it] {'loss': 0.596, 'grad_norm': 3.1998164797424598, 'learning_rate': 3.7128004527507916e-06, 'epoch': 0.36} 36%|███▌ | 4416/12313 [3:18:28<5:44:04, 2.61s/it] 36%|███▌ | 4417/12313 [3:18:31<5:38:59, 2.58s/it] {'loss': 0.6137, 'grad_norm': 5.052174245450653, 'learning_rate': 3.712225354738672e-06, 'epoch': 0.36} 36%|███▌ | 4417/12313 [3:18:31<5:38:59, 2.58s/it] 36%|███▌ | 4418/12313 [3:18:34<5:50:57, 2.67s/it] {'loss': 0.5863, 'grad_norm': 4.804582850754989, 'learning_rate': 3.7116501728469746e-06, 'epoch': 0.36} 36%|███▌ | 4418/12313 [3:18:34<5:50:57, 2.67s/it] 36%|███▌ | 4419/12313 [3:18:36<5:48:14, 2.65s/it] {'loss': 0.5453, 'grad_norm': 4.754942942838345, 'learning_rate': 3.711074907115497e-06, 'epoch': 0.36} 36%|███▌ | 4419/12313 [3:18:36<5:48:14, 2.65s/it] 36%|███▌ | 4420/12313 [3:18:39<5:42:54, 2.61s/it] {'loss': 0.6604, 'grad_norm': 4.182528054308997, 'learning_rate': 3.710499557584045e-06, 'epoch': 0.36} 36%|███▌ | 4420/12313 [3:18:39<5:42:54, 2.61s/it] 36%|███▌ | 4421/12313 [3:18:42<5:46:30, 2.63s/it] {'loss': 0.6099, 'grad_norm': 5.430420823976959, 'learning_rate': 3.7099241242924306e-06, 'epoch': 0.36} 36%|███▌ | 
4421/12313 [3:18:42<5:46:30, 2.63s/it] 36%|███▌ | 4422/12313 [3:18:45<5:57:54, 2.72s/it] {'loss': 0.5275, 'grad_norm': 6.298870190762428, 'learning_rate': 3.7093486072804696e-06, 'epoch': 0.36} 36%|███▌ | 4422/12313 [3:18:45<5:57:54, 2.72s/it] 36%|███▌ | 4423/12313 [3:18:47<5:51:33, 2.67s/it] {'loss': 0.7317, 'grad_norm': 4.826705123382431, 'learning_rate': 3.7087730065879862e-06, 'epoch': 0.36} 36%|███▌ | 4423/12313 [3:18:47<5:51:33, 2.67s/it] 36%|███▌ | 4424/12313 [3:18:50<5:50:17, 2.66s/it] {'loss': 0.6044, 'grad_norm': 13.171633048916624, 'learning_rate': 3.708197322254807e-06, 'epoch': 0.36} 36%|███▌ | 4424/12313 [3:18:50<5:50:17, 2.66s/it] 36%|███▌ | 4425/12313 [3:18:53<5:56:47, 2.71s/it] {'loss': 0.6298, 'grad_norm': 4.774839704277342, 'learning_rate': 3.7076215543207688e-06, 'epoch': 0.36} 36%|███▌ | 4425/12313 [3:18:53<5:56:47, 2.71s/it] 36%|███▌ | 4426/12313 [3:18:55<5:50:43, 2.67s/it] {'loss': 0.4906, 'grad_norm': 7.708019365223568, 'learning_rate': 3.7070457028257095e-06, 'epoch': 0.36} 36%|███▌ | 4426/12313 [3:18:55<5:50:43, 2.67s/it] 36%|███▌ | 4427/12313 [3:18:58<5:52:09, 2.68s/it] {'loss': 0.459, 'grad_norm': 6.0983843136049805, 'learning_rate': 3.7064697678094765e-06, 'epoch': 0.36} 36%|███▌ | 4427/12313 [3:18:58<5:52:09, 2.68s/it] 36%|███▌ | 4428/12313 [3:19:00<5:50:54, 2.67s/it] {'loss': 0.553, 'grad_norm': 4.100219190551162, 'learning_rate': 3.7058937493119195e-06, 'epoch': 0.36} 36%|███▌ | 4428/12313 [3:19:00<5:50:54, 2.67s/it] 36%|███▌ | 4429/12313 [3:19:03<5:52:07, 2.68s/it] {'loss': 0.4724, 'grad_norm': 5.689077880893839, 'learning_rate': 3.705317647372898e-06, 'epoch': 0.36} 36%|███▌ | 4429/12313 [3:19:03<5:52:07, 2.68s/it] 36%|███▌ | 4430/12313 [3:19:06<5:45:41, 2.63s/it] {'loss': 0.4586, 'grad_norm': 3.858145204321467, 'learning_rate': 3.704741462032274e-06, 'epoch': 0.36} 36%|███▌ | 4430/12313 [3:19:06<5:45:41, 2.63s/it] 36%|███▌ | 4431/12313 [3:19:08<5:49:28, 2.66s/it] {'loss': 0.5293, 'grad_norm': 11.983671864011523, 'learning_rate': 
3.7041651933299167e-06, 'epoch': 0.36} 36%|███▌ | 4431/12313 [3:19:08<5:49:28, 2.66s/it] 36%|███▌ | 4432/12313 [3:19:11<5:50:51, 2.67s/it] {'loss': 0.6165, 'grad_norm': 5.870016078637099, 'learning_rate': 3.703588841305702e-06, 'epoch': 0.36} 36%|███▌ | 4432/12313 [3:19:11<5:50:51, 2.67s/it] 36%|███▌ | 4433/12313 [3:19:14<5:50:42, 2.67s/it] {'loss': 0.5869, 'grad_norm': 11.720283579984393, 'learning_rate': 3.7030124059995086e-06, 'epoch': 0.36} 36%|███▌ | 4433/12313 [3:19:14<5:50:42, 2.67s/it] 36%|███▌ | 4434/12313 [3:19:16<5:46:55, 2.64s/it] {'loss': 0.5978, 'grad_norm': 3.3466945140934072, 'learning_rate': 3.7024358874512235e-06, 'epoch': 0.36} 36%|███▌ | 4434/12313 [3:19:16<5:46:55, 2.64s/it] 36%|███▌ | 4435/12313 [3:19:19<5:45:21, 2.63s/it] {'loss': 0.5366, 'grad_norm': 6.264649818329499, 'learning_rate': 3.7018592857007386e-06, 'epoch': 0.36} 36%|███▌ | 4435/12313 [3:19:19<5:45:21, 2.63s/it] 36%|███▌ | 4436/12313 [3:19:22<5:47:20, 2.65s/it] {'loss': 0.5043, 'grad_norm': 5.505223118383528, 'learning_rate': 3.701282600787952e-06, 'epoch': 0.36} 36%|███▌ | 4436/12313 [3:19:22<5:47:20, 2.65s/it] 36%|███▌ | 4437/12313 [3:19:24<5:51:23, 2.68s/it] {'loss': 0.4023, 'grad_norm': 3.9050434398042793, 'learning_rate': 3.700705832752768e-06, 'epoch': 0.36} 36%|███▌ | 4437/12313 [3:19:24<5:51:23, 2.68s/it] 36%|███▌ | 4438/12313 [3:19:27<5:50:14, 2.67s/it] {'loss': 0.7087, 'grad_norm': 3.8486979131156818, 'learning_rate': 3.700128981635094e-06, 'epoch': 0.36} 36%|███▌ | 4438/12313 [3:19:27<5:50:14, 2.67s/it] 36%|███▌ | 4439/12313 [3:19:30<5:54:06, 2.70s/it] {'loss': 0.676, 'grad_norm': 4.97236792052597, 'learning_rate': 3.6995520474748457e-06, 'epoch': 0.36} 36%|███▌ | 4439/12313 [3:19:30<5:54:06, 2.70s/it] 36%|███▌ | 4440/12313 [3:19:32<5:51:05, 2.68s/it] {'loss': 0.4853, 'grad_norm': 5.54885373013609, 'learning_rate': 3.698975030311946e-06, 'epoch': 0.36} 36%|███▌ | 4440/12313 [3:19:32<5:51:05, 2.68s/it] 36%|███▌ | 4441/12313 [3:19:35<5:44:24, 2.63s/it] {'loss': 0.6239, 
'grad_norm': 4.00154383333082, 'learning_rate': 3.6983979301863184e-06, 'epoch': 0.36} 36%|███▌ | 4441/12313 [3:19:35<5:44:24, 2.63s/it] 36%|███▌ | 4442/12313 [3:19:37<5:37:48, 2.58s/it] {'loss': 0.648, 'grad_norm': 8.644737774833692, 'learning_rate': 3.6978207471378965e-06, 'epoch': 0.36} 36%|███▌ | 4442/12313 [3:19:37<5:37:48, 2.58s/it] 36%|███▌ | 4443/12313 [3:19:40<5:46:27, 2.64s/it] {'loss': 0.5129, 'grad_norm': 4.192845183053263, 'learning_rate': 3.697243481206619e-06, 'epoch': 0.36} 36%|███▌ | 4443/12313 [3:19:40<5:46:27, 2.64s/it] 36%|███▌ | 4444/12313 [3:19:43<5:45:10, 2.63s/it] {'loss': 0.6095, 'grad_norm': 4.71407305163305, 'learning_rate': 3.6966661324324278e-06, 'epoch': 0.36} 36%|███▌ | 4444/12313 [3:19:43<5:45:10, 2.63s/it] 36%|███▌ | 4445/12313 [3:19:46<5:48:56, 2.66s/it] {'loss': 0.4677, 'grad_norm': 7.220930209286327, 'learning_rate': 3.6960887008552743e-06, 'epoch': 0.36} 36%|███▌ | 4445/12313 [3:19:46<5:48:56, 2.66s/it] 36%|███▌ | 4446/12313 [3:19:48<5:43:15, 2.62s/it] {'loss': 0.5154, 'grad_norm': 4.962625757719452, 'learning_rate': 3.6955111865151127e-06, 'epoch': 0.36} 36%|███▌ | 4446/12313 [3:19:48<5:43:15, 2.62s/it] 36%|███▌ | 4447/12313 [3:19:51<5:41:47, 2.61s/it] {'loss': 0.5977, 'grad_norm': 10.29073366584436, 'learning_rate': 3.6949335894519033e-06, 'epoch': 0.36} 36%|███▌ | 4447/12313 [3:19:51<5:41:47, 2.61s/it] 36%|███▌ | 4448/12313 [3:19:53<5:42:10, 2.61s/it] {'loss': 0.4716, 'grad_norm': 4.688756562689801, 'learning_rate': 3.6943559097056155e-06, 'epoch': 0.36} 36%|███▌ | 4448/12313 [3:19:53<5:42:10, 2.61s/it] 36%|███▌ | 4449/12313 [3:19:56<5:39:52, 2.59s/it] {'loss': 0.6092, 'grad_norm': 5.833182973807545, 'learning_rate': 3.6937781473162183e-06, 'epoch': 0.36} 36%|███▌ | 4449/12313 [3:19:56<5:39:52, 2.59s/it] 36%|███▌ | 4450/12313 [3:19:58<5:42:29, 2.61s/it] {'loss': 0.6076, 'grad_norm': 7.160318638424944, 'learning_rate': 3.6932003023236916e-06, 'epoch': 0.36} 36%|███▌ | 4450/12313 [3:19:58<5:42:29, 2.61s/it] 36%|███▌ | 
4451/12313 [3:20:01<5:44:52, 2.63s/it] {'loss': 0.5193, 'grad_norm': 9.311074485235293, 'learning_rate': 3.692622374768019e-06, 'epoch': 0.36} 36%|███▌ | 4451/12313 [3:20:01<5:44:52, 2.63s/it] 36%|███▌ | 4452/12313 [3:20:04<5:43:01, 2.62s/it] {'loss': 0.4899, 'grad_norm': 12.351430012812038, 'learning_rate': 3.69204436468919e-06, 'epoch': 0.36} 36%|███▌ | 4452/12313 [3:20:04<5:43:01, 2.62s/it] 36%|███▌ | 4453/12313 [3:20:06<5:41:33, 2.61s/it] {'loss': 0.5991, 'grad_norm': 5.254888121313631, 'learning_rate': 3.6914662721272e-06, 'epoch': 0.36} 36%|███▌ | 4453/12313 [3:20:06<5:41:33, 2.61s/it] 36%|███▌ | 4454/12313 [3:20:09<5:43:39, 2.62s/it] {'loss': 0.5256, 'grad_norm': 4.86844411981734, 'learning_rate': 3.6908880971220494e-06, 'epoch': 0.36} 36%|███▌ | 4454/12313 [3:20:09<5:43:39, 2.62s/it] 36%|███▌ | 4455/12313 [3:20:12<5:43:02, 2.62s/it] {'loss': 0.504, 'grad_norm': 11.036443384726775, 'learning_rate': 3.690309839713745e-06, 'epoch': 0.36} 36%|███▌ | 4455/12313 [3:20:12<5:43:02, 2.62s/it] 36%|███▌ | 4456/12313 [3:20:15<5:57:11, 2.73s/it] {'loss': 0.4975, 'grad_norm': 3.81687688264698, 'learning_rate': 3.6897314999423e-06, 'epoch': 0.36} 36%|███▌ | 4456/12313 [3:20:15<5:57:11, 2.73s/it] 36%|███▌ | 4457/12313 [3:20:17<5:57:20, 2.73s/it] {'loss': 0.5774, 'grad_norm': 3.12299941091288, 'learning_rate': 3.6891530778477306e-06, 'epoch': 0.36} 36%|███▌ | 4457/12313 [3:20:17<5:57:20, 2.73s/it] 36%|███▌ | 4458/12313 [3:20:20<5:54:47, 2.71s/it] {'loss': 0.4885, 'grad_norm': 5.036876057872486, 'learning_rate': 3.6885745734700628e-06, 'epoch': 0.36} 36%|███▌ | 4458/12313 [3:20:20<5:54:47, 2.71s/it] 36%|███▌ | 4459/12313 [3:20:23<5:57:17, 2.73s/it] {'loss': 0.6869, 'grad_norm': 8.62327137178916, 'learning_rate': 3.687995986849325e-06, 'epoch': 0.36} 36%|███▌ | 4459/12313 [3:20:23<5:57:17, 2.73s/it] 36%|███▌ | 4460/12313 [3:20:26<6:00:56, 2.76s/it] {'loss': 0.5733, 'grad_norm': 3.4708246203276443, 'learning_rate': 3.687417318025551e-06, 'epoch': 0.36} 36%|███▌ | 4460/12313 
[3:20:26<6:00:56, 2.76s/it] 36%|███▌ | 4461/12313 [3:20:28<5:58:41, 2.74s/it] {'loss': 0.6222, 'grad_norm': 4.973444884464531, 'learning_rate': 3.686838567038784e-06, 'epoch': 0.36} 36%|███▌ | 4461/12313 [3:20:28<5:58:41, 2.74s/it] 36%|███▌ | 4462/12313 [3:20:31<5:59:37, 2.75s/it] {'loss': 0.5412, 'grad_norm': 5.667375423236751, 'learning_rate': 3.68625973392907e-06, 'epoch': 0.36} 36%|███▌ | 4462/12313 [3:20:31<5:59:37, 2.75s/it] 36%|███▌ | 4463/12313 [3:20:34<5:54:59, 2.71s/it] {'loss': 0.5223, 'grad_norm': 6.103470863477208, 'learning_rate': 3.6856808187364594e-06, 'epoch': 0.36} 36%|███▌ | 4463/12313 [3:20:34<5:54:59, 2.71s/it] 36%|███▋ | 4464/12313 [3:20:36<5:53:27, 2.70s/it] {'loss': 0.4664, 'grad_norm': 7.865435634910888, 'learning_rate': 3.685101821501012e-06, 'epoch': 0.36} 36%|███▋ | 4464/12313 [3:20:36<5:53:27, 2.70s/it] 36%|███▋ | 4465/12313 [3:20:39<5:54:08, 2.71s/it] {'loss': 0.3839, 'grad_norm': 4.7253648152592715, 'learning_rate': 3.6845227422627904e-06, 'epoch': 0.36} 36%|███▋ | 4465/12313 [3:20:39<5:54:08, 2.71s/it] 36%|███▋ | 4466/12313 [3:20:42<5:47:39, 2.66s/it] {'loss': 0.587, 'grad_norm': 5.338042758405891, 'learning_rate': 3.683943581061864e-06, 'epoch': 0.36} 36%|███▋ | 4466/12313 [3:20:42<5:47:39, 2.66s/it] 36%|███▋ | 4467/12313 [3:20:44<5:46:47, 2.65s/it] {'loss': 0.5633, 'grad_norm': 5.179836245143301, 'learning_rate': 3.683364337938308e-06, 'epoch': 0.36} 36%|███▋ | 4467/12313 [3:20:44<5:46:47, 2.65s/it] 36%|███▋ | 4468/12313 [3:20:47<5:46:02, 2.65s/it] {'loss': 0.6154, 'grad_norm': 6.353705665638407, 'learning_rate': 3.6827850129322017e-06, 'epoch': 0.36} 36%|███▋ | 4468/12313 [3:20:47<5:46:02, 2.65s/it] 36%|███▋ | 4469/12313 [3:20:49<5:41:38, 2.61s/it] {'loss': 0.4864, 'grad_norm': 4.584366349679948, 'learning_rate': 3.682205606083633e-06, 'epoch': 0.36} 36%|███▋ | 4469/12313 [3:20:49<5:41:38, 2.61s/it] 36%|███▋ | 4470/12313 [3:20:52<5:45:01, 2.64s/it] {'loss': 0.4577, 'grad_norm': 6.208825994136881, 'learning_rate': 
3.681626117432693e-06, 'epoch': 0.36} 36%|███▋ | 4470/12313 [3:20:52<5:45:01, 2.64s/it] 36%|███▋ | 4471/12313 [3:20:55<5:47:40, 2.66s/it] {'loss': 0.4515, 'grad_norm': 6.366010507681265, 'learning_rate': 3.6810465470194796e-06, 'epoch': 0.36} 36%|███▋ | 4471/12313 [3:20:55<5:47:40, 2.66s/it] 36%|███▋ | 4472/12313 [3:20:57<5:42:24, 2.62s/it] {'loss': 0.5824, 'grad_norm': 4.554722007028088, 'learning_rate': 3.680466894884096e-06, 'epoch': 0.36} 36%|███▋ | 4472/12313 [3:20:57<5:42:24, 2.62s/it] 36%|███▋ | 4473/12313 [3:21:00<5:46:11, 2.65s/it] {'loss': 0.5687, 'grad_norm': 14.16578555542569, 'learning_rate': 3.6798871610666497e-06, 'epoch': 0.36} 36%|███▋ | 4473/12313 [3:21:00<5:46:11, 2.65s/it] 36%|███▋ | 4474/12313 [3:21:03<5:43:28, 2.63s/it] {'loss': 0.5116, 'grad_norm': 6.940126197441687, 'learning_rate': 3.679307345607257e-06, 'epoch': 0.36} 36%|███▋ | 4474/12313 [3:21:03<5:43:28, 2.63s/it] 36%|███▋ | 4475/12313 [3:21:05<5:43:49, 2.63s/it] {'loss': 0.5931, 'grad_norm': 11.501609688473003, 'learning_rate': 3.6787274485460377e-06, 'epoch': 0.36} 36%|███▋ | 4475/12313 [3:21:05<5:43:49, 2.63s/it] 36%|███▋ | 4476/12313 [3:21:08<5:46:41, 2.65s/it] {'loss': 0.5515, 'grad_norm': 9.174827684495176, 'learning_rate': 3.678147469923117e-06, 'epoch': 0.36} 36%|███▋ | 4476/12313 [3:21:08<5:46:41, 2.65s/it] 36%|███▋ | 4477/12313 [3:21:11<5:47:45, 2.66s/it] {'loss': 0.4882, 'grad_norm': 4.650571061924677, 'learning_rate': 3.677567409778626e-06, 'epoch': 0.36} 36%|███▋ | 4477/12313 [3:21:11<5:47:45, 2.66s/it] 36%|███▋ | 4478/12313 [3:21:13<5:39:34, 2.60s/it] {'loss': 0.5081, 'grad_norm': 4.716688957959236, 'learning_rate': 3.6769872681527036e-06, 'epoch': 0.36} 36%|███▋ | 4478/12313 [3:21:13<5:39:34, 2.60s/it] 36%|███▋ | 4479/12313 [3:21:16<5:44:59, 2.64s/it] {'loss': 0.4636, 'grad_norm': 6.347553656163701, 'learning_rate': 3.6764070450854907e-06, 'epoch': 0.36} 36%|███▋ | 4479/12313 [3:21:16<5:44:59, 2.64s/it] 36%|███▋ | 4480/12313 [3:21:19<5:45:54, 2.65s/it] {'loss': 0.4511, 
'grad_norm': 10.156227178827374, 'learning_rate': 3.675826740617136e-06, 'epoch': 0.36} 36%|███▋ | 4480/12313 [3:21:19<5:45:54, 2.65s/it] 36%|███▋ | 4481/12313 [3:21:21<5:51:03, 2.69s/it] {'loss': 0.6027, 'grad_norm': 3.559954335727195, 'learning_rate': 3.6752463547877946e-06, 'epoch': 0.36} 36%|███▋ | 4481/12313 [3:21:21<5:51:03, 2.69s/it] 36%|███▋ | 4482/12313 [3:21:24<6:00:02, 2.76s/it] {'loss': 0.5531, 'grad_norm': 4.72021021192598, 'learning_rate': 3.674665887637625e-06, 'epoch': 0.36} 36%|███▋ | 4482/12313 [3:21:24<6:00:02, 2.76s/it] 36%|███▋ | 4483/12313 [3:21:27<5:51:14, 2.69s/it] {'loss': 0.609, 'grad_norm': 11.263712709683006, 'learning_rate': 3.6740853392067925e-06, 'epoch': 0.36} 36%|███▋ | 4483/12313 [3:21:27<5:51:14, 2.69s/it] 36%|███▋ | 4484/12313 [3:21:29<5:48:31, 2.67s/it] {'loss': 0.458, 'grad_norm': 5.123832459796128, 'learning_rate': 3.6735047095354693e-06, 'epoch': 0.36} 36%|███▋ | 4484/12313 [3:21:29<5:48:31, 2.67s/it] 36%|███▋ | 4485/12313 [3:21:32<5:46:36, 2.66s/it] {'loss': 0.8243, 'grad_norm': 4.265471041437096, 'learning_rate': 3.67292399866383e-06, 'epoch': 0.36} 36%|███▋ | 4485/12313 [3:21:32<5:46:36, 2.66s/it] 36%|███▋ | 4486/12313 [3:21:35<5:47:33, 2.66s/it] {'loss': 0.5279, 'grad_norm': 7.3136697189021405, 'learning_rate': 3.6723432066320575e-06, 'epoch': 0.36} 36%|███▋ | 4486/12313 [3:21:35<5:47:33, 2.66s/it] 36%|███▋ | 4487/12313 [3:21:37<5:48:51, 2.67s/it] {'loss': 0.5274, 'grad_norm': 3.4708908613272618, 'learning_rate': 3.67176233348034e-06, 'epoch': 0.36} 36%|███▋ | 4487/12313 [3:21:37<5:48:51, 2.67s/it] 36%|███▋ | 4488/12313 [3:21:40<5:57:51, 2.74s/it] {'loss': 0.4907, 'grad_norm': 3.2379003246205205, 'learning_rate': 3.6711813792488706e-06, 'epoch': 0.36} 36%|███▋ | 4488/12313 [3:21:40<5:57:51, 2.74s/it] 36%|███▋ | 4489/12313 [3:21:43<5:55:22, 2.73s/it] {'loss': 0.5109, 'grad_norm': 4.61818923987606, 'learning_rate': 3.6706003439778476e-06, 'epoch': 0.36} 36%|███▋ | 4489/12313 [3:21:43<5:55:22, 2.73s/it] 36%|███▋ | 4490/12313 
[3:21:46<5:53:46, 2.71s/it] {'loss': 0.4814, 'grad_norm': 6.1238028357576555, 'learning_rate': 3.6700192277074766e-06, 'epoch': 0.36} 36%|███▋ | 4490/12313 [3:21:46<5:53:46, 2.71s/it] 36%|███▋ | 4491/12313 [3:21:49<5:58:39, 2.75s/it] {'loss': 0.4514, 'grad_norm': 5.108669074454857, 'learning_rate': 3.6694380304779676e-06, 'epoch': 0.36} 36%|███▋ | 4491/12313 [3:21:49<5:58:39, 2.75s/it] 36%|███▋ | 4492/12313 [3:21:51<5:50:11, 2.69s/it] {'loss': 0.6225, 'grad_norm': 4.555539206226715, 'learning_rate': 3.6688567523295356e-06, 'epoch': 0.36} 36%|███▋ | 4492/12313 [3:21:51<5:50:11, 2.69s/it] 36%|███▋ | 4493/12313 [3:21:54<5:44:05, 2.64s/it] {'loss': 0.526, 'grad_norm': 4.947835075771073, 'learning_rate': 3.668275393302402e-06, 'epoch': 0.36} 36%|███▋ | 4493/12313 [3:21:54<5:44:05, 2.64s/it] 36%|███▋ | 4494/12313 [3:21:56<5:35:19, 2.57s/it] {'loss': 0.5885, 'grad_norm': 4.987280419667079, 'learning_rate': 3.667693953436795e-06, 'epoch': 0.36} 36%|███▋ | 4494/12313 [3:21:56<5:35:19, 2.57s/it] 37%|███▋ | 4495/12313 [3:21:59<5:39:47, 2.61s/it] {'loss': 0.5906, 'grad_norm': 5.33019974406346, 'learning_rate': 3.6671124327729457e-06, 'epoch': 0.37} 37%|███▋ | 4495/12313 [3:21:59<5:39:47, 2.61s/it] 37%|███▋ | 4496/12313 [3:22:02<5:51:58, 2.70s/it] {'loss': 0.4641, 'grad_norm': 3.911550454532029, 'learning_rate': 3.6665308313510927e-06, 'epoch': 0.37} 37%|███▋ | 4496/12313 [3:22:02<5:51:58, 2.70s/it] 37%|███▋ | 4497/12313 [3:22:04<5:51:16, 2.70s/it] {'loss': 0.5346, 'grad_norm': 7.327451479070638, 'learning_rate': 3.665949149211481e-06, 'epoch': 0.37} 37%|███▋ | 4497/12313 [3:22:04<5:51:16, 2.70s/it] 37%|███▋ | 4498/12313 [3:22:07<5:50:25, 2.69s/it] {'loss': 0.564, 'grad_norm': 6.679619506230374, 'learning_rate': 3.6653673863943584e-06, 'epoch': 0.37} 37%|███▋ | 4498/12313 [3:22:07<5:50:25, 2.69s/it] 37%|███▋ | 4499/12313 [3:22:10<5:49:40, 2.68s/it] {'loss': 0.5412, 'grad_norm': 7.979525255569825, 'learning_rate': 3.6647855429399803e-06, 'epoch': 0.37} 37%|███▋ | 4499/12313 
[3:22:10<5:49:40, 2.68s/it] 37%|███▋ | 4500/12313 [3:22:12<5:55:58, 2.73s/it] {'loss': 0.5653, 'grad_norm': 5.930675497170029, 'learning_rate': 3.6642036188886072e-06, 'epoch': 0.37} 37%|███▋ | 4500/12313 [3:22:12<5:55:58, 2.73s/it] 37%|███▋ | 4501/12313 [3:22:15<6:05:03, 2.80s/it] {'loss': 0.6143, 'grad_norm': 4.872227933745589, 'learning_rate': 3.663621614280505e-06, 'epoch': 0.37} 37%|███▋ | 4501/12313 [3:22:15<6:05:03, 2.80s/it] 37%|███▋ | 4502/12313 [3:22:18<5:56:54, 2.74s/it] {'loss': 0.5996, 'grad_norm': 4.942347208045127, 'learning_rate': 3.663039529155945e-06, 'epoch': 0.37} 37%|███▋ | 4502/12313 [3:22:18<5:56:54, 2.74s/it] 37%|███▋ | 4503/12313 [3:22:21<5:53:44, 2.72s/it] {'loss': 0.6432, 'grad_norm': 3.9093013103451084, 'learning_rate': 3.6624573635552056e-06, 'epoch': 0.37} 37%|███▋ | 4503/12313 [3:22:21<5:53:44, 2.72s/it] 37%|███▋ | 4504/12313 [3:22:23<5:55:30, 2.73s/it] {'loss': 0.5461, 'grad_norm': 6.258360673482065, 'learning_rate': 3.6618751175185687e-06, 'epoch': 0.37} 37%|███▋ | 4504/12313 [3:22:23<5:55:30, 2.73s/it] 37%|███▋ | 4505/12313 [3:22:26<5:43:10, 2.64s/it] {'loss': 0.4864, 'grad_norm': 6.512518797476682, 'learning_rate': 3.6612927910863235e-06, 'epoch': 0.37} 37%|███▋ | 4505/12313 [3:22:26<5:43:10, 2.64s/it] 37%|███▋ | 4506/12313 [3:22:28<5:38:27, 2.60s/it] {'loss': 0.6799, 'grad_norm': 4.622906130526112, 'learning_rate': 3.660710384298762e-06, 'epoch': 0.37} 37%|███▋ | 4506/12313 [3:22:28<5:38:27, 2.60s/it] 37%|███▋ | 4507/12313 [3:22:31<5:55:26, 2.73s/it] {'loss': 0.6479, 'grad_norm': 3.157312017939539, 'learning_rate': 3.6601278971961853e-06, 'epoch': 0.37} 37%|███▋ | 4507/12313 [3:22:31<5:55:26, 2.73s/it] 37%|███▋ | 4508/12313 [3:22:34<5:54:35, 2.73s/it] {'loss': 0.3765, 'grad_norm': 5.831757782243094, 'learning_rate': 3.659545329818898e-06, 'epoch': 0.37} 37%|███▋ | 4508/12313 [3:22:34<5:54:35, 2.73s/it] 37%|███▋ | 4509/12313 [3:22:37<5:49:52, 2.69s/it] {'loss': 0.5067, 'grad_norm': 4.871188343403577, 'learning_rate': 
3.6589626822072105e-06, 'epoch': 0.37} 37%|███▋ | 4509/12313 [3:22:37<5:49:52, 2.69s/it] 37%|███▋ | 4510/12313 [3:22:39<5:44:53, 2.65s/it] {'loss': 0.7395, 'grad_norm': 3.3143048064685825, 'learning_rate': 3.6583799544014397e-06, 'epoch': 0.37} 37%|███▋ | 4510/12313 [3:22:39<5:44:53, 2.65s/it] 37%|███▋ | 4511/12313 [3:22:42<5:36:13, 2.59s/it] {'loss': 0.4955, 'grad_norm': 6.5927342002006535, 'learning_rate': 3.6577971464419064e-06, 'epoch': 0.37} 37%|███▋ | 4511/12313 [3:22:42<5:36:13, 2.59s/it] 37%|███▋ | 4512/12313 [3:22:44<5:34:58, 2.58s/it] {'loss': 0.5946, 'grad_norm': 5.678543552265716, 'learning_rate': 3.6572142583689372e-06, 'epoch': 0.37} 37%|███▋ | 4512/12313 [3:22:44<5:34:58, 2.58s/it] 37%|███▋ | 4513/12313 [3:22:47<5:43:12, 2.64s/it] {'loss': 0.6531, 'grad_norm': 9.978179156767617, 'learning_rate': 3.656631290222867e-06, 'epoch': 0.37} 37%|███▋ | 4513/12313 [3:22:47<5:43:12, 2.64s/it] 37%|███▋ | 4514/12313 [3:22:50<5:46:13, 2.66s/it] {'loss': 0.545, 'grad_norm': 4.4736754108351215, 'learning_rate': 3.656048242044033e-06, 'epoch': 0.37} 37%|███▋ | 4514/12313 [3:22:50<5:46:13, 2.66s/it] 37%|███▋ | 4515/12313 [3:22:53<6:00:23, 2.77s/it] {'loss': 0.4614, 'grad_norm': 3.069124245403996, 'learning_rate': 3.655465113872779e-06, 'epoch': 0.37} 37%|███▋ | 4515/12313 [3:22:53<6:00:23, 2.77s/it] 37%|███▋ | 4516/12313 [3:22:56<5:57:33, 2.75s/it] {'loss': 0.5642, 'grad_norm': 5.78003187791633, 'learning_rate': 3.6548819057494533e-06, 'epoch': 0.37} 37%|███▋ | 4516/12313 [3:22:56<5:57:33, 2.75s/it] 37%|███▋ | 4517/12313 [3:22:58<5:51:42, 2.71s/it] {'loss': 0.726, 'grad_norm': 6.651983645474769, 'learning_rate': 3.6542986177144124e-06, 'epoch': 0.37} 37%|███▋ | 4517/12313 [3:22:58<5:51:42, 2.71s/it] 37%|███▋ | 4518/12313 [3:23:01<5:53:34, 2.72s/it] {'loss': 0.5679, 'grad_norm': 5.61902839950427, 'learning_rate': 3.6537152498080165e-06, 'epoch': 0.37} 37%|███▋ | 4518/12313 [3:23:01<5:53:34, 2.72s/it] 37%|███▋ | 4519/12313 [3:23:04<5:52:25, 2.71s/it] {'loss': 0.6392, 
'grad_norm': 3.6610484670477597, 'learning_rate': 3.653131802070631e-06, 'epoch': 0.37} 37%|███▋ | 4519/12313 [3:23:04<5:52:25, 2.71s/it] 37%|███▋ | 4520/12313 [3:23:06<5:57:00, 2.75s/it] {'loss': 0.5744, 'grad_norm': 4.744064286312601, 'learning_rate': 3.6525482745426277e-06, 'epoch': 0.37} 37%|███▋ | 4520/12313 [3:23:06<5:57:00, 2.75s/it] 37%|███▋ | 4521/12313 [3:23:09<5:46:16, 2.67s/it] {'loss': 0.647, 'grad_norm': 5.713260181456036, 'learning_rate': 3.6519646672643837e-06, 'epoch': 0.37} 37%|███▋ | 4521/12313 [3:23:09<5:46:16, 2.67s/it] 37%|███▋ | 4522/12313 [3:23:12<5:46:37, 2.67s/it] {'loss': 0.5129, 'grad_norm': 4.789967162852778, 'learning_rate': 3.6513809802762805e-06, 'epoch': 0.37} 37%|███▋ | 4522/12313 [3:23:12<5:46:37, 2.67s/it] 37%|███▋ | 4523/12313 [3:23:15<6:04:00, 2.80s/it] {'loss': 0.53, 'grad_norm': 5.3194037700176215, 'learning_rate': 3.6507972136187082e-06, 'epoch': 0.37} 37%|███▋ | 4523/12313 [3:23:15<6:04:00, 2.80s/it] 37%|███▋ | 4524/12313 [3:23:18<6:03:47, 2.80s/it] {'loss': 0.4182, 'grad_norm': 4.077739067704118, 'learning_rate': 3.650213367332059e-06, 'epoch': 0.37} 37%|███▋ | 4524/12313 [3:23:18<6:03:47, 2.80s/it] 37%|███▋ | 4525/12313 [3:23:20<5:50:34, 2.70s/it] {'loss': 0.5525, 'grad_norm': 5.963767765697606, 'learning_rate': 3.6496294414567313e-06, 'epoch': 0.37} 37%|███▋ | 4525/12313 [3:23:20<5:50:34, 2.70s/it] 37%|███▋ | 4526/12313 [3:23:22<5:37:37, 2.60s/it] {'loss': 0.6207, 'grad_norm': 16.366327402411947, 'learning_rate': 3.649045436033132e-06, 'epoch': 0.37} 37%|███▋ | 4526/12313 [3:23:22<5:37:37, 2.60s/it] 37%|███▋ | 4527/12313 [3:23:25<5:39:54, 2.62s/it] {'loss': 0.5597, 'grad_norm': 4.600677095064222, 'learning_rate': 3.6484613511016693e-06, 'epoch': 0.37} 37%|███▋ | 4527/12313 [3:23:25<5:39:54, 2.62s/it] 37%|███▋ | 4528/12313 [3:23:28<5:42:43, 2.64s/it] {'loss': 0.5274, 'grad_norm': 4.635569455138606, 'learning_rate': 3.6478771867027585e-06, 'epoch': 0.37} 37%|███▋ | 4528/12313 [3:23:28<5:42:43, 2.64s/it] 37%|███▋ | 
4529/12313 [3:23:30<5:43:20, 2.65s/it] {'loss': 0.3283, 'grad_norm': 13.8186155722183, 'learning_rate': 3.647292942876822e-06, 'epoch': 0.37} 37%|███▋ | 4529/12313 [3:23:30<5:43:20, 2.65s/it] 37%|███▋ | 4530/12313 [3:23:33<5:40:13, 2.62s/it] {'loss': 0.666, 'grad_norm': 4.921988696473463, 'learning_rate': 3.646708619664286e-06, 'epoch': 0.37} 37%|███▋ | 4530/12313 [3:23:33<5:40:13, 2.62s/it] 37%|███▋ | 4531/12313 [3:23:36<5:50:24, 2.70s/it] {'loss': 0.504, 'grad_norm': 5.125112781896423, 'learning_rate': 3.646124217105582e-06, 'epoch': 0.37} 37%|███▋ | 4531/12313 [3:23:36<5:50:24, 2.70s/it] 37%|███▋ | 4532/12313 [3:23:39<5:50:32, 2.70s/it] {'loss': 0.4889, 'grad_norm': 4.160106735681602, 'learning_rate': 3.645539735241148e-06, 'epoch': 0.37} 37%|███▋ | 4532/12313 [3:23:39<5:50:32, 2.70s/it] 37%|███▋ | 4533/12313 [3:23:41<5:48:46, 2.69s/it] {'loss': 0.5853, 'grad_norm': 11.018187691856639, 'learning_rate': 3.6449551741114277e-06, 'epoch': 0.37} 37%|███▋ | 4533/12313 [3:23:41<5:48:46, 2.69s/it] 37%|███▋ | 4534/12313 [3:23:44<5:53:20, 2.73s/it] {'loss': 0.5506, 'grad_norm': 6.301041751261872, 'learning_rate': 3.6443705337568683e-06, 'epoch': 0.37} 37%|███▋ | 4534/12313 [3:23:44<5:53:20, 2.73s/it] 37%|███▋ | 4535/12313 [3:23:47<5:52:03, 2.72s/it] {'loss': 0.4832, 'grad_norm': 2.9536104737688635, 'learning_rate': 3.643785814217924e-06, 'epoch': 0.37} 37%|███▋ | 4535/12313 [3:23:47<5:52:03, 2.72s/it] 37%|███▋ | 4536/12313 [3:23:49<5:47:07, 2.68s/it] {'loss': 0.5408, 'grad_norm': 6.237830789303934, 'learning_rate': 3.6432010155350556e-06, 'epoch': 0.37} 37%|███▋ | 4536/12313 [3:23:49<5:47:07, 2.68s/it] 37%|███▋ | 4537/12313 [3:23:52<5:54:51, 2.74s/it] {'loss': 0.5208, 'grad_norm': 5.326352917032812, 'learning_rate': 3.642616137748727e-06, 'epoch': 0.37} 37%|███▋ | 4537/12313 [3:23:52<5:54:51, 2.74s/it] 37%|███▋ | 4538/12313 [3:23:55<5:45:53, 2.67s/it] {'loss': 0.5739, 'grad_norm': 6.248306452774462, 'learning_rate': 3.6420311808994084e-06, 'epoch': 0.37} 37%|███▋ | 
4538/12313 [3:23:55<5:45:53, 2.67s/it] 37%|███▋ | 4539/12313 [3:23:57<5:40:03, 2.62s/it] {'loss': 0.6192, 'grad_norm': 4.147693647179564, 'learning_rate': 3.641446145027577e-06, 'epoch': 0.37} 37%|███▋ | 4539/12313 [3:23:57<5:40:03, 2.62s/it] 37%|███▋ | 4540/12313 [3:24:00<5:39:36, 2.62s/it] {'loss': 0.5873, 'grad_norm': 6.469141494304793, 'learning_rate': 3.640861030173713e-06, 'epoch': 0.37} 37%|███▋ | 4540/12313 [3:24:00<5:39:36, 2.62s/it] 37%|███▋ | 4541/12313 [3:24:03<5:43:53, 2.65s/it] {'loss': 0.6911, 'grad_norm': 7.33657703163369, 'learning_rate': 3.6402758363783037e-06, 'epoch': 0.37} 37%|███▋ | 4541/12313 [3:24:03<5:43:53, 2.65s/it] 37%|███▋ | 4542/12313 [3:24:05<5:43:55, 2.66s/it] {'loss': 0.5873, 'grad_norm': 5.919960884683257, 'learning_rate': 3.639690563681841e-06, 'epoch': 0.37} 37%|███▋ | 4542/12313 [3:24:05<5:43:55, 2.66s/it] 37%|███▋ | 4543/12313 [3:24:08<5:48:58, 2.69s/it] {'loss': 0.5995, 'grad_norm': 18.615597531360145, 'learning_rate': 3.6391052121248233e-06, 'epoch': 0.37} 37%|███▋ | 4543/12313 [3:24:08<5:48:58, 2.69s/it] 37%|███▋ | 4544/12313 [3:24:10<5:38:53, 2.62s/it] {'loss': 0.4588, 'grad_norm': 7.146041859144082, 'learning_rate': 3.6385197817477535e-06, 'epoch': 0.37} 37%|███▋ | 4544/12313 [3:24:10<5:38:53, 2.62s/it] 37%|███▋ | 4545/12313 [3:24:13<5:44:05, 2.66s/it] {'loss': 0.4738, 'grad_norm': 4.196856311260542, 'learning_rate': 3.6379342725911402e-06, 'epoch': 0.37} 37%|███▋ | 4545/12313 [3:24:13<5:44:05, 2.66s/it] 37%|███▋ | 4546/12313 [3:24:16<5:46:41, 2.68s/it] {'loss': 0.6132, 'grad_norm': 4.917160222161063, 'learning_rate': 3.637348684695498e-06, 'epoch': 0.37} 37%|███▋ | 4546/12313 [3:24:16<5:46:41, 2.68s/it] 37%|███▋ | 4547/12313 [3:24:18<5:40:05, 2.63s/it] {'loss': 0.3245, 'grad_norm': 3.5260428355619218, 'learning_rate': 3.6367630181013457e-06, 'epoch': 0.37} 37%|███▋ | 4547/12313 [3:24:18<5:40:05, 2.63s/it] 37%|███▋ | 4548/12313 [3:24:21<5:44:00, 2.66s/it] {'loss': 0.6407, 'grad_norm': 5.486577536708969, 'learning_rate': 
3.6361772728492096e-06, 'epoch': 0.37} 37%|███▋ | 4548/12313 [3:24:21<5:44:00, 2.66s/it] 37%|███▋ | 4549/12313 [3:24:24<5:38:35, 2.62s/it] {'loss': 0.645, 'grad_norm': 5.527101563539024, 'learning_rate': 3.6355914489796185e-06, 'epoch': 0.37} 37%|███▋ | 4549/12313 [3:24:24<5:38:35, 2.62s/it] 37%|███▋ | 4550/12313 [3:24:26<5:35:43, 2.59s/it] {'loss': 0.5124, 'grad_norm': 5.8535848869679405, 'learning_rate': 3.6350055465331098e-06, 'epoch': 0.37} 37%|███▋ | 4550/12313 [3:24:26<5:35:43, 2.59s/it] 37%|███▋ | 4551/12313 [3:24:29<5:42:33, 2.65s/it] {'loss': 0.5123, 'grad_norm': 3.0247220854088592, 'learning_rate': 3.6344195655502233e-06, 'epoch': 0.37} 37%|███▋ | 4551/12313 [3:24:29<5:42:33, 2.65s/it] 37%|███▋ | 4552/12313 [3:24:31<5:38:23, 2.62s/it] {'loss': 0.3802, 'grad_norm': 5.325222924482676, 'learning_rate': 3.633833506071508e-06, 'epoch': 0.37} 37%|███▋ | 4552/12313 [3:24:31<5:38:23, 2.62s/it] 37%|███▋ | 4553/12313 [3:24:34<5:39:37, 2.63s/it] {'loss': 0.4215, 'grad_norm': 6.707191666111479, 'learning_rate': 3.6332473681375146e-06, 'epoch': 0.37} 37%|███▋ | 4553/12313 [3:24:34<5:39:37, 2.63s/it] 37%|███▋ | 4554/12313 [3:24:37<5:39:17, 2.62s/it] {'loss': 0.5247, 'grad_norm': 6.3473511974694885, 'learning_rate': 3.6326611517888e-06, 'epoch': 0.37} 37%|███▋ | 4554/12313 [3:24:37<5:39:17, 2.62s/it] 37%|███▋ | 4555/12313 [3:24:39<5:39:47, 2.63s/it] {'loss': 0.4702, 'grad_norm': 4.62624452294825, 'learning_rate': 3.632074857065928e-06, 'epoch': 0.37} 37%|███▋ | 4555/12313 [3:24:39<5:39:47, 2.63s/it] 37%|███▋ | 4556/12313 [3:24:42<5:33:31, 2.58s/it] {'loss': 0.5228, 'grad_norm': 3.421641642355826, 'learning_rate': 3.631488484009469e-06, 'epoch': 0.37} 37%|███▋ | 4556/12313 [3:24:42<5:33:31, 2.58s/it] 37%|███▋ | 4557/12313 [3:24:45<5:50:13, 2.71s/it] {'loss': 0.5604, 'grad_norm': 6.538600958475513, 'learning_rate': 3.630902032659994e-06, 'epoch': 0.37} 37%|███▋ | 4557/12313 [3:24:45<5:50:13, 2.71s/it] 37%|███▋ | 4558/12313 [3:24:47<5:44:18, 2.66s/it] {'loss': 0.5324, 
'grad_norm': 6.779583853613752, 'learning_rate': 3.6303155030580834e-06, 'epoch': 0.37} 37%|███▋ | 4558/12313 [3:24:47<5:44:18, 2.66s/it] 37%|███▋ | 4559/12313 [3:24:50<5:39:41, 2.63s/it] {'loss': 0.4619, 'grad_norm': 4.786962125388907, 'learning_rate': 3.629728895244323e-06, 'epoch': 0.37} 37%|███▋ | 4559/12313 [3:24:50<5:39:41, 2.63s/it] 37%|███▋ | 4560/12313 [3:24:53<5:47:45, 2.69s/it] {'loss': 0.5009, 'grad_norm': 5.970025695408968, 'learning_rate': 3.6291422092593016e-06, 'epoch': 0.37} 37%|███▋ | 4560/12313 [3:24:53<5:47:45, 2.69s/it] 37%|███▋ | 4561/12313 [3:24:55<5:42:55, 2.65s/it] {'loss': 0.5614, 'grad_norm': 5.054283857837389, 'learning_rate': 3.628555445143615e-06, 'epoch': 0.37} 37%|███▋ | 4561/12313 [3:24:55<5:42:55, 2.65s/it] 37%|███▋ | 4562/12313 [3:24:58<5:43:47, 2.66s/it] {'loss': 0.5086, 'grad_norm': 4.038161358633893, 'learning_rate': 3.6279686029378646e-06, 'epoch': 0.37} 37%|███▋ | 4562/12313 [3:24:58<5:43:47, 2.66s/it] 37%|███▋ | 4563/12313 [3:25:01<5:47:45, 2.69s/it] {'loss': 0.4921, 'grad_norm': 4.169489051463636, 'learning_rate': 3.6273816826826565e-06, 'epoch': 0.37} 37%|███▋ | 4563/12313 [3:25:01<5:47:45, 2.69s/it] 37%|███▋ | 4564/12313 [3:25:04<5:48:18, 2.70s/it] {'loss': 0.5351, 'grad_norm': 6.938462515442908, 'learning_rate': 3.6267946844186023e-06, 'epoch': 0.37} 37%|███▋ | 4564/12313 [3:25:04<5:48:18, 2.70s/it] 37%|███▋ | 4565/12313 [3:25:06<5:48:24, 2.70s/it] {'loss': 0.4798, 'grad_norm': 4.177658193519644, 'learning_rate': 3.6262076081863195e-06, 'epoch': 0.37} 37%|███▋ | 4565/12313 [3:25:06<5:48:24, 2.70s/it] 37%|███▋ | 4566/12313 [3:25:09<6:01:04, 2.80s/it] {'loss': 0.468, 'grad_norm': 3.350413459828565, 'learning_rate': 3.625620454026431e-06, 'epoch': 0.37} 37%|███▋ | 4566/12313 [3:25:09<6:01:04, 2.80s/it] 37%|███▋ | 4567/12313 [3:25:12<5:53:44, 2.74s/it] {'loss': 0.5411, 'grad_norm': 5.601772436876186, 'learning_rate': 3.625033221979564e-06, 'epoch': 0.37} 37%|███▋ | 4567/12313 [3:25:12<5:53:44, 2.74s/it] 37%|███▋ | 4568/12313 
[3:25:15<5:53:29, 2.74s/it] {'loss': 0.6332, 'grad_norm': 5.46442194527407, 'learning_rate': 3.624445912086352e-06, 'epoch': 0.37} 37%|███▋ | 4568/12313 [3:25:15<5:53:29, 2.74s/it] 37%|███▋ | 4569/12313 [3:25:17<5:47:52, 2.70s/it] {'loss': 0.6209, 'grad_norm': 5.8554252343208, 'learning_rate': 3.6238585243874346e-06, 'epoch': 0.37} 37%|███▋ | 4569/12313 [3:25:17<5:47:52, 2.70s/it] 37%|███▋ | 4570/12313 [3:25:20<5:44:12, 2.67s/it] {'loss': 0.5771, 'grad_norm': 6.38111885179741, 'learning_rate': 3.6232710589234556e-06, 'epoch': 0.37} 37%|███▋ | 4570/12313 [3:25:20<5:44:12, 2.67s/it] 37%|███▋ | 4571/12313 [3:25:22<5:41:24, 2.65s/it] {'loss': 0.6513, 'grad_norm': 4.123032307940846, 'learning_rate': 3.6226835157350625e-06, 'epoch': 0.37} 37%|███▋ | 4571/12313 [3:25:22<5:41:24, 2.65s/it] 37%|███▋ | 4572/12313 [3:25:25<5:38:17, 2.62s/it] {'loss': 0.6454, 'grad_norm': 5.925335677806806, 'learning_rate': 3.6220958948629137e-06, 'epoch': 0.37} 37%|███▋ | 4572/12313 [3:25:25<5:38:17, 2.62s/it] 37%|███▋ | 4573/12313 [3:25:28<5:45:26, 2.68s/it] {'loss': 0.5576, 'grad_norm': 4.123001819171552, 'learning_rate': 3.621508196347667e-06, 'epoch': 0.37} 37%|███▋ | 4573/12313 [3:25:28<5:45:26, 2.68s/it] 37%|███▋ | 4574/12313 [3:25:30<5:42:33, 2.66s/it] {'loss': 0.6256, 'grad_norm': 6.967825569297834, 'learning_rate': 3.6209204202299875e-06, 'epoch': 0.37} 37%|███▋ | 4574/12313 [3:25:30<5:42:33, 2.66s/it] 37%|███▋ | 4575/12313 [3:25:33<5:38:45, 2.63s/it] {'loss': 0.632, 'grad_norm': 6.347779650753475, 'learning_rate': 3.6203325665505486e-06, 'epoch': 0.37} 37%|███▋ | 4575/12313 [3:25:33<5:38:45, 2.63s/it] 37%|███▋ | 4576/12313 [3:25:35<5:35:59, 2.61s/it] {'loss': 0.5919, 'grad_norm': 3.8118384188679943, 'learning_rate': 3.619744635350025e-06, 'epoch': 0.37} 37%|███▋ | 4576/12313 [3:25:35<5:35:59, 2.61s/it] 37%|███▋ | 4577/12313 [3:25:38<5:36:54, 2.61s/it] {'loss': 0.6338, 'grad_norm': 7.993997703219614, 'learning_rate': 3.619156626669098e-06, 'epoch': 0.37} 37%|███▋ | 4577/12313 
[3:25:38<5:36:54, 2.61s/it] 37%|███▋ | 4578/12313 [3:25:41<5:30:42, 2.57s/it] {'loss': 0.5769, 'grad_norm': 4.85057042384623, 'learning_rate': 3.6185685405484566e-06, 'epoch': 0.37} 37%|███▋ | 4578/12313 [3:25:41<5:30:42, 2.57s/it] 37%|███▋ | 4579/12313 [3:25:43<5:36:48, 2.61s/it] {'loss': 0.4109, 'grad_norm': 4.65594156616089, 'learning_rate': 3.6179803770287913e-06, 'epoch': 0.37} 37%|███▋ | 4579/12313 [3:25:43<5:36:48, 2.61s/it] 37%|███▋ | 4580/12313 [3:25:46<5:37:25, 2.62s/it] {'loss': 0.3943, 'grad_norm': 4.986423129791739, 'learning_rate': 3.6173921361508012e-06, 'epoch': 0.37} 37%|███▋ | 4580/12313 [3:25:46<5:37:25, 2.62s/it] 37%|███▋ | 4581/12313 [3:25:49<5:37:02, 2.62s/it] {'loss': 0.5428, 'grad_norm': 6.284947322407295, 'learning_rate': 3.616803817955189e-06, 'epoch': 0.37} 37%|███▋ | 4581/12313 [3:25:49<5:37:02, 2.62s/it] 37%|███▋ | 4582/12313 [3:25:51<5:47:03, 2.69s/it] {'loss': 0.5465, 'grad_norm': 5.691836227913848, 'learning_rate': 3.6162154224826627e-06, 'epoch': 0.37} 37%|███▋ | 4582/12313 [3:25:51<5:47:03, 2.69s/it] 37%|███▋ | 4583/12313 [3:25:54<5:41:06, 2.65s/it] {'loss': 0.4766, 'grad_norm': 8.95784788971493, 'learning_rate': 3.615626949773937e-06, 'epoch': 0.37} 37%|███▋ | 4583/12313 [3:25:54<5:41:06, 2.65s/it] 37%|███▋ | 4584/12313 [3:25:57<5:44:45, 2.68s/it] {'loss': 0.5862, 'grad_norm': 4.7473375463729495, 'learning_rate': 3.6150383998697315e-06, 'epoch': 0.37} 37%|███▋ | 4584/12313 [3:25:57<5:44:45, 2.68s/it] 37%|███▋ | 4585/12313 [3:26:00<5:51:17, 2.73s/it] {'loss': 0.4044, 'grad_norm': 5.4505240290945585, 'learning_rate': 3.614449772810769e-06, 'epoch': 0.37} 37%|███▋ | 4585/12313 [3:26:00<5:51:17, 2.73s/it] 37%|███▋ | 4586/12313 [3:26:02<5:45:22, 2.68s/it] {'loss': 0.4798, 'grad_norm': 6.977930137629381, 'learning_rate': 3.613861068637781e-06, 'epoch': 0.37} 37%|███▋ | 4586/12313 [3:26:02<5:45:22, 2.68s/it] 37%|███▋ | 4587/12313 [3:26:05<5:40:16, 2.64s/it] {'loss': 0.6262, 'grad_norm': 6.5403690266388335, 'learning_rate': 
3.6132722873915017e-06, 'epoch': 0.37} 37%|███▋ | 4587/12313 [3:26:05<5:40:16, 2.64s/it] 37%|███▋ | 4588/12313 [3:26:07<5:42:36, 2.66s/it] {'loss': 0.4946, 'grad_norm': 6.834395602891912, 'learning_rate': 3.6126834291126724e-06, 'epoch': 0.37} 37%|███▋ | 4588/12313 [3:26:07<5:42:36, 2.66s/it] 37%|███▋ | 4589/12313 [3:26:10<5:36:06, 2.61s/it] {'loss': 0.552, 'grad_norm': 3.6009773174215054, 'learning_rate': 3.6120944938420384e-06, 'epoch': 0.37} 37%|███▋ | 4589/12313 [3:26:10<5:36:06, 2.61s/it] 37%|███▋ | 4590/12313 [3:26:13<5:39:56, 2.64s/it] {'loss': 0.4826, 'grad_norm': 6.184936256815004, 'learning_rate': 3.6115054816203504e-06, 'epoch': 0.37} 37%|███▋ | 4590/12313 [3:26:13<5:39:56, 2.64s/it] 37%|███▋ | 4591/12313 [3:26:15<5:49:16, 2.71s/it] {'loss': 0.595, 'grad_norm': 4.521800781478745, 'learning_rate': 3.6109163924883668e-06, 'epoch': 0.37} 37%|███▋ | 4591/12313 [3:26:15<5:49:16, 2.71s/it] 37%|███▋ | 4592/12313 [3:26:18<5:46:32, 2.69s/it] {'loss': 0.5459, 'grad_norm': 5.129658876888231, 'learning_rate': 3.6103272264868473e-06, 'epoch': 0.37} 37%|███▋ | 4592/12313 [3:26:18<5:46:32, 2.69s/it] 37%|███▋ | 4593/12313 [3:26:21<5:45:59, 2.69s/it] {'loss': 0.7445, 'grad_norm': 3.955425189400771, 'learning_rate': 3.6097379836565604e-06, 'epoch': 0.37} 37%|███▋ | 4593/12313 [3:26:21<5:45:59, 2.69s/it] 37%|███▋ | 4594/12313 [3:26:23<5:32:47, 2.59s/it] {'loss': 0.5907, 'grad_norm': 4.651648963857242, 'learning_rate': 3.6091486640382785e-06, 'epoch': 0.37} 37%|███▋ | 4594/12313 [3:26:23<5:32:47, 2.59s/it] 37%|███▋ | 4595/12313 [3:26:26<5:46:13, 2.69s/it] {'loss': 0.332, 'grad_norm': 3.3474693180043045, 'learning_rate': 3.6085592676727786e-06, 'epoch': 0.37} 37%|███▋ | 4595/12313 [3:26:26<5:46:13, 2.69s/it] 37%|███▋ | 4596/12313 [3:26:29<5:46:00, 2.69s/it] {'loss': 0.4753, 'grad_norm': 6.634033909104456, 'learning_rate': 3.6079697946008453e-06, 'epoch': 0.37} 37%|███▋ | 4596/12313 [3:26:29<5:46:00, 2.69s/it] 37%|███▋ | 4597/12313 [3:26:31<5:42:21, 2.66s/it] {'loss': 0.5514, 
'grad_norm': 5.83533093603698, 'learning_rate': 3.607380244863265e-06, 'epoch': 0.37} 37%|███▋ | 4597/12313 [3:26:31<5:42:21, 2.66s/it] 37%|███▋ | 4598/12313 [3:26:34<5:39:26, 2.64s/it] {'loss': 0.3765, 'grad_norm': 4.575836973417688, 'learning_rate': 3.6067906185008328e-06, 'epoch': 0.37} 37%|███▋ | 4598/12313 [3:26:34<5:39:26, 2.64s/it] 37%|███▋ | 4599/12313 [3:26:37<5:38:32, 2.63s/it] {'loss': 0.5876, 'grad_norm': 4.848076544705409, 'learning_rate': 3.6062009155543483e-06, 'epoch': 0.37} 37%|███▋ | 4599/12313 [3:26:37<5:38:32, 2.63s/it] 37%|███▋ | 4600/12313 [3:26:39<5:45:02, 2.68s/it] {'loss': 0.4459, 'grad_norm': 5.836658897942606, 'learning_rate': 3.6056111360646134e-06, 'epoch': 0.37} 37%|███▋ | 4600/12313 [3:26:39<5:45:02, 2.68s/it] 37%|███▋ | 4601/12313 [3:26:42<5:45:13, 2.69s/it] {'loss': 0.4275, 'grad_norm': 3.9899151337710306, 'learning_rate': 3.6050212800724403e-06, 'epoch': 0.37} 37%|███▋ | 4601/12313 [3:26:42<5:45:13, 2.69s/it] 37%|███▋ | 4602/12313 [3:26:45<5:55:52, 2.77s/it] {'loss': 0.5786, 'grad_norm': 4.289228152113736, 'learning_rate': 3.6044313476186433e-06, 'epoch': 0.37} 37%|███▋ | 4602/12313 [3:26:45<5:55:52, 2.77s/it] 37%|███▋ | 4603/12313 [3:26:47<5:42:02, 2.66s/it] {'loss': 0.4954, 'grad_norm': 5.434462312775084, 'learning_rate': 3.603841338744041e-06, 'epoch': 0.37} 37%|███▋ | 4603/12313 [3:26:47<5:42:02, 2.66s/it] 37%|███▋ | 4604/12313 [3:26:50<5:54:16, 2.76s/it] {'loss': 0.6879, 'grad_norm': 3.5351327522854423, 'learning_rate': 3.6032512534894597e-06, 'epoch': 0.37} 37%|███▋ | 4604/12313 [3:26:50<5:54:16, 2.76s/it] 37%|███▋ | 4605/12313 [3:26:53<5:44:39, 2.68s/it] {'loss': 0.43, 'grad_norm': 5.549918211724535, 'learning_rate': 3.602661091895732e-06, 'epoch': 0.37} 37%|███▋ | 4605/12313 [3:26:53<5:44:39, 2.68s/it] 37%|███▋ | 4606/12313 [3:26:55<5:38:38, 2.64s/it] {'loss': 0.5157, 'grad_norm': 4.083602872009912, 'learning_rate': 3.602070854003692e-06, 'epoch': 0.37} 37%|███▋ | 4606/12313 [3:26:55<5:38:38, 2.64s/it] 37%|███▋ | 4607/12313 
[3:26:58<5:39:20, 2.64s/it] {'loss': 0.5669, 'grad_norm': 7.5939379592765786, 'learning_rate': 3.6014805398541815e-06, 'epoch': 0.37} 37%|███▋ | 4607/12313 [3:26:58<5:39:20, 2.64s/it] 37%|███▋ | 4608/12313 [3:27:01<5:29:54, 2.57s/it] {'loss': 0.4815, 'grad_norm': 3.8544939676204355, 'learning_rate': 3.6008901494880467e-06, 'epoch': 0.37} 37%|███▋ | 4608/12313 [3:27:01<5:29:54, 2.57s/it] 37%|███▋ | 4609/12313 [3:27:03<5:32:30, 2.59s/it] {'loss': 0.8072, 'grad_norm': 6.619794981662382, 'learning_rate': 3.60029968294614e-06, 'epoch': 0.37} 37%|███▋ | 4609/12313 [3:27:03<5:32:30, 2.59s/it] 37%|███▋ | 4610/12313 [3:27:06<5:28:30, 2.56s/it] {'loss': 0.409, 'grad_norm': 9.067352524497199, 'learning_rate': 3.599709140269319e-06, 'epoch': 0.37} 37%|███▋ | 4610/12313 [3:27:06<5:28:30, 2.56s/it] 37%|███▋ | 4611/12313 [3:27:08<5:30:44, 2.58s/it] {'loss': 0.6207, 'grad_norm': 4.980681642925656, 'learning_rate': 3.599118521498445e-06, 'epoch': 0.37} 37%|███▋ | 4611/12313 [3:27:08<5:30:44, 2.58s/it] 37%|███▋ | 4612/12313 [3:27:11<5:26:39, 2.55s/it] {'loss': 0.4077, 'grad_norm': 5.8795928027879585, 'learning_rate': 3.598527826674387e-06, 'epoch': 0.37} 37%|███▋ | 4612/12313 [3:27:11<5:26:39, 2.55s/it] 37%|███▋ | 4613/12313 [3:27:13<5:23:23, 2.52s/it] {'loss': 0.4889, 'grad_norm': 6.664130906786667, 'learning_rate': 3.597937055838017e-06, 'epoch': 0.37} 37%|███▋ | 4613/12313 [3:27:13<5:23:23, 2.52s/it] 37%|███▋ | 4614/12313 [3:27:16<5:36:21, 2.62s/it] {'loss': 0.556, 'grad_norm': 3.512866216429364, 'learning_rate': 3.5973462090302137e-06, 'epoch': 0.37} 37%|███▋ | 4614/12313 [3:27:16<5:36:21, 2.62s/it] 37%|███▋ | 4615/12313 [3:27:19<5:32:47, 2.59s/it] {'loss': 0.5955, 'grad_norm': 7.915343003804626, 'learning_rate': 3.5967552862918603e-06, 'epoch': 0.37} 37%|███▋ | 4615/12313 [3:27:19<5:32:47, 2.59s/it] 37%|███▋ | 4616/12313 [3:27:21<5:36:00, 2.62s/it] {'loss': 0.4933, 'grad_norm': 5.142753619848735, 'learning_rate': 3.596164287663845e-06, 'epoch': 0.37} 37%|███▋ | 4616/12313 
[3:27:21<5:36:00, 2.62s/it] 37%|███▋ | 4617/12313 [3:27:24<5:34:14, 2.61s/it] {'loss': 0.9477, 'grad_norm': 5.185440275795067, 'learning_rate': 3.5955732131870626e-06, 'epoch': 0.37} 37%|███▋ | 4617/12313 [3:27:24<5:34:14, 2.61s/it] 38%|███▊ | 4618/12313 [3:27:26<5:34:50, 2.61s/it] {'loss': 0.6607, 'grad_norm': 12.990297623448031, 'learning_rate': 3.594982062902412e-06, 'epoch': 0.38} 38%|███▊ | 4618/12313 [3:27:26<5:34:50, 2.61s/it] 38%|███▊ | 4619/12313 [3:27:29<5:43:17, 2.68s/it] {'loss': 0.6018, 'grad_norm': 4.182193438396625, 'learning_rate': 3.5943908368507985e-06, 'epoch': 0.38} 38%|███▊ | 4619/12313 [3:27:29<5:43:17, 2.68s/it] 38%|███▊ | 4620/12313 [3:27:32<5:54:25, 2.76s/it] {'loss': 0.5165, 'grad_norm': 3.8071948876468733, 'learning_rate': 3.59379953507313e-06, 'epoch': 0.38} 38%|███▊ | 4620/12313 [3:27:32<5:54:25, 2.76s/it] 38%|███▊ | 4621/12313 [3:27:35<5:53:04, 2.75s/it] {'loss': 0.4829, 'grad_norm': 4.47056756713465, 'learning_rate': 3.593208157610324e-06, 'epoch': 0.38} 38%|███▊ | 4621/12313 [3:27:35<5:53:04, 2.75s/it] 38%|███▊ | 4622/12313 [3:27:37<5:40:12, 2.65s/it] {'loss': 0.5083, 'grad_norm': 4.861658029251169, 'learning_rate': 3.592616704503298e-06, 'epoch': 0.38} 38%|███▊ | 4622/12313 [3:27:37<5:40:12, 2.65s/it] 38%|███▊ | 4623/12313 [3:27:40<5:33:49, 2.60s/it] {'loss': 0.5095, 'grad_norm': 4.194610134145657, 'learning_rate': 3.5920251757929787e-06, 'epoch': 0.38} 38%|███▊ | 4623/12313 [3:27:40<5:33:49, 2.60s/it] 38%|███▊ | 4624/12313 [3:27:43<5:37:40, 2.63s/it] {'loss': 0.4922, 'grad_norm': 6.791531269079618, 'learning_rate': 3.5914335715202976e-06, 'epoch': 0.38} 38%|███▊ | 4624/12313 [3:27:43<5:37:40, 2.63s/it] 38%|███▊ | 4625/12313 [3:27:45<5:42:15, 2.67s/it] {'loss': 0.6261, 'grad_norm': 5.2361522290402185, 'learning_rate': 3.590841891726189e-06, 'epoch': 0.38} 38%|███▊ | 4625/12313 [3:27:45<5:42:15, 2.67s/it] 38%|███▊ | 4626/12313 [3:27:48<5:39:45, 2.65s/it] {'loss': 0.5765, 'grad_norm': 10.082840154908705, 'learning_rate': 
3.5902501364515945e-06, 'epoch': 0.38} 38%|███▊ | 4626/12313 [3:27:48<5:39:45, 2.65s/it] 38%|███▊ | 4627/12313 [3:27:51<5:41:22, 2.66s/it] {'loss': 0.3993, 'grad_norm': 6.544537456866241, 'learning_rate': 3.5896583057374607e-06, 'epoch': 0.38} 38%|███▊ | 4627/12313 [3:27:51<5:41:22, 2.66s/it] 38%|███▊ | 4628/12313 [3:27:53<5:47:30, 2.71s/it] {'loss': 0.6202, 'grad_norm': 4.601861451010068, 'learning_rate': 3.589066399624739e-06, 'epoch': 0.38} 38%|███▊ | 4628/12313 [3:27:53<5:47:30, 2.71s/it] 38%|███▊ | 4629/12313 [3:27:56<5:45:21, 2.70s/it] {'loss': 0.5738, 'grad_norm': 4.584192839631751, 'learning_rate': 3.5884744181543868e-06, 'epoch': 0.38} 38%|███▊ | 4629/12313 [3:27:56<5:45:21, 2.70s/it] 38%|███▊ | 4630/12313 [3:27:59<5:43:15, 2.68s/it] {'loss': 0.4293, 'grad_norm': 3.856688627858152, 'learning_rate': 3.5878823613673652e-06, 'epoch': 0.38} 38%|███▊ | 4630/12313 [3:27:59<5:43:15, 2.68s/it] 38%|███▊ | 4631/12313 [3:28:02<5:59:14, 2.81s/it] {'loss': 0.4848, 'grad_norm': 3.3977632115418106, 'learning_rate': 3.5872902293046417e-06, 'epoch': 0.38} 38%|███▊ | 4631/12313 [3:28:02<5:59:14, 2.81s/it] 38%|███▊ | 4632/12313 [3:28:04<5:51:01, 2.74s/it] {'loss': 0.554, 'grad_norm': 5.9959616576129395, 'learning_rate': 3.586698022007189e-06, 'epoch': 0.38} 38%|███▊ | 4632/12313 [3:28:04<5:51:01, 2.74s/it] 38%|███▊ | 4633/12313 [3:28:07<5:45:44, 2.70s/it] {'loss': 0.494, 'grad_norm': 4.593599550308795, 'learning_rate': 3.5861057395159837e-06, 'epoch': 0.38} 38%|███▊ | 4633/12313 [3:28:07<5:45:44, 2.70s/it] 38%|███▊ | 4634/12313 [3:28:10<5:41:27, 2.67s/it] {'loss': 0.3877, 'grad_norm': 3.9971449166055644, 'learning_rate': 3.5855133818720106e-06, 'epoch': 0.38} 38%|███▊ | 4634/12313 [3:28:10<5:41:27, 2.67s/it] 38%|███▊ | 4635/12313 [3:28:12<5:37:02, 2.63s/it] {'loss': 0.4489, 'grad_norm': 10.899762207965088, 'learning_rate': 3.5849209491162555e-06, 'epoch': 0.38} 38%|███▊ | 4635/12313 [3:28:12<5:37:02, 2.63s/it] 38%|███▊ | 4636/12313 [3:28:15<5:38:31, 2.65s/it] {'loss': 
0.6891, 'grad_norm': 3.0450268530430598, 'learning_rate': 3.5843284412897127e-06, 'epoch': 0.38} 38%|███▊ | 4636/12313 [3:28:15<5:38:31, 2.65s/it] 38%|███▊ | 4637/12313 [3:28:17<5:35:00, 2.62s/it] {'loss': 0.6111, 'grad_norm': 5.435757415343299, 'learning_rate': 3.5837358584333814e-06, 'epoch': 0.38} 38%|███▊ | 4637/12313 [3:28:17<5:35:00, 2.62s/it] 38%|███▊ | 4638/12313 [3:28:20<5:35:40, 2.62s/it] {'loss': 0.4667, 'grad_norm': 4.283255422002926, 'learning_rate': 3.583143200588263e-06, 'epoch': 0.38} 38%|███▊ | 4638/12313 [3:28:20<5:35:40, 2.62s/it] 38%|███▊ | 4639/12313 [3:28:23<5:29:28, 2.58s/it] {'loss': 0.5796, 'grad_norm': 2.6058528884262553, 'learning_rate': 3.5825504677953684e-06, 'epoch': 0.38} 38%|███▊ | 4639/12313 [3:28:23<5:29:28, 2.58s/it] 38%|███▊ | 4640/12313 [3:28:25<5:31:07, 2.59s/it] {'loss': 0.579, 'grad_norm': 5.535677617104462, 'learning_rate': 3.581957660095711e-06, 'epoch': 0.38} 38%|███▊ | 4640/12313 [3:28:25<5:31:07, 2.59s/it] 38%|███▊ | 4641/12313 [3:28:28<5:42:39, 2.68s/it] {'loss': 0.531, 'grad_norm': 7.993394293530673, 'learning_rate': 3.5813647775303084e-06, 'epoch': 0.38} 38%|███▊ | 4641/12313 [3:28:28<5:42:39, 2.68s/it] 38%|███▊ | 4642/12313 [3:28:31<5:44:06, 2.69s/it] {'loss': 0.5836, 'grad_norm': 6.421520761195438, 'learning_rate': 3.580771820140187e-06, 'epoch': 0.38} 38%|███▊ | 4642/12313 [3:28:31<5:44:06, 2.69s/it] 38%|███▊ | 4643/12313 [3:28:33<5:40:59, 2.67s/it] {'loss': 0.6717, 'grad_norm': 4.4706924859775174, 'learning_rate': 3.580178787966376e-06, 'epoch': 0.38} 38%|███▊ | 4643/12313 [3:28:33<5:40:59, 2.67s/it] 38%|███▊ | 4644/12313 [3:28:36<5:42:14, 2.68s/it] {'loss': 0.5729, 'grad_norm': 4.900856116172455, 'learning_rate': 3.5795856810499085e-06, 'epoch': 0.38} 38%|███▊ | 4644/12313 [3:28:36<5:42:14, 2.68s/it] 38%|███▊ | 4645/12313 [3:28:39<5:48:00, 2.72s/it] {'loss': 0.6078, 'grad_norm': 3.182583890174541, 'learning_rate': 3.5789924994318267e-06, 'epoch': 0.38} 38%|███▊ | 4645/12313 [3:28:39<5:48:00, 2.72s/it] 38%|███▊ | 
4646/12313 [3:28:41<5:41:26, 2.67s/it] {'loss': 0.6606, 'grad_norm': 3.865947434731064, 'learning_rate': 3.578399243153174e-06, 'epoch': 0.38} 38%|███▊ | 4646/12313 [3:28:41<5:41:26, 2.67s/it] 38%|███▊ | 4647/12313 [3:28:44<5:54:51, 2.78s/it] {'loss': 0.544, 'grad_norm': 3.349782633519335, 'learning_rate': 3.5778059122550007e-06, 'epoch': 0.38} 38%|███▊ | 4647/12313 [3:28:44<5:54:51, 2.78s/it] 38%|███▊ | 4648/12313 [3:28:47<5:50:27, 2.74s/it] {'loss': 0.5011, 'grad_norm': 4.885217069177799, 'learning_rate': 3.5772125067783624e-06, 'epoch': 0.38} 38%|███▊ | 4648/12313 [3:28:47<5:50:27, 2.74s/it] 38%|███▊ | 4649/12313 [3:28:50<5:50:57, 2.75s/it] {'loss': 0.5884, 'grad_norm': 4.590378597766298, 'learning_rate': 3.57661902676432e-06, 'epoch': 0.38} 38%|███▊ | 4649/12313 [3:28:50<5:50:57, 2.75s/it] 38%|███▊ | 4650/12313 [3:28:53<5:50:09, 2.74s/it] {'loss': 0.4238, 'grad_norm': 4.527590654884785, 'learning_rate': 3.576025472253939e-06, 'epoch': 0.38} 38%|███▊ | 4650/12313 [3:28:53<5:50:09, 2.74s/it] 38%|███▊ | 4651/12313 [3:28:55<5:46:19, 2.71s/it] {'loss': 0.4485, 'grad_norm': 4.451704643586669, 'learning_rate': 3.5754318432882907e-06, 'epoch': 0.38} 38%|███▊ | 4651/12313 [3:28:55<5:46:19, 2.71s/it] 38%|███▊ | 4652/12313 [3:28:58<5:51:36, 2.75s/it] {'loss': 0.5697, 'grad_norm': 5.927309052944885, 'learning_rate': 3.5748381399084492e-06, 'epoch': 0.38} 38%|███▊ | 4652/12313 [3:28:58<5:51:36, 2.75s/it] 38%|███▊ | 4653/12313 [3:29:01<5:46:29, 2.71s/it] {'loss': 0.5761, 'grad_norm': 5.139872795558716, 'learning_rate': 3.5742443621554977e-06, 'epoch': 0.38} 38%|███▊ | 4653/12313 [3:29:01<5:46:29, 2.71s/it] 38%|███▊ | 4654/12313 [3:29:04<5:50:00, 2.74s/it] {'loss': 0.6974, 'grad_norm': 6.387127931906058, 'learning_rate': 3.5736505100705223e-06, 'epoch': 0.38} 38%|███▊ | 4654/12313 [3:29:04<5:50:00, 2.74s/it] 38%|███▊ | 4655/12313 [3:29:06<5:52:50, 2.76s/it] {'loss': 0.6572, 'grad_norm': 5.810815572808849, 'learning_rate': 3.573056583694612e-06, 'epoch': 0.38} 38%|███▊ | 
4655/12313 [3:29:06<5:52:50, 2.76s/it] 38%|███▊ | 4656/12313 [3:29:09<5:53:03, 2.77s/it] {'loss': 0.5495, 'grad_norm': 4.719558057872907, 'learning_rate': 3.5724625830688667e-06, 'epoch': 0.38} 38%|███▊ | 4656/12313 [3:29:09<5:53:03, 2.77s/it] 38%|███▊ | 4657/12313 [3:29:12<5:48:08, 2.73s/it] {'loss': 0.4823, 'grad_norm': 6.208002944055565, 'learning_rate': 3.571868508234386e-06, 'epoch': 0.38} 38%|███▊ | 4657/12313 [3:29:12<5:48:08, 2.73s/it] 38%|███▊ | 4658/12313 [3:29:15<5:50:00, 2.74s/it] {'loss': 0.504, 'grad_norm': 3.68160438978963, 'learning_rate': 3.5712743592322775e-06, 'epoch': 0.38} 38%|███▊ | 4658/12313 [3:29:15<5:50:00, 2.74s/it] 38%|███▊ | 4659/12313 [3:29:17<5:44:10, 2.70s/it] {'loss': 0.571, 'grad_norm': 4.064268236543746, 'learning_rate': 3.570680136103653e-06, 'epoch': 0.38} 38%|███▊ | 4659/12313 [3:29:17<5:44:10, 2.70s/it] 38%|███▊ | 4660/12313 [3:29:20<5:44:20, 2.70s/it] {'loss': 0.4338, 'grad_norm': 7.721913069819413, 'learning_rate': 3.57008583888963e-06, 'epoch': 0.38} 38%|███▊ | 4660/12313 [3:29:20<5:44:20, 2.70s/it] 38%|███▊ | 4661/12313 [3:29:23<5:44:38, 2.70s/it] {'loss': 0.4907, 'grad_norm': 8.025619823408007, 'learning_rate': 3.569491467631329e-06, 'epoch': 0.38} 38%|███▊ | 4661/12313 [3:29:23<5:44:38, 2.70s/it] 38%|███▊ | 4662/12313 [3:29:26<5:53:55, 2.78s/it] {'loss': 0.6222, 'grad_norm': 4.4108267729746204, 'learning_rate': 3.568897022369879e-06, 'epoch': 0.38} 38%|███▊ | 4662/12313 [3:29:26<5:53:55, 2.78s/it] 38%|███▊ | 4663/12313 [3:29:28<5:52:29, 2.76s/it] {'loss': 0.5193, 'grad_norm': 15.163395858330826, 'learning_rate': 3.568302503146413e-06, 'epoch': 0.38} 38%|███▊ | 4663/12313 [3:29:28<5:52:29, 2.76s/it] 38%|███▊ | 4664/12313 [3:29:31<5:50:42, 2.75s/it] {'loss': 0.5473, 'grad_norm': 3.2649054015739525, 'learning_rate': 3.567707910002068e-06, 'epoch': 0.38} 38%|███▊ | 4664/12313 [3:29:31<5:50:42, 2.75s/it] 38%|███▊ | 4665/12313 [3:29:34<5:47:06, 2.72s/it] {'loss': 0.4679, 'grad_norm': 5.082393091909739, 'learning_rate': 
3.5671132429779847e-06, 'epoch': 0.38} 38%|███▊ | 4665/12313 [3:29:34<5:47:06, 2.72s/it] 38%|███▊ | 4666/12313 [3:29:36<5:43:33, 2.70s/it] {'loss': 0.501, 'grad_norm': 7.079599784559318, 'learning_rate': 3.566518502115314e-06, 'epoch': 0.38} 38%|███▊ | 4666/12313 [3:29:36<5:43:33, 2.70s/it] 38%|███▊ | 4667/12313 [3:29:39<5:47:21, 2.73s/it] {'loss': 0.6414, 'grad_norm': 6.472514959171711, 'learning_rate': 3.565923687455207e-06, 'epoch': 0.38} 38%|███▊ | 4667/12313 [3:29:39<5:47:21, 2.73s/it] 38%|███▊ | 4668/12313 [3:29:42<5:50:10, 2.75s/it] {'loss': 0.4772, 'grad_norm': 6.442050896136885, 'learning_rate': 3.565328799038822e-06, 'epoch': 0.38} 38%|███▊ | 4668/12313 [3:29:42<5:50:10, 2.75s/it] 38%|███▊ | 4669/12313 [3:29:44<5:41:19, 2.68s/it] {'loss': 0.6315, 'grad_norm': 6.208659634243696, 'learning_rate': 3.5647338369073225e-06, 'epoch': 0.38} 38%|███▊ | 4669/12313 [3:29:44<5:41:19, 2.68s/it] 38%|███▊ | 4670/12313 [3:29:47<5:41:07, 2.68s/it] {'loss': 0.4861, 'grad_norm': 4.322414723790994, 'learning_rate': 3.5641388011018764e-06, 'epoch': 0.38} 38%|███▊ | 4670/12313 [3:29:47<5:41:07, 2.68s/it] 38%|███▊ | 4671/12313 [3:29:50<6:00:48, 2.83s/it] {'loss': 0.6495, 'grad_norm': 6.68918224709947, 'learning_rate': 3.563543691663657e-06, 'epoch': 0.38} 38%|███▊ | 4671/12313 [3:29:50<6:00:48, 2.83s/it] 38%|███▊ | 4672/12313 [3:29:53<5:49:31, 2.74s/it] {'loss': 0.7778, 'grad_norm': 6.412987095133581, 'learning_rate': 3.5629485086338432e-06, 'epoch': 0.38} 38%|███▊ | 4672/12313 [3:29:53<5:49:31, 2.74s/it] 38%|███▊ | 4673/12313 [3:29:56<5:54:56, 2.79s/it] {'loss': 0.4884, 'grad_norm': 4.2253670535048204, 'learning_rate': 3.562353252053618e-06, 'epoch': 0.38} 38%|███▊ | 4673/12313 [3:29:56<5:54:56, 2.79s/it] 38%|███▊ | 4674/12313 [3:29:58<5:40:46, 2.68s/it] {'loss': 0.4152, 'grad_norm': 5.325608575778679, 'learning_rate': 3.56175792196417e-06, 'epoch': 0.38} 38%|███▊ | 4674/12313 [3:29:58<5:40:46, 2.68s/it] 38%|███▊ | 4675/12313 [3:30:01<5:38:54, 2.66s/it] {'loss': 0.5208, 
'grad_norm': 5.859119696052518, 'learning_rate': 3.561162518406693e-06, 'epoch': 0.38} 38%|███▊ | 4675/12313 [3:30:01<5:38:54, 2.66s/it] 38%|███▊ | 4676/12313 [3:30:03<5:33:22, 2.62s/it] {'loss': 0.6021, 'grad_norm': 3.929599806942015, 'learning_rate': 3.5605670414223866e-06, 'epoch': 0.38} 38%|███▊ | 4676/12313 [3:30:03<5:33:22, 2.62s/it] 38%|███▊ | 4677/12313 [3:30:06<5:35:14, 2.63s/it] {'loss': 0.6292, 'grad_norm': 10.494504954752514, 'learning_rate': 3.559971491052453e-06, 'epoch': 0.38} 38%|███▊ | 4677/12313 [3:30:06<5:35:14, 2.63s/it] 38%|███▊ | 4678/12313 [3:30:09<5:43:57, 2.70s/it] {'loss': 0.4315, 'grad_norm': 4.64718386046636, 'learning_rate': 3.559375867338103e-06, 'epoch': 0.38} 38%|███▊ | 4678/12313 [3:30:09<5:43:57, 2.70s/it] 38%|███▊ | 4679/12313 [3:30:12<6:02:43, 2.85s/it] {'loss': 0.4433, 'grad_norm': 4.74521085436876, 'learning_rate': 3.5587801703205486e-06, 'epoch': 0.38} 38%|███▊ | 4679/12313 [3:30:12<6:02:43, 2.85s/it] 38%|███▊ | 4680/12313 [3:30:14<5:47:50, 2.73s/it] {'loss': 0.5353, 'grad_norm': 9.15577004594742, 'learning_rate': 3.558184400041011e-06, 'epoch': 0.38} 38%|███▊ | 4680/12313 [3:30:14<5:47:50, 2.73s/it] 38%|███▊ | 4681/12313 [3:30:17<5:41:25, 2.68s/it] {'loss': 0.6834, 'grad_norm': 4.324557986539743, 'learning_rate': 3.557588556540712e-06, 'epoch': 0.38} 38%|███▊ | 4681/12313 [3:30:17<5:41:25, 2.68s/it] 38%|███▊ | 4682/12313 [3:30:20<5:48:56, 2.74s/it] {'loss': 0.487, 'grad_norm': 4.785865260255061, 'learning_rate': 3.556992639860883e-06, 'epoch': 0.38} 38%|███▊ | 4682/12313 [3:30:20<5:48:56, 2.74s/it] 38%|███▊ | 4683/12313 [3:30:23<5:45:00, 2.71s/it] {'loss': 0.4949, 'grad_norm': 7.647172576935408, 'learning_rate': 3.5563966500427577e-06, 'epoch': 0.38} 38%|███▊ | 4683/12313 [3:30:23<5:45:00, 2.71s/it] 38%|███▊ | 4684/12313 [3:30:25<5:39:43, 2.67s/it] {'loss': 0.6732, 'grad_norm': 6.176240715797557, 'learning_rate': 3.555800587127574e-06, 'epoch': 0.38} 38%|███▊ | 4684/12313 [3:30:25<5:39:43, 2.67s/it] 38%|███▊ | 4685/12313 
[3:30:28<5:48:44, 2.74s/it] {'loss': 0.6017, 'grad_norm': 2.9442581188680306, 'learning_rate': 3.5552044511565783e-06, 'epoch': 0.38} 38%|███▊ | 4685/12313 [3:30:28<5:48:44, 2.74s/it] 38%|███▊ | 4686/12313 [3:30:31<5:59:10, 2.83s/it] {'loss': 0.5588, 'grad_norm': 3.6958273548877916, 'learning_rate': 3.554608242171019e-06, 'epoch': 0.38} 38%|███▊ | 4686/12313 [3:30:31<5:59:10, 2.83s/it] 38%|███▊ | 4687/12313 [3:30:34<5:59:29, 2.83s/it] {'loss': 0.4675, 'grad_norm': 4.174711443180175, 'learning_rate': 3.554011960212151e-06, 'epoch': 0.38} 38%|███▊ | 4687/12313 [3:30:34<5:59:29, 2.83s/it] 38%|███▊ | 4688/12313 [3:30:37<5:55:29, 2.80s/it] {'loss': 0.5403, 'grad_norm': 4.393276038211761, 'learning_rate': 3.5534156053212333e-06, 'epoch': 0.38} 38%|███▊ | 4688/12313 [3:30:37<5:55:29, 2.80s/it] 38%|███▊ | 4689/12313 [3:30:39<5:51:04, 2.76s/it] {'loss': 0.5421, 'grad_norm': 4.053882314908021, 'learning_rate': 3.5528191775395304e-06, 'epoch': 0.38} 38%|███▊ | 4689/12313 [3:30:39<5:51:04, 2.76s/it] 38%|███▊ | 4690/12313 [3:30:42<5:51:09, 2.76s/it] {'loss': 0.4942, 'grad_norm': 2.5408049268953437, 'learning_rate': 3.552222676908313e-06, 'epoch': 0.38} 38%|███▊ | 4690/12313 [3:30:42<5:51:09, 2.76s/it] 38%|███▊ | 4691/12313 [3:30:45<5:52:26, 2.77s/it] {'loss': 0.5421, 'grad_norm': 10.13235114058035, 'learning_rate': 3.5516261034688547e-06, 'epoch': 0.38} 38%|███▊ | 4691/12313 [3:30:45<5:52:26, 2.77s/it] 38%|███▊ | 4692/12313 [3:30:47<5:39:36, 2.67s/it] {'loss': 0.4957, 'grad_norm': 7.384145049649841, 'learning_rate': 3.5510294572624358e-06, 'epoch': 0.38} 38%|███▊ | 4692/12313 [3:30:47<5:39:36, 2.67s/it] 38%|███▊ | 4693/12313 [3:30:50<5:41:25, 2.69s/it] {'loss': 0.5927, 'grad_norm': 11.958189835069302, 'learning_rate': 3.5504327383303415e-06, 'epoch': 0.38} 38%|███▊ | 4693/12313 [3:30:50<5:41:25, 2.69s/it] 38%|███▊ | 4694/12313 [3:30:53<5:48:15, 2.74s/it] {'loss': 0.5608, 'grad_norm': 3.403027900822607, 'learning_rate': 3.549835946713861e-06, 'epoch': 0.38} 38%|███▊ | 4694/12313 
[3:30:53<5:48:15, 2.74s/it] 38%|███▊ | 4695/12313 [3:30:56<5:47:17, 2.74s/it] {'loss': 0.5047, 'grad_norm': 3.057095877975725, 'learning_rate': 3.5492390824542887e-06, 'epoch': 0.38} 38%|███▊ | 4695/12313 [3:30:56<5:47:17, 2.74s/it] 38%|███▊ | 4696/12313 [3:30:58<5:41:53, 2.69s/it] {'loss': 0.4971, 'grad_norm': 6.33274087310348, 'learning_rate': 3.5486421455929253e-06, 'epoch': 0.38} 38%|███▊ | 4696/12313 [3:30:58<5:41:53, 2.69s/it] 38%|███▊ | 4697/12313 [3:31:01<5:42:45, 2.70s/it] {'loss': 0.6343, 'grad_norm': 4.1556831093555555, 'learning_rate': 3.5480451361710744e-06, 'epoch': 0.38} 38%|███▊ | 4697/12313 [3:31:01<5:42:45, 2.70s/it] 38%|███▊ | 4698/12313 [3:31:04<5:39:03, 2.67s/it] {'loss': 0.6561, 'grad_norm': 3.1197068090047786, 'learning_rate': 3.5474480542300475e-06, 'epoch': 0.38} 38%|███▊ | 4698/12313 [3:31:04<5:39:03, 2.67s/it] 38%|███▊ | 4699/12313 [3:31:07<5:56:22, 2.81s/it] {'loss': 0.6627, 'grad_norm': 4.160059988171047, 'learning_rate': 3.5468508998111596e-06, 'epoch': 0.38} 38%|███▊ | 4699/12313 [3:31:07<5:56:22, 2.81s/it] 38%|███▊ | 4700/12313 [3:31:09<5:51:17, 2.77s/it] {'loss': 0.6216, 'grad_norm': 5.478519363239558, 'learning_rate': 3.5462536729557284e-06, 'epoch': 0.38} 38%|███▊ | 4700/12313 [3:31:09<5:51:17, 2.77s/it] 38%|███▊ | 4701/12313 [3:31:12<5:48:27, 2.75s/it] {'loss': 0.6758, 'grad_norm': 6.8557884315946795, 'learning_rate': 3.545656373705081e-06, 'epoch': 0.38} 38%|███▊ | 4701/12313 [3:31:12<5:48:27, 2.75s/it] 38%|███▊ | 4702/12313 [3:31:15<5:57:05, 2.82s/it] {'loss': 0.4609, 'grad_norm': 4.172187135110137, 'learning_rate': 3.5450590021005465e-06, 'epoch': 0.38} 38%|███▊ | 4702/12313 [3:31:15<5:57:05, 2.82s/it] 38%|███▊ | 4703/12313 [3:31:18<5:59:04, 2.83s/it] {'loss': 0.5303, 'grad_norm': 7.593654425254116, 'learning_rate': 3.5444615581834595e-06, 'epoch': 0.38} 38%|███▊ | 4703/12313 [3:31:18<5:59:04, 2.83s/it] 38%|███▊ | 4704/12313 [3:31:21<6:03:03, 2.86s/it] {'loss': 0.6348, 'grad_norm': 2.9965381248103466, 'learning_rate': 
3.5438640419951608e-06, 'epoch': 0.38} 38%|███▊ | 4704/12313 [3:31:21<6:03:03, 2.86s/it] 38%|███▊ | 4705/12313 [3:31:23<5:56:17, 2.81s/it] {'loss': 0.7041, 'grad_norm': 3.965887732610272, 'learning_rate': 3.5432664535769952e-06, 'epoch': 0.38} 38%|███▊ | 4705/12313 [3:31:23<5:56:17, 2.81s/it] 38%|███▊ | 4706/12313 [3:31:26<5:57:25, 2.82s/it] {'loss': 0.5335, 'grad_norm': 10.865736295673367, 'learning_rate': 3.5426687929703117e-06, 'epoch': 0.38} 38%|███▊ | 4706/12313 [3:31:26<5:57:25, 2.82s/it] 38%|███▊ | 4707/12313 [3:31:29<5:42:44, 2.70s/it] {'loss': 0.6845, 'grad_norm': 6.090855516494487, 'learning_rate': 3.5420710602164665e-06, 'epoch': 0.38} 38%|███▊ | 4707/12313 [3:31:29<5:42:44, 2.70s/it] 38%|███▊ | 4708/12313 [3:31:32<5:46:30, 2.73s/it] {'loss': 0.5373, 'grad_norm': 3.356332999735362, 'learning_rate': 3.5414732553568194e-06, 'epoch': 0.38} 38%|███▊ | 4708/12313 [3:31:32<5:46:30, 2.73s/it] 38%|███▊ | 4709/12313 [3:31:34<5:38:21, 2.67s/it] {'loss': 0.5448, 'grad_norm': 6.469401787137639, 'learning_rate': 3.5408753784327344e-06, 'epoch': 0.38} 38%|███▊ | 4709/12313 [3:31:34<5:38:21, 2.67s/it] 38%|███▊ | 4710/12313 [3:31:37<5:33:22, 2.63s/it] {'loss': 0.474, 'grad_norm': 5.575865042285411, 'learning_rate': 3.540277429485582e-06, 'epoch': 0.38} 38%|███▊ | 4710/12313 [3:31:37<5:33:22, 2.63s/it] 38%|███▊ | 4711/12313 [3:31:39<5:34:23, 2.64s/it] {'loss': 0.4949, 'grad_norm': 5.15216848497321, 'learning_rate': 3.539679408556737e-06, 'epoch': 0.38} 38%|███▊ | 4711/12313 [3:31:39<5:34:23, 2.64s/it] 38%|███▊ | 4712/12313 [3:31:42<5:30:13, 2.61s/it] {'loss': 0.4684, 'grad_norm': 4.447503975145806, 'learning_rate': 3.5390813156875792e-06, 'epoch': 0.38} 38%|███▊ | 4712/12313 [3:31:42<5:30:13, 2.61s/it] 38%|███▊ | 4713/12313 [3:31:44<5:25:00, 2.57s/it] {'loss': 0.4992, 'grad_norm': 6.602421808698497, 'learning_rate': 3.538483150919494e-06, 'epoch': 0.38} 38%|███▊ | 4713/12313 [3:31:44<5:25:00, 2.57s/it] 38%|███▊ | 4714/12313 [3:31:47<5:24:15, 2.56s/it] {'loss': 0.5644, 
'grad_norm': 9.959981744936794, 'learning_rate': 3.537884914293871e-06, 'epoch': 0.38} 38%|███▊ | 4714/12313 [3:31:47<5:24:15, 2.56s/it] 38%|███▊ | 4715/12313 [3:31:49<5:19:26, 2.52s/it] {'loss': 0.5316, 'grad_norm': 4.773698731849422, 'learning_rate': 3.537286605852105e-06, 'epoch': 0.38} 38%|███▊ | 4715/12313 [3:31:49<5:19:26, 2.52s/it] 38%|███▊ | 4716/12313 [3:31:52<5:26:54, 2.58s/it] {'loss': 0.4393, 'grad_norm': 5.852883004242007, 'learning_rate': 3.536688225635595e-06, 'epoch': 0.38} 38%|███▊ | 4716/12313 [3:31:52<5:26:54, 2.58s/it] 38%|███▊ | 4717/12313 [3:31:55<5:31:50, 2.62s/it] {'loss': 0.7132, 'grad_norm': 3.5878863623629442, 'learning_rate': 3.5360897736857464e-06, 'epoch': 0.38} 38%|███▊ | 4717/12313 [3:31:55<5:31:50, 2.62s/it] 38%|███▊ | 4718/12313 [3:31:57<5:30:56, 2.61s/it] {'loss': 0.5248, 'grad_norm': 5.620156517899439, 'learning_rate': 3.5354912500439696e-06, 'epoch': 0.38} 38%|███▊ | 4718/12313 [3:31:57<5:30:56, 2.61s/it] 38%|███▊ | 4719/12313 [3:32:00<5:22:50, 2.55s/it] {'loss': 0.5087, 'grad_norm': 7.422807956148856, 'learning_rate': 3.5348926547516783e-06, 'epoch': 0.38} 38%|███▊ | 4719/12313 [3:32:00<5:22:50, 2.55s/it] 38%|███▊ | 4720/12313 [3:32:02<5:26:09, 2.58s/it] {'loss': 0.7483, 'grad_norm': 5.38906725850609, 'learning_rate': 3.534293987850291e-06, 'epoch': 0.38} 38%|███▊ | 4720/12313 [3:32:02<5:26:09, 2.58s/it] 38%|███▊ | 4721/12313 [3:32:05<5:32:09, 2.63s/it] {'loss': 0.6177, 'grad_norm': 4.767084586004944, 'learning_rate': 3.5336952493812353e-06, 'epoch': 0.38} 38%|███▊ | 4721/12313 [3:32:05<5:32:09, 2.63s/it] 38%|███▊ | 4722/12313 [3:32:08<5:28:09, 2.59s/it] {'loss': 0.5982, 'grad_norm': 4.383751010173588, 'learning_rate': 3.533096439385939e-06, 'epoch': 0.38} 38%|███▊ | 4722/12313 [3:32:08<5:28:09, 2.59s/it] 38%|███▊ | 4723/12313 [3:32:10<5:29:36, 2.61s/it] {'loss': 0.6254, 'grad_norm': 18.60580065511833, 'learning_rate': 3.532497557905836e-06, 'epoch': 0.38} 38%|███▊ | 4723/12313 [3:32:10<5:29:36, 2.61s/it] 38%|███▊ | 4724/12313 
[3:32:13<5:25:48, 2.58s/it] {'loss': 0.4813, 'grad_norm': 4.724647861890121, 'learning_rate': 3.531898604982367e-06, 'epoch': 0.38} 38%|███▊ | 4724/12313 [3:32:13<5:25:48, 2.58s/it] 38%|███▊ | 4725/12313 [3:32:15<5:28:11, 2.60s/it] {'loss': 0.604, 'grad_norm': 5.525218151959553, 'learning_rate': 3.5312995806569754e-06, 'epoch': 0.38} 38%|███▊ | 4725/12313 [3:32:15<5:28:11, 2.60s/it] 38%|███▊ | 4726/12313 [3:32:18<5:36:20, 2.66s/it] {'loss': 0.6971, 'grad_norm': 4.419767437379395, 'learning_rate': 3.5307004849711114e-06, 'epoch': 0.38} 38%|███▊ | 4726/12313 [3:32:18<5:36:20, 2.66s/it] 38%|███▊ | 4727/12313 [3:32:21<5:35:24, 2.65s/it] {'loss': 0.4638, 'grad_norm': 4.847805495392348, 'learning_rate': 3.530101317966228e-06, 'epoch': 0.38} 38%|███▊ | 4727/12313 [3:32:21<5:35:24, 2.65s/it] 38%|███▊ | 4728/12313 [3:32:23<5:35:21, 2.65s/it] {'loss': 0.503, 'grad_norm': 4.89939462212519, 'learning_rate': 3.5295020796837854e-06, 'epoch': 0.38} 38%|███▊ | 4728/12313 [3:32:23<5:35:21, 2.65s/it] 38%|███▊ | 4729/12313 [3:32:26<5:32:58, 2.63s/it] {'loss': 0.5719, 'grad_norm': 4.244557296775247, 'learning_rate': 3.528902770165248e-06, 'epoch': 0.38} 38%|███▊ | 4729/12313 [3:32:26<5:32:58, 2.63s/it] 38%|███▊ | 4730/12313 [3:32:29<5:34:24, 2.65s/it] {'loss': 0.4718, 'grad_norm': 8.538711454427421, 'learning_rate': 3.5283033894520836e-06, 'epoch': 0.38} 38%|███▊ | 4730/12313 [3:32:29<5:34:24, 2.65s/it] 38%|███▊ | 4731/12313 [3:32:31<5:34:54, 2.65s/it] {'loss': 0.5757, 'grad_norm': 3.6208666339316995, 'learning_rate': 3.5277039375857677e-06, 'epoch': 0.38} 38%|███▊ | 4731/12313 [3:32:31<5:34:54, 2.65s/it] 38%|███▊ | 4732/12313 [3:32:34<5:33:45, 2.64s/it] {'loss': 0.5931, 'grad_norm': 6.751106985384788, 'learning_rate': 3.5271044146077773e-06, 'epoch': 0.38} 38%|███▊ | 4732/12313 [3:32:34<5:33:45, 2.64s/it] 38%|███▊ | 4733/12313 [3:32:37<5:37:55, 2.67s/it] {'loss': 0.4918, 'grad_norm': 7.411379001602781, 'learning_rate': 3.5265048205595976e-06, 'epoch': 0.38} 38%|███▊ | 4733/12313 
[3:32:37<5:37:55, 2.67s/it] 38%|███▊ | 4734/12313 [3:32:39<5:36:19, 2.66s/it] {'loss': 0.4503, 'grad_norm': 4.0826091607694455, 'learning_rate': 3.5259051554827175e-06, 'epoch': 0.38} 38%|███▊ | 4734/12313 [3:32:39<5:36:19, 2.66s/it] 38%|███▊ | 4735/12313 [3:32:42<5:36:53, 2.67s/it] {'loss': 0.5551, 'grad_norm': 4.479671138778023, 'learning_rate': 3.5253054194186297e-06, 'epoch': 0.38} 38%|███▊ | 4735/12313 [3:32:42<5:36:53, 2.67s/it] 38%|███▊ | 4736/12313 [3:32:45<5:35:20, 2.66s/it] {'loss': 0.6683, 'grad_norm': 6.005050130154772, 'learning_rate': 3.524705612408833e-06, 'epoch': 0.38} 38%|███▊ | 4736/12313 [3:32:45<5:35:20, 2.66s/it] 38%|███▊ | 4737/12313 [3:32:47<5:33:41, 2.64s/it] {'loss': 0.5399, 'grad_norm': 5.171375305373519, 'learning_rate': 3.5241057344948317e-06, 'epoch': 0.38} 38%|███▊ | 4737/12313 [3:32:47<5:33:41, 2.64s/it] 38%|███▊ | 4738/12313 [3:32:50<5:34:53, 2.65s/it] {'loss': 0.6731, 'grad_norm': 10.084203556597553, 'learning_rate': 3.523505785718133e-06, 'epoch': 0.38} 38%|███▊ | 4738/12313 [3:32:50<5:34:53, 2.65s/it] 38%|███▊ | 4739/12313 [3:32:53<5:39:44, 2.69s/it] {'loss': 0.4689, 'grad_norm': 6.878409019634097, 'learning_rate': 3.5229057661202513e-06, 'epoch': 0.38} 38%|███▊ | 4739/12313 [3:32:53<5:39:44, 2.69s/it] 38%|███▊ | 4740/12313 [3:32:56<5:42:02, 2.71s/it] {'loss': 0.5154, 'grad_norm': 5.680734133112059, 'learning_rate': 3.5223056757427044e-06, 'epoch': 0.38} 38%|███▊ | 4740/12313 [3:32:56<5:42:02, 2.71s/it] 39%|███▊ | 4741/12313 [3:32:59<6:07:06, 2.91s/it] {'loss': 0.5012, 'grad_norm': 2.861081172022858, 'learning_rate': 3.5217055146270144e-06, 'epoch': 0.39} 39%|███▊ | 4741/12313 [3:32:59<6:07:06, 2.91s/it] 39%|███▊ | 4742/12313 [3:33:01<5:52:56, 2.80s/it] {'loss': 0.5743, 'grad_norm': 7.512509192657636, 'learning_rate': 3.5211052828147114e-06, 'epoch': 0.39} 39%|███▊ | 4742/12313 [3:33:01<5:52:56, 2.80s/it] 39%|███▊ | 4743/12313 [3:33:04<5:51:37, 2.79s/it] {'loss': 0.5381, 'grad_norm': 7.512493844062562, 'learning_rate': 
3.5205049803473257e-06, 'epoch': 0.39} 39%|███▊ | 4743/12313 [3:33:04<5:51:37, 2.79s/it] 39%|███▊ | 4744/12313 [3:33:07<5:59:43, 2.85s/it] {'loss': 0.5969, 'grad_norm': 4.916065054864969, 'learning_rate': 3.5199046072663968e-06, 'epoch': 0.39} 39%|███▊ | 4744/12313 [3:33:07<5:59:43, 2.85s/it] 39%|███▊ | 4745/12313 [3:33:10<5:42:27, 2.72s/it] {'loss': 0.896, 'grad_norm': 5.767505645700775, 'learning_rate': 3.5193041636134673e-06, 'epoch': 0.39} 39%|███▊ | 4745/12313 [3:33:10<5:42:27, 2.72s/it] 39%|███▊ | 4746/12313 [3:33:12<5:35:50, 2.66s/it] {'loss': 0.6188, 'grad_norm': 3.7412254239286566, 'learning_rate': 3.518703649430083e-06, 'epoch': 0.39} 39%|███▊ | 4746/12313 [3:33:12<5:35:50, 2.66s/it] 39%|███▊ | 4747/12313 [3:33:15<5:43:07, 2.72s/it] {'loss': 0.4346, 'grad_norm': 5.626427073356312, 'learning_rate': 3.518103064757798e-06, 'epoch': 0.39} 39%|███▊ | 4747/12313 [3:33:15<5:43:07, 2.72s/it] 39%|███▊ | 4748/12313 [3:33:18<5:43:25, 2.72s/it] {'loss': 0.5342, 'grad_norm': 5.8362139577169, 'learning_rate': 3.51750240963817e-06, 'epoch': 0.39} 39%|███▊ | 4748/12313 [3:33:18<5:43:25, 2.72s/it] 39%|███▊ | 4749/12313 [3:33:20<5:36:42, 2.67s/it] {'loss': 0.6567, 'grad_norm': 4.982453547043835, 'learning_rate': 3.516901684112759e-06, 'epoch': 0.39} 39%|███▊ | 4749/12313 [3:33:20<5:36:42, 2.67s/it] 39%|███▊ | 4750/12313 [3:33:23<5:33:23, 2.64s/it] {'loss': 0.5791, 'grad_norm': 4.093696003385388, 'learning_rate': 3.5163008882231347e-06, 'epoch': 0.39} 39%|███▊ | 4750/12313 [3:33:23<5:33:23, 2.64s/it] 39%|███▊ | 4751/12313 [3:33:26<5:50:58, 2.78s/it] {'loss': 0.4339, 'grad_norm': 5.642536053815455, 'learning_rate': 3.5157000220108674e-06, 'epoch': 0.39} 39%|███▊ | 4751/12313 [3:33:26<5:50:58, 2.78s/it] 39%|███▊ | 4752/12313 [3:33:29<5:45:03, 2.74s/it] {'loss': 0.6173, 'grad_norm': 4.2134737972824, 'learning_rate': 3.5150990855175337e-06, 'epoch': 0.39} 39%|███▊ | 4752/12313 [3:33:29<5:45:03, 2.74s/it] 39%|███▊ | 4753/12313 [3:33:31<5:42:11, 2.72s/it] {'loss': 0.7697, 
'grad_norm': 7.7835339423428405, 'learning_rate': 3.5144980787847155e-06, 'epoch': 0.39} 39%|███▊ | 4753/12313 [3:33:31<5:42:11, 2.72s/it] 39%|███▊ | 4754/12313 [3:33:34<5:58:32, 2.85s/it] {'loss': 0.4681, 'grad_norm': 4.445884072963403, 'learning_rate': 3.5138970018539998e-06, 'epoch': 0.39} 39%|███▊ | 4754/12313 [3:33:34<5:58:32, 2.85s/it] 39%|███▊ | 4755/12313 [3:33:37<5:51:38, 2.79s/it] {'loss': 0.4439, 'grad_norm': 7.397822019738059, 'learning_rate': 3.513295854766977e-06, 'epoch': 0.39} 39%|███▊ | 4755/12313 [3:33:37<5:51:38, 2.79s/it] 39%|███▊ | 4756/12313 [3:33:40<5:42:30, 2.72s/it] {'loss': 0.5881, 'grad_norm': 5.83757001159602, 'learning_rate': 3.5126946375652443e-06, 'epoch': 0.39} 39%|███▊ | 4756/12313 [3:33:40<5:42:30, 2.72s/it] 39%|███▊ | 4757/12313 [3:33:42<5:48:07, 2.76s/it] {'loss': 0.4453, 'grad_norm': 5.649456450527899, 'learning_rate': 3.512093350290402e-06, 'epoch': 0.39} 39%|███▊ | 4757/12313 [3:33:42<5:48:07, 2.76s/it] 39%|███▊ | 4758/12313 [3:33:45<5:48:54, 2.77s/it] {'loss': 0.6149, 'grad_norm': 4.735284890427706, 'learning_rate': 3.511491992984057e-06, 'epoch': 0.39} 39%|███▊ | 4758/12313 [3:33:45<5:48:54, 2.77s/it] 39%|███▊ | 4759/12313 [3:33:48<6:04:43, 2.90s/it] {'loss': 0.4296, 'grad_norm': 9.984154759576063, 'learning_rate': 3.510890565687818e-06, 'epoch': 0.39} 39%|███▊ | 4759/12313 [3:33:48<6:04:43, 2.90s/it] 39%|███▊ | 4760/12313 [3:33:51<6:03:53, 2.89s/it] {'loss': 0.5894, 'grad_norm': 5.043149999950322, 'learning_rate': 3.5102890684433026e-06, 'epoch': 0.39} 39%|███▊ | 4760/12313 [3:33:51<6:03:53, 2.89s/it] 39%|███▊ | 4761/12313 [3:33:54<5:55:29, 2.82s/it] {'loss': 0.3679, 'grad_norm': 4.065716569234592, 'learning_rate': 3.509687501292132e-06, 'epoch': 0.39} 39%|███▊ | 4761/12313 [3:33:54<5:55:29, 2.82s/it] 39%|███▊ | 4762/12313 [3:33:56<5:41:38, 2.71s/it] {'loss': 0.4891, 'grad_norm': 5.553031924098756, 'learning_rate': 3.5090858642759273e-06, 'epoch': 0.39} 39%|███▊ | 4762/12313 [3:33:56<5:41:38, 2.71s/it] 39%|███▊ | 4763/12313 
[3:33:59<5:31:08, 2.63s/it] {'loss': 0.5457, 'grad_norm': 9.139353955383072, 'learning_rate': 3.5084841574363227e-06, 'epoch': 0.39} 39%|███▊ | 4763/12313 [3:33:59<5:31:08, 2.63s/it] 39%|███▊ | 4764/12313 [3:34:01<5:27:49, 2.61s/it] {'loss': 0.7012, 'grad_norm': 5.211077508553389, 'learning_rate': 3.507882380814952e-06, 'epoch': 0.39} 39%|███▊ | 4764/12313 [3:34:01<5:27:49, 2.61s/it] 39%|███▊ | 4765/12313 [3:34:04<5:39:22, 2.70s/it] {'loss': 0.5159, 'grad_norm': 3.9781818293321027, 'learning_rate': 3.507280534453454e-06, 'epoch': 0.39} 39%|███▊ | 4765/12313 [3:34:04<5:39:22, 2.70s/it] 39%|███▊ | 4766/12313 [3:34:07<5:37:08, 2.68s/it] {'loss': 0.409, 'grad_norm': 3.7855417874921455, 'learning_rate': 3.5066786183934743e-06, 'epoch': 0.39} 39%|███▊ | 4766/12313 [3:34:07<5:37:08, 2.68s/it] 39%|███▊ | 4767/12313 [3:34:10<5:48:20, 2.77s/it] {'loss': 0.5835, 'grad_norm': 5.51441031251894, 'learning_rate': 3.5060766326766626e-06, 'epoch': 0.39} 39%|███▊ | 4767/12313 [3:34:10<5:48:20, 2.77s/it] 39%|███▊ | 4768/12313 [3:34:13<5:44:07, 2.74s/it] {'loss': 0.4933, 'grad_norm': 5.729949936654051, 'learning_rate': 3.505474577344672e-06, 'epoch': 0.39} 39%|███▊ | 4768/12313 [3:34:13<5:44:07, 2.74s/it] 39%|███▊ | 4769/12313 [3:34:15<5:36:53, 2.68s/it] {'loss': 0.6594, 'grad_norm': 4.617510877977197, 'learning_rate': 3.504872452439162e-06, 'epoch': 0.39} 39%|███▊ | 4769/12313 [3:34:15<5:36:53, 2.68s/it] 39%|███▊ | 4770/12313 [3:34:18<5:41:35, 2.72s/it] {'loss': 0.6615, 'grad_norm': 4.618436696346881, 'learning_rate': 3.504270258001796e-06, 'epoch': 0.39} 39%|███▊ | 4770/12313 [3:34:18<5:41:35, 2.72s/it] 39%|███▊ | 4771/12313 [3:34:21<5:40:33, 2.71s/it] {'loss': 0.2883, 'grad_norm': 3.771460379690749, 'learning_rate': 3.503667994074244e-06, 'epoch': 0.39} 39%|███▊ | 4771/12313 [3:34:21<5:40:33, 2.71s/it] 39%|███▉ | 4772/12313 [3:34:23<5:35:32, 2.67s/it] {'loss': 0.4846, 'grad_norm': 6.252805853476455, 'learning_rate': 3.5030656606981783e-06, 'epoch': 0.39} 39%|███▉ | 4772/12313 
[3:34:23<5:35:32, 2.67s/it] 39%|███▉ | 4773/12313 [3:34:26<5:33:46, 2.66s/it] {'loss': 0.4682, 'grad_norm': 6.604659609809073, 'learning_rate': 3.5024632579152775e-06, 'epoch': 0.39} 39%|███▉ | 4773/12313 [3:34:26<5:33:46, 2.66s/it] 39%|███▉ | 4774/12313 [3:34:29<5:35:43, 2.67s/it] {'loss': 0.5166, 'grad_norm': 8.26513746625276, 'learning_rate': 3.501860785767225e-06, 'epoch': 0.39} 39%|███▉ | 4774/12313 [3:34:29<5:35:43, 2.67s/it] 39%|███▉ | 4775/12313 [3:34:31<5:33:24, 2.65s/it] {'loss': 0.7179, 'grad_norm': 4.4406417232924085, 'learning_rate': 3.5012582442957077e-06, 'epoch': 0.39} 39%|███▉ | 4775/12313 [3:34:31<5:33:24, 2.65s/it] 39%|███▉ | 4776/12313 [3:34:34<5:35:55, 2.67s/it] {'loss': 0.3686, 'grad_norm': 13.64277952343109, 'learning_rate': 3.5006556335424197e-06, 'epoch': 0.39} 39%|███▉ | 4776/12313 [3:34:34<5:35:55, 2.67s/it] 39%|███▉ | 4777/12313 [3:34:37<5:34:49, 2.67s/it] {'loss': 0.6139, 'grad_norm': 4.226503480824964, 'learning_rate': 3.500052953549058e-06, 'epoch': 0.39} 39%|███▉ | 4777/12313 [3:34:37<5:34:49, 2.67s/it] 39%|███▉ | 4778/12313 [3:34:39<5:30:07, 2.63s/it] {'loss': 0.7821, 'grad_norm': 4.974191546797816, 'learning_rate': 3.4994502043573237e-06, 'epoch': 0.39} 39%|███▉ | 4778/12313 [3:34:39<5:30:07, 2.63s/it] 39%|███▉ | 4779/12313 [3:34:42<5:20:47, 2.55s/it] {'loss': 0.5492, 'grad_norm': 6.7453632874753815, 'learning_rate': 3.498847386008925e-06, 'epoch': 0.39} 39%|███▉ | 4779/12313 [3:34:42<5:20:47, 2.55s/it] 39%|███▉ | 4780/12313 [3:34:44<5:26:40, 2.60s/it] {'loss': 0.5192, 'grad_norm': 4.381360711445991, 'learning_rate': 3.4982444985455744e-06, 'epoch': 0.39} 39%|███▉ | 4780/12313 [3:34:44<5:26:40, 2.60s/it] 39%|███▉ | 4781/12313 [3:34:47<5:24:25, 2.58s/it] {'loss': 0.5407, 'grad_norm': 4.91701938364453, 'learning_rate': 3.4976415420089865e-06, 'epoch': 0.39} 39%|███▉ | 4781/12313 [3:34:47<5:24:25, 2.58s/it] 39%|███▉ | 4782/12313 [3:34:50<5:31:00, 2.64s/it] {'loss': 0.466, 'grad_norm': 5.048580518545317, 'learning_rate': 
3.4970385164408837e-06, 'epoch': 0.39} 39%|███▉ | 4782/12313 [3:34:50<5:31:00, 2.64s/it] 39%|███▉ | 4783/12313 [3:34:52<5:31:07, 2.64s/it] {'loss': 0.7516, 'grad_norm': 4.973082626749845, 'learning_rate': 3.496435421882994e-06, 'epoch': 0.39} 39%|███▉ | 4783/12313 [3:34:52<5:31:07, 2.64s/it] 39%|███▉ | 4784/12313 [3:34:55<5:27:42, 2.61s/it] {'loss': 0.593, 'grad_norm': 5.743564392198817, 'learning_rate': 3.4958322583770453e-06, 'epoch': 0.39} 39%|███▉ | 4784/12313 [3:34:55<5:27:42, 2.61s/it] 39%|███▉ | 4785/12313 [3:34:57<5:33:12, 2.66s/it] {'loss': 0.4979, 'grad_norm': 4.979299976934781, 'learning_rate': 3.495229025964775e-06, 'epoch': 0.39} 39%|███▉ | 4785/12313 [3:34:57<5:33:12, 2.66s/it] 39%|███▉ | 4786/12313 [3:35:00<5:36:43, 2.68s/it] {'loss': 0.6043, 'grad_norm': 12.613122878358682, 'learning_rate': 3.494625724687923e-06, 'epoch': 0.39} 39%|███▉ | 4786/12313 [3:35:00<5:36:43, 2.68s/it] 39%|███▉ | 4787/12313 [3:35:03<5:28:00, 2.62s/it] {'loss': 0.7142, 'grad_norm': 3.85614729618497, 'learning_rate': 3.494022354588235e-06, 'epoch': 0.39} 39%|███▉ | 4787/12313 [3:35:03<5:28:00, 2.62s/it] 39%|███▉ | 4788/12313 [3:35:06<5:45:37, 2.76s/it] {'loss': 0.4985, 'grad_norm': 6.046549551329485, 'learning_rate': 3.493418915707461e-06, 'epoch': 0.39} 39%|███▉ | 4788/12313 [3:35:06<5:45:37, 2.76s/it] 39%|███▉ | 4789/12313 [3:35:08<5:41:41, 2.72s/it] {'loss': 0.7017, 'grad_norm': 5.004952219579293, 'learning_rate': 3.4928154080873556e-06, 'epoch': 0.39} 39%|███▉ | 4789/12313 [3:35:08<5:41:41, 2.72s/it] 39%|███▉ | 4790/12313 [3:35:11<5:39:11, 2.71s/it] {'loss': 0.5423, 'grad_norm': 6.147470697065457, 'learning_rate': 3.4922118317696785e-06, 'epoch': 0.39} 39%|███▉ | 4790/12313 [3:35:11<5:39:11, 2.71s/it] 39%|███▉ | 4791/12313 [3:35:14<5:37:16, 2.69s/it] {'loss': 0.6039, 'grad_norm': 6.876508550025277, 'learning_rate': 3.491608186796193e-06, 'epoch': 0.39} 39%|███▉ | 4791/12313 [3:35:14<5:37:16, 2.69s/it] 39%|███▉ | 4792/12313 [3:35:16<5:32:58, 2.66s/it] {'loss': 0.5495, 
'grad_norm': 10.41135603101845, 'learning_rate': 3.49100447320867e-06, 'epoch': 0.39} 39%|███▉ | 4792/12313 [3:35:16<5:32:58, 2.66s/it] 39%|███▉ | 4793/12313 [3:35:19<5:42:08, 2.73s/it] {'loss': 0.5267, 'grad_norm': 3.362847757169651, 'learning_rate': 3.4904006910488824e-06, 'epoch': 0.39} 39%|███▉ | 4793/12313 [3:35:19<5:42:08, 2.73s/it] 39%|███▉ | 4794/12313 [3:35:22<5:39:23, 2.71s/it] {'loss': 0.5027, 'grad_norm': 8.868539359623025, 'learning_rate': 3.489796840358608e-06, 'epoch': 0.39} 39%|███▉ | 4794/12313 [3:35:22<5:39:23, 2.71s/it] 39%|███▉ | 4795/12313 [3:35:25<5:45:22, 2.76s/it] {'loss': 0.6388, 'grad_norm': 3.869570039313511, 'learning_rate': 3.4891929211796303e-06, 'epoch': 0.39} 39%|███▉ | 4795/12313 [3:35:25<5:45:22, 2.76s/it] 39%|███▉ | 4796/12313 [3:35:27<5:39:03, 2.71s/it] {'loss': 0.7654, 'grad_norm': 18.46976613986865, 'learning_rate': 3.488588933553739e-06, 'epoch': 0.39} 39%|███▉ | 4796/12313 [3:35:27<5:39:03, 2.71s/it] 39%|███▉ | 4797/12313 [3:35:30<5:35:28, 2.68s/it] {'loss': 0.5233, 'grad_norm': 4.462040282578044, 'learning_rate': 3.4879848775227243e-06, 'epoch': 0.39} 39%|███▉ | 4797/12313 [3:35:30<5:35:28, 2.68s/it] 39%|███▉ | 4798/12313 [3:35:33<5:31:27, 2.65s/it] {'loss': 0.463, 'grad_norm': 16.476898959715022, 'learning_rate': 3.487380753128385e-06, 'epoch': 0.39} 39%|███▉ | 4798/12313 [3:35:33<5:31:27, 2.65s/it] 39%|███▉ | 4799/12313 [3:35:35<5:38:24, 2.70s/it] {'loss': 0.3472, 'grad_norm': 4.641132700498576, 'learning_rate': 3.4867765604125236e-06, 'epoch': 0.39} 39%|███▉ | 4799/12313 [3:35:35<5:38:24, 2.70s/it] 39%|███▉ | 4800/12313 [3:35:38<5:38:07, 2.70s/it] {'loss': 0.4521, 'grad_norm': 4.809948864229084, 'learning_rate': 3.4861722994169466e-06, 'epoch': 0.39} 39%|███▉ | 4800/12313 [3:35:38<5:38:07, 2.70s/it] 39%|███▉ | 4801/12313 [3:35:41<5:37:43, 2.70s/it] {'loss': 0.5308, 'grad_norm': 4.6897556369620625, 'learning_rate': 3.485567970183466e-06, 'epoch': 0.39} 39%|███▉ | 4801/12313 [3:35:41<5:37:43, 2.70s/it] 39%|███▉ | 4802/12313 
[3:35:43<5:33:50, 2.67s/it] {'loss': 0.4595, 'grad_norm': 4.712889214741922, 'learning_rate': 3.484963572753898e-06, 'epoch': 0.39} 39%|███▉ | 4802/12313 [3:35:43<5:33:50, 2.67s/it] 39%|███▉ | 4803/12313 [3:35:46<5:36:21, 2.69s/it] {'loss': 0.6773, 'grad_norm': 3.914385123227235, 'learning_rate': 3.4843591071700627e-06, 'epoch': 0.39} 39%|███▉ | 4803/12313 [3:35:46<5:36:21, 2.69s/it] 39%|███▉ | 4804/12313 [3:35:49<5:27:59, 2.62s/it] {'loss': 0.4701, 'grad_norm': 3.7046943355774147, 'learning_rate': 3.4837545734737877e-06, 'epoch': 0.39} 39%|███▉ | 4804/12313 [3:35:49<5:27:59, 2.62s/it] 39%|███▉ | 4805/12313 [3:35:51<5:30:13, 2.64s/it] {'loss': 0.5245, 'grad_norm': 5.273455209488689, 'learning_rate': 3.483149971706902e-06, 'epoch': 0.39} 39%|███▉ | 4805/12313 [3:35:51<5:30:13, 2.64s/it] 39%|███▉ | 4806/12313 [3:35:54<5:29:20, 2.63s/it] {'loss': 0.58, 'grad_norm': 5.259747071201792, 'learning_rate': 3.482545301911242e-06, 'epoch': 0.39} 39%|███▉ | 4806/12313 [3:35:54<5:29:20, 2.63s/it] 39%|███▉ | 4807/12313 [3:35:57<5:37:43, 2.70s/it] {'loss': 0.5728, 'grad_norm': 3.865781563186537, 'learning_rate': 3.4819405641286476e-06, 'epoch': 0.39} 39%|███▉ | 4807/12313 [3:35:57<5:37:43, 2.70s/it] 39%|███▉ | 4808/12313 [3:35:59<5:33:29, 2.67s/it] {'loss': 0.406, 'grad_norm': 4.430734119933856, 'learning_rate': 3.481335758400962e-06, 'epoch': 0.39} 39%|███▉ | 4808/12313 [3:35:59<5:33:29, 2.67s/it] 39%|███▉ | 4809/12313 [3:36:02<5:35:00, 2.68s/it] {'loss': 0.5959, 'grad_norm': 3.1468197398544295, 'learning_rate': 3.480730884770036e-06, 'epoch': 0.39} 39%|███▉ | 4809/12313 [3:36:02<5:35:00, 2.68s/it] 39%|███▉ | 4810/12313 [3:36:05<5:35:33, 2.68s/it] {'loss': 0.534, 'grad_norm': 4.928724212153909, 'learning_rate': 3.4801259432777236e-06, 'epoch': 0.39} 39%|███▉ | 4810/12313 [3:36:05<5:35:33, 2.68s/it] 39%|███▉ | 4811/12313 [3:36:07<5:30:09, 2.64s/it] {'loss': 0.6451, 'grad_norm': 5.338771481365462, 'learning_rate': 3.479520933965882e-06, 'epoch': 0.39} 39%|███▉ | 4811/12313 
[3:36:07<5:30:09, 2.64s/it] 39%|███▉ | 4812/12313 [3:36:10<5:20:26, 2.56s/it] {'loss': 0.6479, 'grad_norm': 4.865233384961484, 'learning_rate': 3.4789158568763777e-06, 'epoch': 0.39} 39%|███▉ | 4812/12313 [3:36:10<5:20:26, 2.56s/it] 39%|███▉ | 4813/12313 [3:36:12<5:26:17, 2.61s/it] {'loss': 0.4542, 'grad_norm': 6.708639239866096, 'learning_rate': 3.4783107120510758e-06, 'epoch': 0.39} 39%|███▉ | 4813/12313 [3:36:12<5:26:17, 2.61s/it] 39%|███▉ | 4814/12313 [3:36:15<5:39:38, 2.72s/it] {'loss': 0.6198, 'grad_norm': 5.797570016494007, 'learning_rate': 3.4777054995318493e-06, 'epoch': 0.39} 39%|███▉ | 4814/12313 [3:36:15<5:39:38, 2.72s/it] 39%|███▉ | 4815/12313 [3:36:18<5:35:41, 2.69s/it] {'loss': 0.5544, 'grad_norm': 3.723719927731895, 'learning_rate': 3.4771002193605783e-06, 'epoch': 0.39} 39%|███▉ | 4815/12313 [3:36:18<5:35:41, 2.69s/it] 39%|███▉ | 4816/12313 [3:36:21<5:38:20, 2.71s/it] {'loss': 0.5319, 'grad_norm': 4.734496419671341, 'learning_rate': 3.4764948715791425e-06, 'epoch': 0.39} 39%|███▉ | 4816/12313 [3:36:21<5:38:20, 2.71s/it] 39%|███▉ | 4817/12313 [3:36:24<5:46:54, 2.78s/it] {'loss': 0.6067, 'grad_norm': 3.483732371245711, 'learning_rate': 3.47588945622943e-06, 'epoch': 0.39} 39%|███▉ | 4817/12313 [3:36:24<5:46:54, 2.78s/it] 39%|███▉ | 4818/12313 [3:36:26<5:39:28, 2.72s/it] {'loss': 0.7024, 'grad_norm': 3.7596387481376143, 'learning_rate': 3.4752839733533315e-06, 'epoch': 0.39} 39%|███▉ | 4818/12313 [3:36:26<5:39:28, 2.72s/it] 39%|███▉ | 4819/12313 [3:36:29<5:43:46, 2.75s/it] {'loss': 0.5705, 'grad_norm': 6.235595740178511, 'learning_rate': 3.4746784229927445e-06, 'epoch': 0.39} 39%|███▉ | 4819/12313 [3:36:29<5:43:46, 2.75s/it] 39%|███▉ | 4820/12313 [3:36:32<5:49:06, 2.80s/it] {'loss': 0.4127, 'grad_norm': 4.914796686357341, 'learning_rate': 3.4740728051895683e-06, 'epoch': 0.39} 39%|███▉ | 4820/12313 [3:36:32<5:49:06, 2.80s/it] 39%|███▉ | 4821/12313 [3:36:35<5:45:44, 2.77s/it] {'loss': 0.461, 'grad_norm': 6.037035106034122, 'learning_rate': 
3.4734671199857093e-06, 'epoch': 0.39} 39%|███▉ | 4821/12313 [3:36:35<5:45:44, 2.77s/it] 39%|███▉ | 4822/12313 [3:36:37<5:50:05, 2.80s/it] {'loss': 0.4533, 'grad_norm': 4.084676397060007, 'learning_rate': 3.4728613674230777e-06, 'epoch': 0.39} 39%|███▉ | 4822/12313 [3:36:37<5:50:05, 2.80s/it] 39%|███▉ | 4823/12313 [3:36:40<5:39:50, 2.72s/it] {'loss': 0.4948, 'grad_norm': 5.820986362019269, 'learning_rate': 3.472255547543589e-06, 'epoch': 0.39} 39%|███▉ | 4823/12313 [3:36:40<5:39:50, 2.72s/it] 39%|███▉ | 4824/12313 [3:36:43<5:35:13, 2.69s/it] {'loss': 0.6856, 'grad_norm': 6.625284347900165, 'learning_rate': 3.4716496603891605e-06, 'epoch': 0.39} 39%|███▉ | 4824/12313 [3:36:43<5:35:13, 2.69s/it] 39%|███▉ | 4825/12313 [3:36:45<5:27:33, 2.62s/it] {'loss': 0.5442, 'grad_norm': 4.751431998722198, 'learning_rate': 3.471043706001719e-06, 'epoch': 0.39} 39%|███▉ | 4825/12313 [3:36:45<5:27:33, 2.62s/it] 39%|███▉ | 4826/12313 [3:36:48<5:23:29, 2.59s/it] {'loss': 0.5568, 'grad_norm': 3.789894887233057, 'learning_rate': 3.4704376844231922e-06, 'epoch': 0.39} 39%|███▉ | 4826/12313 [3:36:48<5:23:29, 2.59s/it] 39%|███▉ | 4827/12313 [3:36:51<5:38:04, 2.71s/it] {'loss': 0.6599, 'grad_norm': 3.666796443871811, 'learning_rate': 3.4698315956955125e-06, 'epoch': 0.39} 39%|███▉ | 4827/12313 [3:36:51<5:38:04, 2.71s/it] 39%|███▉ | 4828/12313 [3:36:53<5:39:07, 2.72s/it] {'loss': 0.4903, 'grad_norm': 3.6434242436779973, 'learning_rate': 3.46922543986062e-06, 'epoch': 0.39} 39%|███▉ | 4828/12313 [3:36:53<5:39:07, 2.72s/it] 39%|███▉ | 4829/12313 [3:36:56<5:39:04, 2.72s/it] {'loss': 0.6005, 'grad_norm': 7.08366419413291, 'learning_rate': 3.468619216960457e-06, 'epoch': 0.39} 39%|███▉ | 4829/12313 [3:36:56<5:39:04, 2.72s/it] 39%|███▉ | 4830/12313 [3:37:00<6:06:36, 2.94s/it] {'loss': 0.5686, 'grad_norm': 4.495677796038025, 'learning_rate': 3.46801292703697e-06, 'epoch': 0.39} 39%|███▉ | 4830/12313 [3:37:00<6:06:36, 2.94s/it] 39%|███▉ | 4831/12313 [3:37:02<5:52:56, 2.83s/it] {'loss': 0.402, 
'grad_norm': 6.22783462766733, 'learning_rate': 3.467406570132112e-06, 'epoch': 0.39} 39%|███▉ | 4831/12313 [3:37:02<5:52:56, 2.83s/it] 39%|███▉ | 4832/12313 [3:37:05<5:47:12, 2.78s/it] {'loss': 0.4031, 'grad_norm': 6.8479413586138165, 'learning_rate': 3.4668001462878386e-06, 'epoch': 0.39} 39%|███▉ | 4832/12313 [3:37:05<5:47:12, 2.78s/it] 39%|███▉ | 4833/12313 [3:37:07<5:38:16, 2.71s/it] {'loss': 0.4203, 'grad_norm': 11.848480365877366, 'learning_rate': 3.466193655546112e-06, 'epoch': 0.39} 39%|███▉ | 4833/12313 [3:37:07<5:38:16, 2.71s/it] 39%|███▉ | 4834/12313 [3:37:10<5:33:42, 2.68s/it] {'loss': 0.4155, 'grad_norm': 4.1344833900535205, 'learning_rate': 3.465587097948898e-06, 'epoch': 0.39} 39%|███▉ | 4834/12313 [3:37:10<5:33:42, 2.68s/it] 39%|███▉ | 4835/12313 [3:37:13<6:04:45, 2.93s/it] {'loss': 0.6314, 'grad_norm': 4.430449430971486, 'learning_rate': 3.4649804735381675e-06, 'epoch': 0.39} 39%|███▉ | 4835/12313 [3:37:13<6:04:45, 2.93s/it] 39%|███▉ | 4836/12313 [3:37:16<5:53:07, 2.83s/it] {'loss': 0.5447, 'grad_norm': 5.347837303662017, 'learning_rate': 3.4643737823558947e-06, 'epoch': 0.39} 39%|███▉ | 4836/12313 [3:37:16<5:53:07, 2.83s/it] 39%|███▉ | 4837/12313 [3:37:19<5:46:03, 2.78s/it] {'loss': 0.5371, 'grad_norm': 6.546391191124815, 'learning_rate': 3.463767024444061e-06, 'epoch': 0.39} 39%|███▉ | 4837/12313 [3:37:19<5:46:03, 2.78s/it] 39%|███▉ | 4838/12313 [3:37:21<5:43:06, 2.75s/it] {'loss': 0.638, 'grad_norm': 4.80456942652959, 'learning_rate': 3.4631601998446484e-06, 'epoch': 0.39} 39%|███▉ | 4838/12313 [3:37:21<5:43:06, 2.75s/it] 39%|███▉ | 4839/12313 [3:37:24<5:36:07, 2.70s/it] {'loss': 0.4545, 'grad_norm': 5.280861827964533, 'learning_rate': 3.4625533085996495e-06, 'epoch': 0.39} 39%|███▉ | 4839/12313 [3:37:24<5:36:07, 2.70s/it] 39%|███▉ | 4840/12313 [3:37:26<5:28:01, 2.63s/it] {'loss': 0.4733, 'grad_norm': 6.5672307967606995, 'learning_rate': 3.4619463507510536e-06, 'epoch': 0.39} 39%|███▉ | 4840/12313 [3:37:26<5:28:01, 2.63s/it] 39%|███▉ | 
4841/12313 [3:37:29<5:24:40, 2.61s/it] {'loss': 0.5039, 'grad_norm': 4.308991992577792, 'learning_rate': 3.4613393263408625e-06, 'epoch': 0.39} 39%|███▉ | 4841/12313 [3:37:29<5:24:40, 2.61s/it] 39%|███▉ | 4842/12313 [3:37:32<5:27:52, 2.63s/it] {'loss': 0.5651, 'grad_norm': 5.784233850167831, 'learning_rate': 3.4607322354110785e-06, 'epoch': 0.39} 39%|███▉ | 4842/12313 [3:37:32<5:27:52, 2.63s/it] 39%|███▉ | 4843/12313 [3:37:34<5:24:30, 2.61s/it] {'loss': 0.6328, 'grad_norm': 5.463637822033822, 'learning_rate': 3.4601250780037064e-06, 'epoch': 0.39} 39%|███▉ | 4843/12313 [3:37:34<5:24:30, 2.61s/it] 39%|███▉ | 4844/12313 [3:37:37<5:31:05, 2.66s/it] {'loss': 0.63, 'grad_norm': 6.116722144553743, 'learning_rate': 3.4595178541607616e-06, 'epoch': 0.39} 39%|███▉ | 4844/12313 [3:37:37<5:31:05, 2.66s/it] 39%|███▉ | 4845/12313 [3:37:40<5:30:20, 2.65s/it] {'loss': 0.4404, 'grad_norm': 5.568944318775315, 'learning_rate': 3.45891056392426e-06, 'epoch': 0.39} 39%|███▉ | 4845/12313 [3:37:40<5:30:20, 2.65s/it] 39%|███▉ | 4846/12313 [3:37:42<5:35:15, 2.69s/it] {'loss': 0.4921, 'grad_norm': 5.00032757592686, 'learning_rate': 3.4583032073362216e-06, 'epoch': 0.39} 39%|███▉ | 4846/12313 [3:37:42<5:35:15, 2.69s/it] 39%|███▉ | 4847/12313 [3:37:45<5:27:30, 2.63s/it] {'loss': 0.5459, 'grad_norm': 6.833348088070499, 'learning_rate': 3.4576957844386728e-06, 'epoch': 0.39} 39%|███▉ | 4847/12313 [3:37:45<5:27:30, 2.63s/it] 39%|███▉ | 4848/12313 [3:37:47<5:24:23, 2.61s/it] {'loss': 0.5338, 'grad_norm': 3.622191745550694, 'learning_rate': 3.4570882952736445e-06, 'epoch': 0.39} 39%|███▉ | 4848/12313 [3:37:47<5:24:23, 2.61s/it] 39%|███▉ | 4849/12313 [3:37:50<5:27:32, 2.63s/it] {'loss': 0.5432, 'grad_norm': 3.7561292429825617, 'learning_rate': 3.4564807398831716e-06, 'epoch': 0.39} 39%|███▉ | 4849/12313 [3:37:50<5:27:32, 2.63s/it] 39%|███▉ | 4850/12313 [3:37:53<5:28:07, 2.64s/it] {'loss': 0.4724, 'grad_norm': 6.7697920687749145, 'learning_rate': 3.4558731183092936e-06, 'epoch': 0.39} 39%|███▉ | 
4850/12313 [3:37:53<5:28:07, 2.64s/it] 39%|███▉ | 4851/12313 [3:37:55<5:28:15, 2.64s/it] {'loss': 0.4965, 'grad_norm': 17.901118220763788, 'learning_rate': 3.4552654305940546e-06, 'epoch': 0.39} 39%|███▉ | 4851/12313 [3:37:55<5:28:15, 2.64s/it] 39%|███▉ | 4852/12313 [3:37:58<5:43:27, 2.76s/it] {'loss': 0.4596, 'grad_norm': 5.591762564408774, 'learning_rate': 3.4546576767795036e-06, 'epoch': 0.39} 39%|███▉ | 4852/12313 [3:37:59<5:43:27, 2.76s/it] 39%|███▉ | 4853/12313 [3:38:01<5:34:57, 2.69s/it] {'loss': 0.6449, 'grad_norm': 4.5991945379468975, 'learning_rate': 3.4540498569076935e-06, 'epoch': 0.39} 39%|███▉ | 4853/12313 [3:38:01<5:34:57, 2.69s/it] 39%|███▉ | 4854/12313 [3:38:04<5:41:58, 2.75s/it] {'loss': 0.5522, 'grad_norm': 4.012140667999568, 'learning_rate': 3.453441971020682e-06, 'epoch': 0.39} 39%|███▉ | 4854/12313 [3:38:04<5:41:58, 2.75s/it] 39%|███▉ | 4855/12313 [3:38:07<5:41:36, 2.75s/it] {'loss': 0.5232, 'grad_norm': 4.026685263806628, 'learning_rate': 3.4528340191605336e-06, 'epoch': 0.39} 39%|███▉ | 4855/12313 [3:38:07<5:41:36, 2.75s/it] 39%|███▉ | 4856/12313 [3:38:09<5:36:33, 2.71s/it] {'loss': 0.4068, 'grad_norm': 3.9644370555022226, 'learning_rate': 3.452226001369313e-06, 'epoch': 0.39} 39%|███▉ | 4856/12313 [3:38:09<5:36:33, 2.71s/it] 39%|███▉ | 4857/12313 [3:38:12<5:33:51, 2.69s/it] {'loss': 0.5034, 'grad_norm': 4.955057245622184, 'learning_rate': 3.451617917689093e-06, 'epoch': 0.39} 39%|███▉ | 4857/12313 [3:38:12<5:33:51, 2.69s/it] 39%|███▉ | 4858/12313 [3:38:15<5:34:29, 2.69s/it] {'loss': 0.5604, 'grad_norm': 6.121308963517397, 'learning_rate': 3.4510097681619497e-06, 'epoch': 0.39} 39%|███▉ | 4858/12313 [3:38:15<5:34:29, 2.69s/it] 39%|███▉ | 4859/12313 [3:38:17<5:32:14, 2.67s/it] {'loss': 0.742, 'grad_norm': 3.955911011778945, 'learning_rate': 3.4504015528299633e-06, 'epoch': 0.39} 39%|███▉ | 4859/12313 [3:38:17<5:32:14, 2.67s/it] 39%|███▉ | 4860/12313 [3:38:20<5:34:14, 2.69s/it] {'loss': 0.4727, 'grad_norm': 3.416220952136702, 'learning_rate': 
3.449793271735219e-06, 'epoch': 0.39} 39%|███▉ | 4860/12313 [3:38:20<5:34:14, 2.69s/it] 39%|███▉ | 4861/12313 [3:38:23<5:30:24, 2.66s/it] {'loss': 0.6083, 'grad_norm': 4.42715063532882, 'learning_rate': 3.4491849249198074e-06, 'epoch': 0.39} 39%|███▉ | 4861/12313 [3:38:23<5:30:24, 2.66s/it] 39%|███▉ | 4862/12313 [3:38:25<5:36:12, 2.71s/it] {'loss': 0.537, 'grad_norm': 5.7756208575555466, 'learning_rate': 3.4485765124258223e-06, 'epoch': 0.39} 39%|███▉ | 4862/12313 [3:38:25<5:36:12, 2.71s/it] 39%|███▉ | 4863/12313 [3:38:28<5:34:04, 2.69s/it] {'loss': 0.4879, 'grad_norm': 5.659261574392475, 'learning_rate': 3.4479680342953627e-06, 'epoch': 0.39} 39%|███▉ | 4863/12313 [3:38:28<5:34:04, 2.69s/it] 40%|███▉ | 4864/12313 [3:38:31<5:32:58, 2.68s/it] {'loss': 0.5727, 'grad_norm': 3.9294665757315625, 'learning_rate': 3.4473594905705326e-06, 'epoch': 0.4} 40%|███▉ | 4864/12313 [3:38:31<5:32:58, 2.68s/it] 40%|███▉ | 4865/12313 [3:38:33<5:29:07, 2.65s/it] {'loss': 0.5823, 'grad_norm': 4.426724017012553, 'learning_rate': 3.446750881293439e-06, 'epoch': 0.4} 40%|███▉ | 4865/12313 [3:38:33<5:29:07, 2.65s/it] 40%|███▉ | 4866/12313 [3:38:36<5:44:24, 2.77s/it] {'loss': 0.594, 'grad_norm': 4.23549319138764, 'learning_rate': 3.4461422065061957e-06, 'epoch': 0.4} 40%|███▉ | 4866/12313 [3:38:36<5:44:24, 2.77s/it] 40%|███▉ | 4867/12313 [3:38:39<5:53:30, 2.85s/it] {'loss': 0.5666, 'grad_norm': 4.069048549480606, 'learning_rate': 3.4455334662509186e-06, 'epoch': 0.4} 40%|███▉ | 4867/12313 [3:38:39<5:53:30, 2.85s/it] 40%|███▉ | 4868/12313 [3:38:42<5:48:18, 2.81s/it] {'loss': 0.5656, 'grad_norm': 9.920612867004065, 'learning_rate': 3.44492466056973e-06, 'epoch': 0.4} 40%|███▉ | 4868/12313 [3:38:42<5:48:18, 2.81s/it] 40%|███▉ | 4869/12313 [3:38:45<5:43:37, 2.77s/it] {'loss': 0.5565, 'grad_norm': 7.365805382978001, 'learning_rate': 3.4443157895047556e-06, 'epoch': 0.4} 40%|███▉ | 4869/12313 [3:38:45<5:43:37, 2.77s/it] 40%|███▉ | 4870/12313 [3:38:47<5:37:26, 2.72s/it] {'loss': 0.5486, 
'grad_norm': 4.079639656340487, 'learning_rate': 3.4437068530981266e-06, 'epoch': 0.4} 40%|███▉ | 4870/12313 [3:38:47<5:37:26, 2.72s/it] 40%|███▉ | 4871/12313 [3:38:50<5:38:19, 2.73s/it] {'loss': 0.4873, 'grad_norm': 5.1014095863826805, 'learning_rate': 3.4430978513919777e-06, 'epoch': 0.4} 40%|███▉ | 4871/12313 [3:38:50<5:38:19, 2.73s/it] 40%|███▉ | 4872/12313 [3:38:53<5:28:14, 2.65s/it] {'loss': 0.5425, 'grad_norm': 6.57666055283231, 'learning_rate': 3.4424887844284492e-06, 'epoch': 0.4} 40%|███▉ | 4872/12313 [3:38:53<5:28:14, 2.65s/it] 40%|███▉ | 4873/12313 [3:38:55<5:28:27, 2.65s/it] {'loss': 0.3941, 'grad_norm': 4.768382603290578, 'learning_rate': 3.4418796522496845e-06, 'epoch': 0.4} 40%|███▉ | 4873/12313 [3:38:55<5:28:27, 2.65s/it] 40%|███▉ | 4874/12313 [3:38:58<5:23:07, 2.61s/it] {'loss': 0.7732, 'grad_norm': 6.114180863563064, 'learning_rate': 3.4412704548978326e-06, 'epoch': 0.4} 40%|███▉ | 4874/12313 [3:38:58<5:23:07, 2.61s/it] 40%|███▉ | 4875/12313 [3:39:00<5:23:25, 2.61s/it] {'loss': 0.5043, 'grad_norm': 3.93227698865023, 'learning_rate': 3.4406611924150468e-06, 'epoch': 0.4} 40%|███▉ | 4875/12313 [3:39:00<5:23:25, 2.61s/it] 40%|███▉ | 4876/12313 [3:39:03<5:24:20, 2.62s/it] {'loss': 0.4199, 'grad_norm': 5.0863186506106794, 'learning_rate': 3.440051864843485e-06, 'epoch': 0.4} 40%|███▉ | 4876/12313 [3:39:03<5:24:20, 2.62s/it] 40%|███▉ | 4877/12313 [3:39:06<5:25:16, 2.62s/it] {'loss': 0.5594, 'grad_norm': 8.186606675125352, 'learning_rate': 3.4394424722253095e-06, 'epoch': 0.4} 40%|███▉ | 4877/12313 [3:39:06<5:25:16, 2.62s/it] 40%|███▉ | 4878/12313 [3:39:08<5:24:50, 2.62s/it] {'loss': 0.515, 'grad_norm': 3.1309990632230185, 'learning_rate': 3.4388330146026865e-06, 'epoch': 0.4} 40%|███▉ | 4878/12313 [3:39:08<5:24:50, 2.62s/it] 40%|███▉ | 4879/12313 [3:39:11<5:28:58, 2.66s/it] {'loss': 0.65, 'grad_norm': 3.0856049239044676, 'learning_rate': 3.438223492017787e-06, 'epoch': 0.4} 40%|███▉ | 4879/12313 [3:39:11<5:28:58, 2.66s/it] 40%|███▉ | 4880/12313 
[3:39:14<5:31:23, 2.68s/it] {'loss': 0.5401, 'grad_norm': 5.758345655080891, 'learning_rate': 3.4376139045127886e-06, 'epoch': 0.4} 40%|███▉ | 4880/12313 [3:39:14<5:31:23, 2.68s/it] 40%|███▉ | 4881/12313 [3:39:16<5:31:53, 2.68s/it] {'loss': 0.63, 'grad_norm': 6.386676934781451, 'learning_rate': 3.4370042521298697e-06, 'epoch': 0.4} 40%|███▉ | 4881/12313 [3:39:16<5:31:53, 2.68s/it] 40%|███▉ | 4882/12313 [3:39:19<5:34:31, 2.70s/it] {'loss': 0.4628, 'grad_norm': 4.289237032932299, 'learning_rate': 3.436394534911216e-06, 'epoch': 0.4} 40%|███▉ | 4882/12313 [3:39:19<5:34:31, 2.70s/it] 40%|███▉ | 4883/12313 [3:39:22<5:35:56, 2.71s/it] {'loss': 0.4507, 'grad_norm': 4.493221527603865, 'learning_rate': 3.4357847528990157e-06, 'epoch': 0.4} 40%|███▉ | 4883/12313 [3:39:22<5:35:56, 2.71s/it] 40%|███▉ | 4884/12313 [3:39:25<5:36:50, 2.72s/it] {'loss': 0.4214, 'grad_norm': 5.931364030127042, 'learning_rate': 3.4351749061354634e-06, 'epoch': 0.4} 40%|███▉ | 4884/12313 [3:39:25<5:36:50, 2.72s/it] 40%|███▉ | 4885/12313 [3:39:27<5:36:01, 2.71s/it] {'loss': 0.4338, 'grad_norm': 5.201076643919345, 'learning_rate': 3.4345649946627567e-06, 'epoch': 0.4} 40%|███▉ | 4885/12313 [3:39:27<5:36:01, 2.71s/it] 40%|███▉ | 4886/12313 [3:39:30<5:28:23, 2.65s/it] {'loss': 0.633, 'grad_norm': 6.755107310201019, 'learning_rate': 3.4339550185230985e-06, 'epoch': 0.4} 40%|███▉ | 4886/12313 [3:39:30<5:28:23, 2.65s/it] 40%|███▉ | 4887/12313 [3:39:33<5:30:46, 2.67s/it] {'loss': 0.4477, 'grad_norm': 6.012302549365938, 'learning_rate': 3.4333449777586957e-06, 'epoch': 0.4} 40%|███▉ | 4887/12313 [3:39:33<5:30:46, 2.67s/it] 40%|███▉ | 4888/12313 [3:39:35<5:31:21, 2.68s/it] {'loss': 0.555, 'grad_norm': 5.404553280600934, 'learning_rate': 3.432734872411761e-06, 'epoch': 0.4} 40%|███▉ | 4888/12313 [3:39:35<5:31:21, 2.68s/it] 40%|███▉ | 4889/12313 [3:39:38<5:31:39, 2.68s/it] {'loss': 0.56, 'grad_norm': 6.802478190345189, 'learning_rate': 3.4321247025245084e-06, 'epoch': 0.4} 40%|███▉ | 4889/12313 [3:39:38<5:31:39, 
2.68s/it] 40%|███▉ | 4890/12313 [3:39:41<5:33:22, 2.69s/it] {'loss': 0.5331, 'grad_norm': 4.629306992412694, 'learning_rate': 3.4315144681391604e-06, 'epoch': 0.4} 40%|███▉ | 4890/12313 [3:39:41<5:33:22, 2.69s/it] 40%|███▉ | 4891/12313 [3:39:43<5:28:51, 2.66s/it] {'loss': 0.9165, 'grad_norm': 4.785707649092083, 'learning_rate': 3.430904169297941e-06, 'epoch': 0.4} 40%|███▉ | 4891/12313 [3:39:43<5:28:51, 2.66s/it] 40%|███▉ | 4892/12313 [3:39:46<5:20:15, 2.59s/it] {'loss': 0.6152, 'grad_norm': 7.873306295295412, 'learning_rate': 3.4302938060430794e-06, 'epoch': 0.4} 40%|███▉ | 4892/12313 [3:39:46<5:20:15, 2.59s/it] 40%|███▉ | 4893/12313 [3:39:48<5:21:01, 2.60s/it] {'loss': 0.4936, 'grad_norm': 3.4652132108801195, 'learning_rate': 3.429683378416811e-06, 'epoch': 0.4} 40%|███▉ | 4893/12313 [3:39:48<5:21:01, 2.60s/it] 40%|███▉ | 4894/12313 [3:39:51<5:33:01, 2.69s/it] {'loss': 0.5562, 'grad_norm': 4.301920992179896, 'learning_rate': 3.429072886461372e-06, 'epoch': 0.4} 40%|███▉ | 4894/12313 [3:39:51<5:33:01, 2.69s/it] 40%|███▉ | 4895/12313 [3:39:54<5:30:23, 2.67s/it] {'loss': 0.5153, 'grad_norm': 5.010865178574669, 'learning_rate': 3.428462330219007e-06, 'epoch': 0.4} 40%|███▉ | 4895/12313 [3:39:54<5:30:23, 2.67s/it] 40%|███▉ | 4896/12313 [3:39:57<5:34:45, 2.71s/it] {'loss': 0.4894, 'grad_norm': 6.077200433598698, 'learning_rate': 3.4278517097319617e-06, 'epoch': 0.4} 40%|███▉ | 4896/12313 [3:39:57<5:34:45, 2.71s/it] 40%|███▉ | 4897/12313 [3:39:59<5:40:54, 2.76s/it] {'loss': 0.5466, 'grad_norm': 4.201260945415446, 'learning_rate': 3.4272410250424893e-06, 'epoch': 0.4} 40%|███▉ | 4897/12313 [3:39:59<5:40:54, 2.76s/it] 40%|███▉ | 4898/12313 [3:40:02<5:37:34, 2.73s/it] {'loss': 0.499, 'grad_norm': 4.064316394961935, 'learning_rate': 3.4266302761928453e-06, 'epoch': 0.4} 40%|███▉ | 4898/12313 [3:40:02<5:37:34, 2.73s/it] 40%|███▉ | 4899/12313 [3:40:05<5:40:49, 2.76s/it] {'loss': 0.6487, 'grad_norm': 10.729804958408987, 'learning_rate': 3.4260194632252903e-06, 'epoch': 0.4} 
40%|███▉ | 4899/12313 [3:40:05<5:40:49, 2.76s/it] 40%|███▉ | 4900/12313 [3:40:08<5:33:24, 2.70s/it] {'loss': 0.4588, 'grad_norm': 5.44190759303068, 'learning_rate': 3.4254085861820895e-06, 'epoch': 0.4} 40%|███▉ | 4900/12313 [3:40:08<5:33:24, 2.70s/it] 40%|███▉ | 4901/12313 [3:40:10<5:29:21, 2.67s/it] {'loss': 0.5658, 'grad_norm': 6.394017389790841, 'learning_rate': 3.424797645105512e-06, 'epoch': 0.4} 40%|███▉ | 4901/12313 [3:40:10<5:29:21, 2.67s/it] 40%|███▉ | 4902/12313 [3:40:13<5:33:13, 2.70s/it] {'loss': 0.5158, 'grad_norm': 5.684350734119721, 'learning_rate': 3.4241866400378315e-06, 'epoch': 0.4} 40%|███▉ | 4902/12313 [3:40:13<5:33:13, 2.70s/it] 40%|███▉ | 4903/12313 [3:40:15<5:25:41, 2.64s/it] {'loss': 0.4298, 'grad_norm': 3.9864422041154275, 'learning_rate': 3.423575571021327e-06, 'epoch': 0.4} 40%|███▉ | 4903/12313 [3:40:15<5:25:41, 2.64s/it] 40%|███▉ | 4904/12313 [3:40:18<5:22:11, 2.61s/it] {'loss': 0.6485, 'grad_norm': 5.696110565863003, 'learning_rate': 3.4229644380982817e-06, 'epoch': 0.4} 40%|███▉ | 4904/12313 [3:40:18<5:22:11, 2.61s/it] 40%|███▉ | 4905/12313 [3:40:21<5:28:20, 2.66s/it] {'loss': 0.5311, 'grad_norm': 8.085273365710513, 'learning_rate': 3.4223532413109807e-06, 'epoch': 0.4} 40%|███▉ | 4905/12313 [3:40:21<5:28:20, 2.66s/it] 40%|███▉ | 4906/12313 [3:40:24<5:38:31, 2.74s/it] {'loss': 0.3467, 'grad_norm': 21.38684299111223, 'learning_rate': 3.4217419807017177e-06, 'epoch': 0.4} 40%|███▉ | 4906/12313 [3:40:24<5:38:31, 2.74s/it] 40%|███▉ | 4907/12313 [3:40:26<5:35:32, 2.72s/it] {'loss': 0.3502, 'grad_norm': 6.879258693117489, 'learning_rate': 3.4211306563127876e-06, 'epoch': 0.4} 40%|███▉ | 4907/12313 [3:40:26<5:35:32, 2.72s/it] 40%|███▉ | 4908/12313 [3:40:29<5:32:29, 2.69s/it] {'loss': 0.5344, 'grad_norm': 5.751298879545744, 'learning_rate': 3.4205192681864905e-06, 'epoch': 0.4} 40%|███▉ | 4908/12313 [3:40:29<5:32:29, 2.69s/it] 40%|███▉ | 4909/12313 [3:40:32<5:29:17, 2.67s/it] {'loss': 0.5033, 'grad_norm': 9.337940570841411, 'learning_rate': 
3.4199078163651335e-06, 'epoch': 0.4} 40%|███▉ | 4909/12313 [3:40:32<5:29:17, 2.67s/it] 40%|███▉ | 4910/12313 [3:40:34<5:25:19, 2.64s/it] {'loss': 0.518, 'grad_norm': 4.8306108732275925, 'learning_rate': 3.419296300891023e-06, 'epoch': 0.4} 40%|███▉ | 4910/12313 [3:40:34<5:25:19, 2.64s/it] 40%|███▉ | 4911/12313 [3:40:37<5:18:33, 2.58s/it] {'loss': 0.4499, 'grad_norm': 7.933167629628174, 'learning_rate': 3.418684721806474e-06, 'epoch': 0.4} 40%|███▉ | 4911/12313 [3:40:37<5:18:33, 2.58s/it] 40%|███▉ | 4912/12313 [3:40:39<5:23:15, 2.62s/it] {'loss': 0.534, 'grad_norm': 6.472677681508684, 'learning_rate': 3.418073079153804e-06, 'epoch': 0.4} 40%|███▉ | 4912/12313 [3:40:39<5:23:15, 2.62s/it] 40%|███▉ | 4913/12313 [3:40:42<5:27:34, 2.66s/it] {'loss': 0.569, 'grad_norm': 6.504622980241599, 'learning_rate': 3.4174613729753364e-06, 'epoch': 0.4} 40%|███▉ | 4913/12313 [3:40:42<5:27:34, 2.66s/it] 40%|███▉ | 4914/12313 [3:40:45<5:31:12, 2.69s/it] {'loss': 0.5061, 'grad_norm': 7.727056330599826, 'learning_rate': 3.4168496033133968e-06, 'epoch': 0.4} 40%|███▉ | 4914/12313 [3:40:45<5:31:12, 2.69s/it] 40%|███▉ | 4915/12313 [3:40:47<5:23:37, 2.62s/it] {'loss': 0.4645, 'grad_norm': 4.697545157125827, 'learning_rate': 3.416237770210317e-06, 'epoch': 0.4} 40%|███▉ | 4915/12313 [3:40:47<5:23:37, 2.62s/it] 40%|███▉ | 4916/12313 [3:40:50<5:22:45, 2.62s/it] {'loss': 0.4595, 'grad_norm': 4.809263186617258, 'learning_rate': 3.415625873708433e-06, 'epoch': 0.4} 40%|███▉ | 4916/12313 [3:40:50<5:22:45, 2.62s/it] 40%|███▉ | 4917/12313 [3:40:53<5:26:10, 2.65s/it] {'loss': 0.4436, 'grad_norm': 10.22371082626964, 'learning_rate': 3.4150139138500843e-06, 'epoch': 0.4} 40%|███▉ | 4917/12313 [3:40:53<5:26:10, 2.65s/it] 40%|███▉ | 4918/12313 [3:40:55<5:29:33, 2.67s/it] {'loss': 0.5012, 'grad_norm': 5.9377177585980245, 'learning_rate': 3.4144018906776155e-06, 'epoch': 0.4} 40%|███▉ | 4918/12313 [3:40:55<5:29:33, 2.67s/it] 40%|███▉ | 4919/12313 [3:40:58<5:26:05, 2.65s/it] {'loss': 0.6356, 'grad_norm': 
5.523225188806799, 'learning_rate': 3.413789804233375e-06, 'epoch': 0.4} 40%|███▉ | 4919/12313 [3:40:58<5:26:05, 2.65s/it] 40%|███▉ | 4920/12313 [3:41:01<5:34:33, 2.72s/it] {'loss': 0.4785, 'grad_norm': 4.922432155658762, 'learning_rate': 3.413177654559717e-06, 'epoch': 0.4} 40%|███▉ | 4920/12313 [3:41:01<5:34:33, 2.72s/it] 40%|███▉ | 4921/12313 [3:41:03<5:31:09, 2.69s/it] {'loss': 0.4049, 'grad_norm': 8.522209977514363, 'learning_rate': 3.4125654416989975e-06, 'epoch': 0.4} 40%|███▉ | 4921/12313 [3:41:03<5:31:09, 2.69s/it] 40%|███▉ | 4922/12313 [3:41:06<5:34:29, 2.72s/it] {'loss': 0.5051, 'grad_norm': 4.166534006928631, 'learning_rate': 3.411953165693579e-06, 'epoch': 0.4} 40%|███▉ | 4922/12313 [3:41:06<5:34:29, 2.72s/it] 40%|███▉ | 4923/12313 [3:41:09<5:40:08, 2.76s/it] {'loss': 0.5134, 'grad_norm': 7.556966122206325, 'learning_rate': 3.4113408265858282e-06, 'epoch': 0.4} 40%|███▉ | 4923/12313 [3:41:09<5:40:08, 2.76s/it] 40%|███▉ | 4924/12313 [3:41:12<5:42:26, 2.78s/it] {'loss': 0.5773, 'grad_norm': 4.46432321290014, 'learning_rate': 3.4107284244181154e-06, 'epoch': 0.4} 40%|███▉ | 4924/12313 [3:41:12<5:42:26, 2.78s/it] 40%|███▉ | 4925/12313 [3:41:15<6:00:17, 2.93s/it] {'loss': 0.492, 'grad_norm': 3.3296185748575837, 'learning_rate': 3.4101159592328148e-06, 'epoch': 0.4} 40%|███▉ | 4925/12313 [3:41:15<6:00:17, 2.93s/it] 40%|████ | 4926/12313 [3:41:18<6:05:45, 2.97s/it] {'loss': 0.4053, 'grad_norm': 4.808058470791979, 'learning_rate': 3.409503431072308e-06, 'epoch': 0.4} 40%|████ | 4926/12313 [3:41:18<6:05:45, 2.97s/it] 40%|████ | 4927/12313 [3:41:21<5:56:06, 2.89s/it] {'loss': 0.515, 'grad_norm': 5.658421289602297, 'learning_rate': 3.408890839978976e-06, 'epoch': 0.4} 40%|████ | 4927/12313 [3:41:21<5:56:06, 2.89s/it] 40%|████ | 4928/12313 [3:41:23<5:39:51, 2.76s/it] {'loss': 0.5547, 'grad_norm': 6.896229712559321, 'learning_rate': 3.4082781859952087e-06, 'epoch': 0.4} 40%|████ | 4928/12313 [3:41:23<5:39:51, 2.76s/it] 40%|████ | 4929/12313 [3:41:26<5:26:27, 
2.65s/it] {'loss': 0.6033, 'grad_norm': 5.824806238919424, 'learning_rate': 3.407665469163398e-06, 'epoch': 0.4} 40%|████ | 4929/12313 [3:41:26<5:26:27, 2.65s/it] 40%|████ | 4930/12313 [3:41:29<5:30:43, 2.69s/it] {'loss': 0.5083, 'grad_norm': 4.135425915903601, 'learning_rate': 3.4070526895259403e-06, 'epoch': 0.4} 40%|████ | 4930/12313 [3:41:29<5:30:43, 2.69s/it] 40%|████ | 4931/12313 [3:41:31<5:37:07, 2.74s/it] {'loss': 0.5962, 'grad_norm': 5.133129422170843, 'learning_rate': 3.4064398471252367e-06, 'epoch': 0.4} 40%|████ | 4931/12313 [3:41:31<5:37:07, 2.74s/it] 40%|████ | 4932/12313 [3:41:34<5:31:19, 2.69s/it] {'loss': 0.4848, 'grad_norm': 5.46292327134982, 'learning_rate': 3.4058269420036937e-06, 'epoch': 0.4} 40%|████ | 4932/12313 [3:41:34<5:31:19, 2.69s/it] 40%|████ | 4933/12313 [3:41:37<5:33:14, 2.71s/it] {'loss': 0.5223, 'grad_norm': 5.055147156646343, 'learning_rate': 3.40521397420372e-06, 'epoch': 0.4} 40%|████ | 4933/12313 [3:41:37<5:33:14, 2.71s/it] 40%|████ | 4934/12313 [3:41:39<5:33:39, 2.71s/it] {'loss': 0.658, 'grad_norm': 6.056907838015249, 'learning_rate': 3.4046009437677296e-06, 'epoch': 0.4} 40%|████ | 4934/12313 [3:41:39<5:33:39, 2.71s/it] 40%|████ | 4935/12313 [3:41:42<5:39:19, 2.76s/it] {'loss': 0.4064, 'grad_norm': 5.370500579188346, 'learning_rate': 3.403987850738142e-06, 'epoch': 0.4} 40%|████ | 4935/12313 [3:41:42<5:39:19, 2.76s/it] 40%|████ | 4936/12313 [3:41:45<5:27:03, 2.66s/it] {'loss': 0.4991, 'grad_norm': 5.583756706162014, 'learning_rate': 3.4033746951573797e-06, 'epoch': 0.4} 40%|████ | 4936/12313 [3:41:45<5:27:03, 2.66s/it] 40%|████ | 4937/12313 [3:41:47<5:26:48, 2.66s/it] {'loss': 0.6236, 'grad_norm': 7.016728588738892, 'learning_rate': 3.4027614770678695e-06, 'epoch': 0.4} 40%|████ | 4937/12313 [3:41:47<5:26:48, 2.66s/it] 40%|████ | 4938/12313 [3:41:50<5:14:54, 2.56s/it] {'loss': 0.4602, 'grad_norm': 4.595088770001303, 'learning_rate': 3.402148196512042e-06, 'epoch': 0.4} 40%|████ | 4938/12313 [3:41:50<5:14:54, 2.56s/it] 
40%|████ | 4939/12313 [3:41:52<5:19:41, 2.60s/it] {'loss': 0.582, 'grad_norm': 8.586997426840448, 'learning_rate': 3.4015348535323344e-06, 'epoch': 0.4} 40%|████ | 4939/12313 [3:41:52<5:19:41, 2.60s/it] 40%|████ | 4940/12313 [3:41:55<5:23:02, 2.63s/it] {'loss': 0.456, 'grad_norm': 6.420810576653918, 'learning_rate': 3.400921448171187e-06, 'epoch': 0.4} 40%|████ | 4940/12313 [3:41:55<5:23:02, 2.63s/it] 40%|████ | 4941/12313 [3:41:58<5:27:12, 2.66s/it] {'loss': 0.5049, 'grad_norm': 3.598889910541628, 'learning_rate': 3.4003079804710414e-06, 'epoch': 0.4} 40%|████ | 4941/12313 [3:41:58<5:27:12, 2.66s/it] 40%|████ | 4942/12313 [3:42:01<5:57:26, 2.91s/it] {'loss': 0.5795, 'grad_norm': 4.85229866007985, 'learning_rate': 3.39969445047435e-06, 'epoch': 0.4} 40%|████ | 4942/12313 [3:42:01<5:57:26, 2.91s/it] 40%|████ | 4943/12313 [3:42:04<5:49:43, 2.85s/it] {'loss': 0.4898, 'grad_norm': 5.046704044838436, 'learning_rate': 3.399080858223564e-06, 'epoch': 0.4} 40%|████ | 4943/12313 [3:42:04<5:49:43, 2.85s/it] 40%|████ | 4944/12313 [3:42:07<5:42:15, 2.79s/it] {'loss': 0.4583, 'grad_norm': 4.201283039118606, 'learning_rate': 3.3984672037611403e-06, 'epoch': 0.4} 40%|████ | 4944/12313 [3:42:07<5:42:15, 2.79s/it] 40%|████ | 4945/12313 [3:42:09<5:34:15, 2.72s/it] {'loss': 0.569, 'grad_norm': 3.6245756239233855, 'learning_rate': 3.3978534871295423e-06, 'epoch': 0.4} 40%|████ | 4945/12313 [3:42:09<5:34:15, 2.72s/it] 40%|████ | 4946/12313 [3:42:12<5:24:12, 2.64s/it] {'loss': 0.4635, 'grad_norm': 5.943049720731405, 'learning_rate': 3.3972397083712337e-06, 'epoch': 0.4} 40%|████ | 4946/12313 [3:42:12<5:24:12, 2.64s/it] 40%|████ | 4947/12313 [3:42:14<5:27:37, 2.67s/it] {'loss': 0.5545, 'grad_norm': 4.4825133089550615, 'learning_rate': 3.3966258675286868e-06, 'epoch': 0.4} 40%|████ | 4947/12313 [3:42:14<5:27:37, 2.67s/it] 40%|████ | 4948/12313 [3:42:17<5:36:11, 2.74s/it] {'loss': 0.5132, 'grad_norm': 7.159533755424081, 'learning_rate': 3.3960119646443743e-06, 'epoch': 0.4} 40%|████ | 
4948/12313 [3:42:17<5:36:11, 2.74s/it] 40%|████ | 4949/12313 [3:42:20<5:38:57, 2.76s/it] {'loss': 0.4824, 'grad_norm': 5.668432680762583, 'learning_rate': 3.395397999760777e-06, 'epoch': 0.4} 40%|████ | 4949/12313 [3:42:20<5:38:57, 2.76s/it] 40%|████ | 4950/12313 [3:42:23<5:33:18, 2.72s/it] {'loss': 0.5292, 'grad_norm': 7.74454321710248, 'learning_rate': 3.394783972920376e-06, 'epoch': 0.4} 40%|████ | 4950/12313 [3:42:23<5:33:18, 2.72s/it] 40%|████ | 4951/12313 [3:42:25<5:27:47, 2.67s/it] {'loss': 0.5123, 'grad_norm': 3.9688138836277864, 'learning_rate': 3.3941698841656594e-06, 'epoch': 0.4} 40%|████ | 4951/12313 [3:42:25<5:27:47, 2.67s/it] 40%|████ | 4952/12313 [3:42:28<5:20:57, 2.62s/it] {'loss': 0.5618, 'grad_norm': 3.6092591329939196, 'learning_rate': 3.3935557335391194e-06, 'epoch': 0.4} 40%|████ | 4952/12313 [3:42:28<5:20:57, 2.62s/it] 40%|████ | 4953/12313 [3:42:31<5:31:26, 2.70s/it] {'loss': 0.4638, 'grad_norm': 5.268354030573778, 'learning_rate': 3.3929415210832526e-06, 'epoch': 0.4} 40%|████ | 4953/12313 [3:42:31<5:31:26, 2.70s/it] 40%|████ | 4954/12313 [3:42:34<5:42:13, 2.79s/it] {'loss': 0.5442, 'grad_norm': 4.725073559638671, 'learning_rate': 3.392327246840558e-06, 'epoch': 0.4} 40%|████ | 4954/12313 [3:42:34<5:42:13, 2.79s/it] 40%|████ | 4955/12313 [3:42:36<5:37:02, 2.75s/it] {'loss': 0.5132, 'grad_norm': 5.0239266752483225, 'learning_rate': 3.39171291085354e-06, 'epoch': 0.4} 40%|████ | 4955/12313 [3:42:36<5:37:02, 2.75s/it] 40%|████ | 4956/12313 [3:42:39<5:36:46, 2.75s/it] {'loss': 0.6074, 'grad_norm': 4.688205783968622, 'learning_rate': 3.3910985131647077e-06, 'epoch': 0.4} 40%|████ | 4956/12313 [3:42:39<5:36:46, 2.75s/it] 40%|████ | 4957/12313 [3:42:42<5:33:51, 2.72s/it] {'loss': 0.4564, 'grad_norm': 4.306614565733134, 'learning_rate': 3.3904840538165745e-06, 'epoch': 0.4} 40%|████ | 4957/12313 [3:42:42<5:33:51, 2.72s/it] 40%|████ | 4958/12313 [3:42:44<5:32:57, 2.72s/it] {'loss': 0.536, 'grad_norm': 13.862109865488437, 'learning_rate': 
3.3898695328516585e-06, 'epoch': 0.4} 40%|████ | 4958/12313 [3:42:44<5:32:57, 2.72s/it] 40%|████ | 4959/12313 [3:42:47<5:27:01, 2.67s/it] {'loss': 0.6559, 'grad_norm': 4.512610902634363, 'learning_rate': 3.38925495031248e-06, 'epoch': 0.4} 40%|████ | 4959/12313 [3:42:47<5:27:01, 2.67s/it] 40%|████ | 4960/12313 [3:42:50<5:22:05, 2.63s/it] {'loss': 0.5741, 'grad_norm': 4.1778178120621225, 'learning_rate': 3.3886403062415653e-06, 'epoch': 0.4} 40%|████ | 4960/12313 [3:42:50<5:22:05, 2.63s/it] 40%|████ | 4961/12313 [3:42:52<5:20:51, 2.62s/it] {'loss': 0.7565, 'grad_norm': 5.5923461249681194, 'learning_rate': 3.3880256006814436e-06, 'epoch': 0.4} 40%|████ | 4961/12313 [3:42:52<5:20:51, 2.62s/it] 40%|████ | 4962/12313 [3:42:55<5:18:40, 2.60s/it] {'loss': 0.817, 'grad_norm': 10.781234356503536, 'learning_rate': 3.387410833674651e-06, 'epoch': 0.4} 40%|████ | 4962/12313 [3:42:55<5:18:40, 2.60s/it] 40%|████ | 4963/12313 [3:42:57<5:19:15, 2.61s/it] {'loss': 0.4918, 'grad_norm': 7.255219164441723, 'learning_rate': 3.386796005263725e-06, 'epoch': 0.4} 40%|████ | 4963/12313 [3:42:57<5:19:15, 2.61s/it] 40%|████ | 4964/12313 [3:43:00<5:23:52, 2.64s/it] {'loss': 0.5502, 'grad_norm': 3.7214218539504094, 'learning_rate': 3.3861811154912085e-06, 'epoch': 0.4} 40%|████ | 4964/12313 [3:43:00<5:23:52, 2.64s/it] 40%|████ | 4965/12313 [3:43:03<5:24:57, 2.65s/it] {'loss': 0.6731, 'grad_norm': 3.7233458707878246, 'learning_rate': 3.385566164399649e-06, 'epoch': 0.4} 40%|████ | 4965/12313 [3:43:03<5:24:57, 2.65s/it] 40%|████ | 4966/12313 [3:43:05<5:25:41, 2.66s/it] {'loss': 0.426, 'grad_norm': 6.244845838241695, 'learning_rate': 3.3849511520315986e-06, 'epoch': 0.4} 40%|████ | 4966/12313 [3:43:05<5:25:41, 2.66s/it] 40%|████ | 4967/12313 [3:43:08<5:22:20, 2.63s/it] {'loss': 0.7065, 'grad_norm': 5.384785695599885, 'learning_rate': 3.384336078429611e-06, 'epoch': 0.4} 40%|████ | 4967/12313 [3:43:08<5:22:20, 2.63s/it] 40%|████ | 4968/12313 [3:43:11<5:23:01, 2.64s/it] {'loss': 0.477, 'grad_norm': 
4.49252865805685, 'learning_rate': 3.3837209436362473e-06, 'epoch': 0.4} 40%|████ | 4968/12313 [3:43:11<5:23:01, 2.64s/it] 40%|████ | 4969/12313 [3:43:14<5:34:05, 2.73s/it] {'loss': 0.5459, 'grad_norm': 3.4933203948485643, 'learning_rate': 3.3831057476940716e-06, 'epoch': 0.4} 40%|████ | 4969/12313 [3:43:14<5:34:05, 2.73s/it] 40%|████ | 4970/12313 [3:43:16<5:35:49, 2.74s/it] {'loss': 0.559, 'grad_norm': 7.33852450348436, 'learning_rate': 3.382490490645651e-06, 'epoch': 0.4} 40%|████ | 4970/12313 [3:43:16<5:35:49, 2.74s/it] 40%|████ | 4971/12313 [3:43:19<5:28:53, 2.69s/it] {'loss': 0.4383, 'grad_norm': 5.565835641557815, 'learning_rate': 3.3818751725335595e-06, 'epoch': 0.4} 40%|████ | 4971/12313 [3:43:19<5:28:53, 2.69s/it] 40%|████ | 4972/12313 [3:43:21<5:21:38, 2.63s/it] {'loss': 0.6346, 'grad_norm': 6.069976549417349, 'learning_rate': 3.3812597934003746e-06, 'epoch': 0.4} 40%|████ | 4972/12313 [3:43:21<5:21:38, 2.63s/it] 40%|████ | 4973/12313 [3:43:25<5:41:54, 2.79s/it] {'loss': 0.4902, 'grad_norm': 5.41297727878491, 'learning_rate': 3.3806443532886736e-06, 'epoch': 0.4} 40%|████ | 4973/12313 [3:43:25<5:41:54, 2.79s/it] 40%|████ | 4974/12313 [3:43:27<5:34:28, 2.73s/it] {'loss': 0.6833, 'grad_norm': 3.474757683944031, 'learning_rate': 3.3800288522410464e-06, 'epoch': 0.4} 40%|████ | 4974/12313 [3:43:27<5:34:28, 2.73s/it] 40%|████ | 4975/12313 [3:43:30<5:42:08, 2.80s/it] {'loss': 0.4717, 'grad_norm': 18.494129306251725, 'learning_rate': 3.3794132903000787e-06, 'epoch': 0.4} 40%|████ | 4975/12313 [3:43:30<5:42:08, 2.80s/it] 40%|████ | 4976/12313 [3:43:33<5:27:32, 2.68s/it] {'loss': 0.4586, 'grad_norm': 5.201629259993079, 'learning_rate': 3.3787976675083657e-06, 'epoch': 0.4} 40%|████ | 4976/12313 [3:43:33<5:27:32, 2.68s/it] 40%|████ | 4977/12313 [3:43:35<5:25:48, 2.66s/it] {'loss': 0.6093, 'grad_norm': 3.067579639195292, 'learning_rate': 3.3781819839085056e-06, 'epoch': 0.4} 40%|████ | 4977/12313 [3:43:35<5:25:48, 2.66s/it] 40%|████ | 4978/12313 [3:43:38<5:26:16, 
2.67s/it] {'loss': 0.4642, 'grad_norm': 6.51098650482296, 'learning_rate': 3.3775662395431e-06, 'epoch': 0.4} 40%|████ | 4978/12313 [3:43:38<5:26:16, 2.67s/it] 40%|████ | 4979/12313 [3:43:40<5:25:18, 2.66s/it] {'loss': 0.6634, 'grad_norm': 3.2528809931282057, 'learning_rate': 3.376950434454754e-06, 'epoch': 0.4} 40%|████ | 4979/12313 [3:43:40<5:25:18, 2.66s/it] 40%|████ | 4980/12313 [3:43:44<5:47:11, 2.84s/it] {'loss': 0.5264, 'grad_norm': 6.255509398772904, 'learning_rate': 3.37633456868608e-06, 'epoch': 0.4} 40%|████ | 4980/12313 [3:43:44<5:47:11, 2.84s/it] 40%|████ | 4981/12313 [3:43:46<5:37:56, 2.77s/it] {'loss': 0.4249, 'grad_norm': 6.628156482525939, 'learning_rate': 3.3757186422796918e-06, 'epoch': 0.4} 40%|████ | 4981/12313 [3:43:46<5:37:56, 2.77s/it] 40%|████ | 4982/12313 [3:43:49<5:33:49, 2.73s/it] {'loss': 0.5736, 'grad_norm': 5.667448673111919, 'learning_rate': 3.3751026552782085e-06, 'epoch': 0.4} 40%|████ | 4982/12313 [3:43:49<5:33:49, 2.73s/it] 40%|████ | 4983/12313 [3:43:52<5:41:34, 2.80s/it] {'loss': 0.585, 'grad_norm': 4.987055006193434, 'learning_rate': 3.3744866077242516e-06, 'epoch': 0.4} 40%|████ | 4983/12313 [3:43:52<5:41:34, 2.80s/it] 40%|████ | 4984/12313 [3:43:54<5:25:49, 2.67s/it] {'loss': 0.6341, 'grad_norm': 5.82552976044451, 'learning_rate': 3.3738704996604505e-06, 'epoch': 0.4} 40%|████ | 4984/12313 [3:43:54<5:25:49, 2.67s/it] 40%|████ | 4985/12313 [3:43:57<5:29:36, 2.70s/it] {'loss': 0.4046, 'grad_norm': 4.093717103910503, 'learning_rate': 3.373254331129436e-06, 'epoch': 0.4} 40%|████ | 4985/12313 [3:43:57<5:29:36, 2.70s/it] 40%|████ | 4986/12313 [3:43:59<5:17:30, 2.60s/it] {'loss': 0.4569, 'grad_norm': 7.070638776876593, 'learning_rate': 3.3726381021738426e-06, 'epoch': 0.4} 40%|████ | 4986/12313 [3:43:59<5:17:30, 2.60s/it] 41%|████ | 4987/12313 [3:44:02<5:13:41, 2.57s/it] {'loss': 0.522, 'grad_norm': 4.327438597298289, 'learning_rate': 3.372021812836311e-06, 'epoch': 0.41} 41%|████ | 4987/12313 [3:44:02<5:13:41, 2.57s/it] 41%|████ 
| 4988/12313 [3:44:05<5:14:08, 2.57s/it] {'loss': 0.5055, 'grad_norm': 7.802424985063589, 'learning_rate': 3.371405463159486e-06, 'epoch': 0.41} 41%|████ | 4988/12313 [3:44:05<5:14:08, 2.57s/it] 41%|████ | 4989/12313 [3:44:07<5:19:15, 2.62s/it] {'loss': 0.5691, 'grad_norm': 5.2484985718748405, 'learning_rate': 3.3707890531860143e-06, 'epoch': 0.41} 41%|████ | 4989/12313 [3:44:07<5:19:15, 2.62s/it] 41%|████ | 4990/12313 [3:44:10<5:22:56, 2.65s/it] {'loss': 0.5087, 'grad_norm': 3.961595770162228, 'learning_rate': 3.3701725829585484e-06, 'epoch': 0.41} 41%|████ | 4990/12313 [3:44:10<5:22:56, 2.65s/it] 41%|████ | 4991/12313 [3:44:13<5:30:07, 2.71s/it] {'loss': 0.5014, 'grad_norm': 5.115368545445615, 'learning_rate': 3.369556052519746e-06, 'epoch': 0.41} 41%|████ | 4991/12313 [3:44:13<5:30:07, 2.71s/it] 41%|████ | 4992/12313 [3:44:16<5:42:35, 2.81s/it] {'loss': 0.5217, 'grad_norm': 9.264859588158524, 'learning_rate': 3.3689394619122654e-06, 'epoch': 0.41} 41%|████ | 4992/12313 [3:44:16<5:42:35, 2.81s/it] 41%|████ | 4993/12313 [3:44:19<5:48:10, 2.85s/it] {'loss': 0.5209, 'grad_norm': 5.809302524376221, 'learning_rate': 3.3683228111787738e-06, 'epoch': 0.41} 41%|████ | 4993/12313 [3:44:19<5:48:10, 2.85s/it] 41%|████ | 4994/12313 [3:44:21<5:34:41, 2.74s/it] {'loss': 0.5284, 'grad_norm': 6.347688148270995, 'learning_rate': 3.367706100361939e-06, 'epoch': 0.41} 41%|████ | 4994/12313 [3:44:21<5:34:41, 2.74s/it] 41%|████ | 4995/12313 [3:44:24<5:29:24, 2.70s/it] {'loss': 0.6141, 'grad_norm': 5.2075723248409185, 'learning_rate': 3.3670893295044344e-06, 'epoch': 0.41} 41%|████ | 4995/12313 [3:44:24<5:29:24, 2.70s/it] 41%|████ | 4996/12313 [3:44:27<5:29:36, 2.70s/it] {'loss': 0.539, 'grad_norm': 6.241751717955846, 'learning_rate': 3.3664724986489368e-06, 'epoch': 0.41} 41%|████ | 4996/12313 [3:44:27<5:29:36, 2.70s/it] 41%|████ | 4997/12313 [3:44:29<5:20:41, 2.63s/it] {'loss': 0.4779, 'grad_norm': 5.913411596718994, 'learning_rate': 3.3658556078381283e-06, 'epoch': 0.41} 41%|████ | 
4997/12313 [3:44:29<5:20:41, 2.63s/it] 41%|████ | 4998/12313 [3:44:32<5:17:49, 2.61s/it] {'loss': 0.4415, 'grad_norm': 5.987603839595508, 'learning_rate': 3.3652386571146945e-06, 'epoch': 0.41} 41%|████ | 4998/12313 [3:44:32<5:17:49, 2.61s/it] 41%|████ | 4999/12313 [3:44:35<5:33:47, 2.74s/it] {'loss': 0.5426, 'grad_norm': 5.342255084971644, 'learning_rate': 3.3646216465213245e-06, 'epoch': 0.41} 41%|████ | 4999/12313 [3:44:35<5:33:47, 2.74s/it] 41%|████ | 5000/12313 [3:44:37<5:30:56, 2.72s/it] {'loss': 0.5235, 'grad_norm': 5.881978781645484, 'learning_rate': 3.364004576100712e-06, 'epoch': 0.41} 41%|████ | 5000/12313 [3:44:37<5:30:56, 2.72s/it] 41%|████ | 5001/12313 [3:44:40<5:26:20, 2.68s/it] {'loss': 0.5061, 'grad_norm': 5.907061959419017, 'learning_rate': 3.3633874458955573e-06, 'epoch': 0.41} 41%|████ | 5001/12313 [3:44:40<5:26:20, 2.68s/it] 41%|████ | 5002/12313 [3:44:43<5:29:38, 2.71s/it] {'loss': 0.5892, 'grad_norm': 4.640005094027489, 'learning_rate': 3.362770255948559e-06, 'epoch': 0.41} 41%|████ | 5002/12313 [3:44:43<5:29:38, 2.71s/it] 41%|████ | 5003/12313 [3:44:46<5:38:34, 2.78s/it] {'loss': 0.6451, 'grad_norm': 5.450649196042516, 'learning_rate': 3.3621530063024257e-06, 'epoch': 0.41} 41%|████ | 5003/12313 [3:44:46<5:38:34, 2.78s/it] 41%|████ | 5004/12313 [3:44:48<5:38:06, 2.78s/it] {'loss': 0.5278, 'grad_norm': 12.30013823840977, 'learning_rate': 3.3615356969998676e-06, 'epoch': 0.41} 41%|████ | 5004/12313 [3:44:48<5:38:06, 2.78s/it] 41%|████ | 5005/12313 [3:44:51<5:40:39, 2.80s/it] {'loss': 0.6247, 'grad_norm': 3.2285712876853156, 'learning_rate': 3.360918328083598e-06, 'epoch': 0.41} 41%|████ | 5005/12313 [3:44:51<5:40:39, 2.80s/it] 41%|████ | 5006/12313 [3:44:54<5:34:58, 2.75s/it] {'loss': 0.527, 'grad_norm': 4.8576215977033135, 'learning_rate': 3.3603008995963373e-06, 'epoch': 0.41} 41%|████ | 5006/12313 [3:44:54<5:34:58, 2.75s/it] 41%|████ | 5007/12313 [3:44:56<5:24:30, 2.67s/it] {'loss': 0.5264, 'grad_norm': 6.983405839371138, 'learning_rate': 
3.3596834115808074e-06, 'epoch': 0.41} 41%|████ | 5007/12313 [3:44:56<5:24:30, 2.67s/it] 41%|████ | 5008/12313 [3:44:59<5:15:34, 2.59s/it] {'loss': 0.6397, 'grad_norm': 4.393414313571411, 'learning_rate': 3.3590658640797346e-06, 'epoch': 0.41} 41%|████ | 5008/12313 [3:44:59<5:15:34, 2.59s/it] 41%|████ | 5009/12313 [3:45:02<5:22:34, 2.65s/it] {'loss': 0.4797, 'grad_norm': 5.416974817120511, 'learning_rate': 3.3584482571358513e-06, 'epoch': 0.41} 41%|████ | 5009/12313 [3:45:02<5:22:34, 2.65s/it] 41%|████ | 5010/12313 [3:45:04<5:18:06, 2.61s/it] {'loss': 0.452, 'grad_norm': 3.780442832500498, 'learning_rate': 3.357830590791891e-06, 'epoch': 0.41} 41%|████ | 5010/12313 [3:45:04<5:18:06, 2.61s/it] 41%|████ | 5011/12313 [3:45:07<5:15:59, 2.60s/it] {'loss': 0.4962, 'grad_norm': 4.203958087565441, 'learning_rate': 3.3572128650905946e-06, 'epoch': 0.41} 41%|████ | 5011/12313 [3:45:07<5:15:59, 2.60s/it] 41%|████ | 5012/12313 [3:45:09<5:19:49, 2.63s/it] {'loss': 0.5625, 'grad_norm': 7.016929878457646, 'learning_rate': 3.3565950800747038e-06, 'epoch': 0.41} 41%|████ | 5012/12313 [3:45:09<5:19:49, 2.63s/it] 41%|████ | 5013/12313 [3:45:12<5:20:30, 2.63s/it] {'loss': 0.544, 'grad_norm': 5.779316536373502, 'learning_rate': 3.355977235786968e-06, 'epoch': 0.41} 41%|████ | 5013/12313 [3:45:12<5:20:30, 2.63s/it] 41%|████ | 5014/12313 [3:45:15<5:20:02, 2.63s/it] {'loss': 0.5445, 'grad_norm': 4.419872849655686, 'learning_rate': 3.3553593322701374e-06, 'epoch': 0.41} 41%|████ | 5014/12313 [3:45:15<5:20:02, 2.63s/it] 41%|████ | 5015/12313 [3:45:17<5:22:33, 2.65s/it] {'loss': 0.4479, 'grad_norm': 19.583565435812158, 'learning_rate': 3.3547413695669673e-06, 'epoch': 0.41} 41%|████ | 5015/12313 [3:45:17<5:22:33, 2.65s/it] 41%|████ | 5016/12313 [3:45:20<5:23:28, 2.66s/it] {'loss': 0.6231, 'grad_norm': 6.737000577753693, 'learning_rate': 3.3541233477202184e-06, 'epoch': 0.41} 41%|████ | 5016/12313 [3:45:20<5:23:28, 2.66s/it] 41%|████ | 5017/12313 [3:45:23<5:25:37, 2.68s/it] {'loss': 0.4791, 
'grad_norm': 5.651619729539659, 'learning_rate': 3.3535052667726546e-06, 'epoch': 0.41} 41%|████ | 5017/12313 [3:45:23<5:25:37, 2.68s/it] 41%|████ | 5018/12313 [3:45:25<5:25:37, 2.68s/it] {'loss': 0.618, 'grad_norm': 13.01736060108638, 'learning_rate': 3.352887126767043e-06, 'epoch': 0.41} 41%|████ | 5018/12313 [3:45:25<5:25:37, 2.68s/it] 41%|████ | 5019/12313 [3:45:28<5:26:02, 2.68s/it] {'loss': 0.5014, 'grad_norm': 7.913220552107575, 'learning_rate': 3.352268927746156e-06, 'epoch': 0.41} 41%|████ | 5019/12313 [3:45:28<5:26:02, 2.68s/it] 41%|████ | 5020/12313 [3:45:31<5:19:42, 2.63s/it] {'loss': 0.7005, 'grad_norm': 5.00487843260526, 'learning_rate': 3.3516506697527706e-06, 'epoch': 0.41} 41%|████ | 5020/12313 [3:45:31<5:19:42, 2.63s/it] 41%|████ | 5021/12313 [3:45:33<5:18:54, 2.62s/it] {'loss': 0.5042, 'grad_norm': 4.0699036925411045, 'learning_rate': 3.3510323528296656e-06, 'epoch': 0.41} 41%|████ | 5021/12313 [3:45:33<5:18:54, 2.62s/it] 41%|████ | 5022/12313 [3:45:36<5:17:37, 2.61s/it] {'loss': 0.5318, 'grad_norm': 3.9550635463784944, 'learning_rate': 3.3504139770196252e-06, 'epoch': 0.41} 41%|████ | 5022/12313 [3:45:36<5:17:37, 2.61s/it] 41%|████ | 5023/12313 [3:45:39<5:21:46, 2.65s/it] {'loss': 0.5501, 'grad_norm': 3.9690809902318605, 'learning_rate': 3.3497955423654395e-06, 'epoch': 0.41} 41%|████ | 5023/12313 [3:45:39<5:21:46, 2.65s/it] 41%|████ | 5024/12313 [3:45:41<5:21:48, 2.65s/it] {'loss': 0.4221, 'grad_norm': 5.238253417658213, 'learning_rate': 3.349177048909899e-06, 'epoch': 0.41} 41%|████ | 5024/12313 [3:45:41<5:21:48, 2.65s/it] 41%|████ | 5025/12313 [3:45:44<5:25:18, 2.68s/it] {'loss': 0.5599, 'grad_norm': 6.31641058902131, 'learning_rate': 3.3485584966958005e-06, 'epoch': 0.41} 41%|████ | 5025/12313 [3:45:44<5:25:18, 2.68s/it] 41%|████ | 5026/12313 [3:45:47<5:25:14, 2.68s/it] {'loss': 0.672, 'grad_norm': 4.773613373392485, 'learning_rate': 3.3479398857659464e-06, 'epoch': 0.41} 41%|████ | 5026/12313 [3:45:47<5:25:14, 2.68s/it] 41%|████ | 
5027/12313 [3:45:49<5:28:58, 2.71s/it] {'loss': 0.4964, 'grad_norm': 6.145123089078632, 'learning_rate': 3.3473212161631385e-06, 'epoch': 0.41} 41%|████ | 5027/12313 [3:45:49<5:28:58, 2.71s/it] 41%|████ | 5028/12313 [3:45:52<5:25:23, 2.68s/it] {'loss': 0.633, 'grad_norm': 4.502728069285821, 'learning_rate': 3.3467024879301873e-06, 'epoch': 0.41} 41%|████ | 5028/12313 [3:45:52<5:25:23, 2.68s/it] 41%|████ | 5029/12313 [3:45:55<5:20:27, 2.64s/it] {'loss': 0.7541, 'grad_norm': 4.232356275831273, 'learning_rate': 3.346083701109905e-06, 'epoch': 0.41} 41%|████ | 5029/12313 [3:45:55<5:20:27, 2.64s/it] 41%|████ | 5030/12313 [3:45:57<5:18:34, 2.62s/it] {'loss': 0.6154, 'grad_norm': 6.5669331099437045, 'learning_rate': 3.3454648557451087e-06, 'epoch': 0.41} 41%|████ | 5030/12313 [3:45:57<5:18:34, 2.62s/it] 41%|████ | 5031/12313 [3:46:00<5:29:43, 2.72s/it] {'loss': 0.563, 'grad_norm': 3.585238962477474, 'learning_rate': 3.3448459518786193e-06, 'epoch': 0.41} 41%|████ | 5031/12313 [3:46:00<5:29:43, 2.72s/it] 41%|████ | 5032/12313 [3:46:03<5:31:33, 2.73s/it] {'loss': 0.4665, 'grad_norm': 7.113560531413456, 'learning_rate': 3.3442269895532604e-06, 'epoch': 0.41} 41%|████ | 5032/12313 [3:46:03<5:31:33, 2.73s/it] 41%|████ | 5033/12313 [3:46:05<5:26:01, 2.69s/it] {'loss': 0.4255, 'grad_norm': 4.184427215416845, 'learning_rate': 3.3436079688118618e-06, 'epoch': 0.41} 41%|████ | 5033/12313 [3:46:05<5:26:01, 2.69s/it] 41%|████ | 5034/12313 [3:46:09<5:43:29, 2.83s/it] {'loss': 0.4262, 'grad_norm': 3.690634979166768, 'learning_rate': 3.3429888896972575e-06, 'epoch': 0.41} 41%|████ | 5034/12313 [3:46:09<5:43:29, 2.83s/it] 41%|████ | 5035/12313 [3:46:11<5:39:02, 2.80s/it] {'loss': 0.5172, 'grad_norm': 3.7301420071620424, 'learning_rate': 3.3423697522522823e-06, 'epoch': 0.41} 41%|████ | 5035/12313 [3:46:11<5:39:02, 2.80s/it] 41%|████ | 5036/12313 [3:46:14<5:48:23, 2.87s/it] {'loss': 0.6716, 'grad_norm': 4.87720646565865, 'learning_rate': 3.3417505565197794e-06, 'epoch': 0.41} 41%|████ | 
5036/12313 [3:46:14<5:48:23, 2.87s/it] 41%|████ | 5037/12313 [3:46:17<5:47:53, 2.87s/it] {'loss': 0.5472, 'grad_norm': 3.6748482560950615, 'learning_rate': 3.3411313025425927e-06, 'epoch': 0.41} 41%|████ | 5037/12313 [3:46:17<5:47:53, 2.87s/it] 41%|████ | 5038/12313 [3:46:20<5:40:15, 2.81s/it] {'loss': 0.3746, 'grad_norm': 5.7048262277305275, 'learning_rate': 3.340511990363571e-06, 'epoch': 0.41} 41%|████ | 5038/12313 [3:46:20<5:40:15, 2.81s/it] 41%|████ | 5039/12313 [3:46:22<5:33:15, 2.75s/it] {'loss': 0.3669, 'grad_norm': 5.363584407358412, 'learning_rate': 3.3398926200255684e-06, 'epoch': 0.41} 41%|████ | 5039/12313 [3:46:22<5:33:15, 2.75s/it] 41%|████ | 5040/12313 [3:46:25<5:32:55, 2.75s/it] {'loss': 0.5765, 'grad_norm': 4.185826585370702, 'learning_rate': 3.3392731915714417e-06, 'epoch': 0.41} 41%|████ | 5040/12313 [3:46:25<5:32:55, 2.75s/it] 41%|████ | 5041/12313 [3:46:28<5:30:47, 2.73s/it] {'loss': 0.574, 'grad_norm': 5.9673921212077286, 'learning_rate': 3.338653705044051e-06, 'epoch': 0.41} 41%|████ | 5041/12313 [3:46:28<5:30:47, 2.73s/it] 41%|████ | 5042/12313 [3:46:31<5:46:14, 2.86s/it] {'loss': 0.4787, 'grad_norm': 13.697669906869592, 'learning_rate': 3.3380341604862633e-06, 'epoch': 0.41} 41%|████ | 5042/12313 [3:46:31<5:46:14, 2.86s/it] 41%|████ | 5043/12313 [3:46:34<5:40:05, 2.81s/it] {'loss': 0.5793, 'grad_norm': 18.101167067110552, 'learning_rate': 3.3374145579409467e-06, 'epoch': 0.41} 41%|████ | 5043/12313 [3:46:34<5:40:05, 2.81s/it] 41%|████ | 5044/12313 [3:46:36<5:35:24, 2.77s/it] {'loss': 0.4709, 'grad_norm': 33.302750260887564, 'learning_rate': 3.3367948974509743e-06, 'epoch': 0.41} 41%|████ | 5044/12313 [3:46:36<5:35:24, 2.77s/it] 41%|████ | 5045/12313 [3:46:39<5:26:57, 2.70s/it] {'loss': 0.4976, 'grad_norm': 5.565821515126126, 'learning_rate': 3.336175179059224e-06, 'epoch': 0.41} 41%|████ | 5045/12313 [3:46:39<5:26:57, 2.70s/it] 41%|████ | 5046/12313 [3:46:42<5:31:51, 2.74s/it] {'loss': 0.5237, 'grad_norm': 4.92473685068305, 
'learning_rate': 3.335555402808577e-06, 'epoch': 0.41} 41%|████ | 5046/12313 [3:46:42<5:31:51, 2.74s/it] 41%|████ | 5047/12313 [3:46:44<5:21:00, 2.65s/it] {'loss': 0.505, 'grad_norm': 3.6230723915327343, 'learning_rate': 3.334935568741918e-06, 'epoch': 0.41} 41%|████ | 5047/12313 [3:46:44<5:21:00, 2.65s/it] 41%|████ | 5048/12313 [3:46:47<5:19:57, 2.64s/it] {'loss': 0.5275, 'grad_norm': 4.073098074480497, 'learning_rate': 3.3343156769021355e-06, 'epoch': 0.41} 41%|████ | 5048/12313 [3:46:47<5:19:57, 2.64s/it] 41%|████ | 5049/12313 [3:46:50<5:25:47, 2.69s/it] {'loss': 0.635, 'grad_norm': 3.908868759278294, 'learning_rate': 3.333695727332125e-06, 'epoch': 0.41} 41%|████ | 5049/12313 [3:46:50<5:25:47, 2.69s/it] 41%|████ | 5050/12313 [3:46:52<5:22:42, 2.67s/it] {'loss': 0.4958, 'grad_norm': 4.898135196202766, 'learning_rate': 3.3330757200747828e-06, 'epoch': 0.41} 41%|████ | 5050/12313 [3:46:52<5:22:42, 2.67s/it] 41%|████ | 5051/12313 [3:46:55<5:20:57, 2.65s/it] {'loss': 0.4647, 'grad_norm': 11.463379753224626, 'learning_rate': 3.332455655173008e-06, 'epoch': 0.41} 41%|████ | 5051/12313 [3:46:55<5:20:57, 2.65s/it] 41%|████ | 5052/12313 [3:46:58<5:21:58, 2.66s/it] {'loss': 0.6904, 'grad_norm': 4.243407156064371, 'learning_rate': 3.3318355326697093e-06, 'epoch': 0.41} 41%|████ | 5052/12313 [3:46:58<5:21:58, 2.66s/it] 41%|████ | 5053/12313 [3:47:00<5:15:51, 2.61s/it] {'loss': 0.5248, 'grad_norm': 3.5903815519776585, 'learning_rate': 3.3312153526077933e-06, 'epoch': 0.41} 41%|████ | 5053/12313 [3:47:00<5:15:51, 2.61s/it] 41%|████ | 5054/12313 [3:47:03<5:23:32, 2.67s/it] {'loss': 0.4837, 'grad_norm': 5.5607811919006345, 'learning_rate': 3.330595115030174e-06, 'epoch': 0.41} 41%|████ | 5054/12313 [3:47:03<5:23:32, 2.67s/it] 41%|████ | 5055/12313 [3:47:06<5:23:45, 2.68s/it] {'loss': 0.4331, 'grad_norm': 4.396444235590735, 'learning_rate': 3.3299748199797686e-06, 'epoch': 0.41} 41%|████ | 5055/12313 [3:47:06<5:23:45, 2.68s/it] 41%|████ | 5056/12313 [3:47:08<5:21:07, 2.65s/it] 
{'loss': 0.5109, 'grad_norm': 5.6326755354518845, 'learning_rate': 3.3293544674994987e-06, 'epoch': 0.41} 41%|████ | 5056/12313 [3:47:08<5:21:07, 2.65s/it] 41%|████ | 5057/12313 [3:47:12<5:51:56, 2.91s/it] {'loss': 0.5488, 'grad_norm': 6.458559629689074, 'learning_rate': 3.328734057632289e-06, 'epoch': 0.41} 41%|████ | 5057/12313 [3:47:12<5:51:56, 2.91s/it] 41%|████ | 5058/12313 [3:47:15<5:49:37, 2.89s/it] {'loss': 0.787, 'grad_norm': 6.259903972340533, 'learning_rate': 3.328113590421068e-06, 'epoch': 0.41} 41%|████ | 5058/12313 [3:47:15<5:49:37, 2.89s/it] 41%|████ | 5059/12313 [3:47:17<5:37:54, 2.79s/it] {'loss': 0.6045, 'grad_norm': 5.031847327165066, 'learning_rate': 3.3274930659087694e-06, 'epoch': 0.41} 41%|████ | 5059/12313 [3:47:17<5:37:54, 2.79s/it] 41%|████ | 5060/12313 [3:47:20<5:30:14, 2.73s/it] {'loss': 0.5007, 'grad_norm': 12.757281627069613, 'learning_rate': 3.3268724841383302e-06, 'epoch': 0.41} 41%|████ | 5060/12313 [3:47:20<5:30:14, 2.73s/it] 41%|████ | 5061/12313 [3:47:22<5:29:28, 2.73s/it] {'loss': 0.6009, 'grad_norm': 8.972711433026783, 'learning_rate': 3.3262518451526916e-06, 'epoch': 0.41} 41%|████ | 5061/12313 [3:47:22<5:29:28, 2.73s/it] 41%|████ | 5062/12313 [3:47:25<5:34:44, 2.77s/it] {'loss': 0.4718, 'grad_norm': 3.924314034573167, 'learning_rate': 3.3256311489947973e-06, 'epoch': 0.41} 41%|████ | 5062/12313 [3:47:25<5:34:44, 2.77s/it] 41%|████ | 5063/12313 [3:47:28<5:33:36, 2.76s/it] {'loss': 0.6721, 'grad_norm': 3.8326582602133796, 'learning_rate': 3.3250103957075987e-06, 'epoch': 0.41} 41%|████ | 5063/12313 [3:47:28<5:33:36, 2.76s/it] 41%|████ | 5064/12313 [3:47:31<5:27:59, 2.71s/it] {'loss': 0.3982, 'grad_norm': 5.858057317848575, 'learning_rate': 3.3243895853340445e-06, 'epoch': 0.41} 41%|████ | 5064/12313 [3:47:31<5:27:59, 2.71s/it] 41%|████ | 5065/12313 [3:47:33<5:23:24, 2.68s/it] {'loss': 0.6906, 'grad_norm': 4.79530936974041, 'learning_rate': 3.323768717917096e-06, 'epoch': 0.41} 41%|████ | 5065/12313 [3:47:33<5:23:24, 2.68s/it] 
41%|████ | 5066/12313 [3:47:36<5:36:26, 2.79s/it] {'loss': 0.5835, 'grad_norm': 3.761887070675124, 'learning_rate': 3.323147793499712e-06, 'epoch': 0.41} 41%|████ | 5066/12313 [3:47:36<5:36:26, 2.79s/it] 41%|████ | 5067/12313 [3:47:39<5:33:06, 2.76s/it] {'loss': 0.5361, 'grad_norm': 4.8103574968475735, 'learning_rate': 3.3225268121248567e-06, 'epoch': 0.41} 41%|████ | 5067/12313 [3:47:39<5:33:06, 2.76s/it] 41%|████ | 5068/12313 [3:47:42<5:31:02, 2.74s/it] {'loss': 0.6076, 'grad_norm': 3.859720909374451, 'learning_rate': 3.321905773835498e-06, 'epoch': 0.41} 41%|████ | 5068/12313 [3:47:42<5:31:02, 2.74s/it] 41%|████ | 5069/12313 [3:47:44<5:25:01, 2.69s/it] {'loss': 0.4216, 'grad_norm': 3.758573255947487, 'learning_rate': 3.3212846786746113e-06, 'epoch': 0.41} 41%|████ | 5069/12313 [3:47:44<5:25:01, 2.69s/it] 41%|████ | 5070/12313 [3:47:47<5:28:50, 2.72s/it] {'loss': 0.5195, 'grad_norm': 7.220042667589109, 'learning_rate': 3.3206635266851707e-06, 'epoch': 0.41} 41%|████ | 5070/12313 [3:47:47<5:28:50, 2.72s/it] 41%|████ | 5071/12313 [3:47:50<5:37:16, 2.79s/it] {'loss': 0.5241, 'grad_norm': 5.446095850430871, 'learning_rate': 3.320042317910157e-06, 'epoch': 0.41} 41%|████ | 5071/12313 [3:47:50<5:37:16, 2.79s/it] 41%|████ | 5072/12313 [3:47:53<5:41:30, 2.83s/it] {'loss': 0.6736, 'grad_norm': 4.589008245862078, 'learning_rate': 3.319421052392556e-06, 'epoch': 0.41} 41%|████ | 5072/12313 [3:47:53<5:41:30, 2.83s/it] 41%|████ | 5073/12313 [3:47:55<5:32:22, 2.75s/it] {'loss': 0.5927, 'grad_norm': 9.586999090184612, 'learning_rate': 3.318799730175354e-06, 'epoch': 0.41} 41%|████ | 5073/12313 [3:47:55<5:32:22, 2.75s/it] 41%|████ | 5074/12313 [3:47:58<5:23:07, 2.68s/it] {'loss': 0.5262, 'grad_norm': 3.790231648461632, 'learning_rate': 3.3181783513015443e-06, 'epoch': 0.41} 41%|████ | 5074/12313 [3:47:58<5:23:07, 2.68s/it] 41%|████ | 5075/12313 [3:48:01<5:23:42, 2.68s/it] {'loss': 0.5578, 'grad_norm': 7.8339239381427435, 'learning_rate': 3.317556915814123e-06, 'epoch': 0.41} 
41%|████ | 5075/12313 [3:48:01<5:23:42, 2.68s/it] 41%|████ | 5076/12313 [3:48:03<5:23:29, 2.68s/it] {'loss': 0.4758, 'grad_norm': 4.189621669708876, 'learning_rate': 3.31693542375609e-06, 'epoch': 0.41} 41%|████ | 5076/12313 [3:48:03<5:23:29, 2.68s/it] 41%|████ | 5077/12313 [3:48:06<5:23:27, 2.68s/it] {'loss': 0.5989, 'grad_norm': 3.23569345024193, 'learning_rate': 3.316313875170449e-06, 'epoch': 0.41} 41%|████ | 5077/12313 [3:48:06<5:23:27, 2.68s/it] 41%|████ | 5078/12313 [3:48:09<5:15:28, 2.62s/it] {'loss': 0.4451, 'grad_norm': 5.092573629488183, 'learning_rate': 3.3156922701002082e-06, 'epoch': 0.41} 41%|████ | 5078/12313 [3:48:09<5:15:28, 2.62s/it] 41%|████ | 5079/12313 [3:48:11<5:25:50, 2.70s/it] {'loss': 0.6206, 'grad_norm': 3.7634789724489774, 'learning_rate': 3.3150706085883795e-06, 'epoch': 0.41} 41%|████ | 5079/12313 [3:48:11<5:25:50, 2.70s/it] 41%|████▏ | 5080/12313 [3:48:14<5:18:42, 2.64s/it] {'loss': 0.5531, 'grad_norm': 9.494128622068938, 'learning_rate': 3.3144488906779775e-06, 'epoch': 0.41} 41%|████▏ | 5080/12313 [3:48:14<5:18:42, 2.64s/it] 41%|████▏ | 5081/12313 [3:48:17<5:17:56, 2.64s/it] {'loss': 0.4668, 'grad_norm': 5.01807369882204, 'learning_rate': 3.3138271164120235e-06, 'epoch': 0.41} 41%|████▏ | 5081/12313 [3:48:17<5:17:56, 2.64s/it] 41%|████▏ | 5082/12313 [3:48:19<5:10:43, 2.58s/it] {'loss': 0.471, 'grad_norm': 4.224562759189403, 'learning_rate': 3.3132052858335405e-06, 'epoch': 0.41} 41%|████▏ | 5082/12313 [3:48:19<5:10:43, 2.58s/it] 41%|████▏ | 5083/12313 [3:48:22<5:19:22, 2.65s/it] {'loss': 0.478, 'grad_norm': 4.452581770491441, 'learning_rate': 3.312583398985555e-06, 'epoch': 0.41} 41%|████▏ | 5083/12313 [3:48:22<5:19:22, 2.65s/it] 41%|████▏ | 5084/12313 [3:48:25<5:23:16, 2.68s/it] {'loss': 0.7315, 'grad_norm': 6.547729750220411, 'learning_rate': 3.3119614559110986e-06, 'epoch': 0.41} 41%|████▏ | 5084/12313 [3:48:25<5:23:16, 2.68s/it] 41%|████▏ | 5085/12313 [3:48:27<5:27:59, 2.72s/it] {'loss': 0.4778, 'grad_norm': 3.7189334139526307, 
'learning_rate': 3.3113394566532076e-06, 'epoch': 0.41} 41%|████▏ | 5085/12313 [3:48:27<5:27:59, 2.72s/it] 41%|████▏ | 5086/12313 [3:48:30<5:29:43, 2.74s/it] {'loss': 0.6956, 'grad_norm': 5.682620271914815, 'learning_rate': 3.310717401254919e-06, 'epoch': 0.41} 41%|████▏ | 5086/12313 [3:48:30<5:29:43, 2.74s/it] 41%|████▏ | 5087/12313 [3:48:33<5:24:40, 2.70s/it] {'loss': 0.4507, 'grad_norm': 5.539361693130007, 'learning_rate': 3.3100952897592774e-06, 'epoch': 0.41} 41%|████▏ | 5087/12313 [3:48:33<5:24:40, 2.70s/it] 41%|████▏ | 5088/12313 [3:48:36<5:29:38, 2.74s/it] {'loss': 0.5914, 'grad_norm': 5.822291658506241, 'learning_rate': 3.3094731222093297e-06, 'epoch': 0.41} 41%|████▏ | 5088/12313 [3:48:36<5:29:38, 2.74s/it] 41%|████▏ | 5089/12313 [3:48:38<5:28:10, 2.73s/it] {'loss': 0.4881, 'grad_norm': 4.73471914454109, 'learning_rate': 3.3088508986481256e-06, 'epoch': 0.41} 41%|████▏ | 5089/12313 [3:48:38<5:28:10, 2.73s/it] 41%|████▏ | 5090/12313 [3:48:41<5:18:52, 2.65s/it] {'loss': 0.603, 'grad_norm': 6.22074010943859, 'learning_rate': 3.30822861911872e-06, 'epoch': 0.41} 41%|████▏ | 5090/12313 [3:48:41<5:18:52, 2.65s/it] 41%|████▏ | 5091/12313 [3:48:43<5:15:56, 2.62s/it] {'loss': 0.43, 'grad_norm': 6.1051785076416385, 'learning_rate': 3.3076062836641716e-06, 'epoch': 0.41} 41%|████▏ | 5091/12313 [3:48:43<5:15:56, 2.62s/it] 41%|████▏ | 5092/12313 [3:48:46<5:19:04, 2.65s/it] {'loss': 0.4868, 'grad_norm': 4.391606365906377, 'learning_rate': 3.306983892327542e-06, 'epoch': 0.41} 41%|████▏ | 5092/12313 [3:48:46<5:19:04, 2.65s/it] 41%|████▏ | 5093/12313 [3:48:49<5:18:54, 2.65s/it] {'loss': 0.5556, 'grad_norm': 3.796872062788742, 'learning_rate': 3.306361445151899e-06, 'epoch': 0.41} 41%|████▏ | 5093/12313 [3:48:49<5:18:54, 2.65s/it] 41%|████▏ | 5094/12313 [3:48:51<5:15:31, 2.62s/it] {'loss': 0.5305, 'grad_norm': 5.3352051929174165, 'learning_rate': 3.3057389421803104e-06, 'epoch': 0.41} 41%|████▏ | 5094/12313 [3:48:51<5:15:31, 2.62s/it] 41%|████▏ | 5095/12313 
[3:48:54<5:14:52, 2.62s/it] {'loss': 0.8252, 'grad_norm': 5.507108563864587, 'learning_rate': 3.305116383455852e-06, 'epoch': 0.41} 41%|████▏ | 5095/12313 [3:48:54<5:14:52, 2.62s/it] 41%|████▏ | 5096/12313 [3:48:56<5:15:57, 2.63s/it] {'loss': 0.5663, 'grad_norm': 3.239020837952215, 'learning_rate': 3.304493769021601e-06, 'epoch': 0.41} 41%|████▏ | 5096/12313 [3:48:56<5:15:57, 2.63s/it] 41%|████▏ | 5097/12313 [3:48:59<5:19:54, 2.66s/it] {'loss': 0.5291, 'grad_norm': 5.771163705049402, 'learning_rate': 3.3038710989206386e-06, 'epoch': 0.41} 41%|████▏ | 5097/12313 [3:48:59<5:19:54, 2.66s/it] 41%|████▏ | 5098/12313 [3:49:02<5:16:53, 2.64s/it] {'loss': 0.5858, 'grad_norm': 7.098970415020295, 'learning_rate': 3.303248373196051e-06, 'epoch': 0.41} 41%|████▏ | 5098/12313 [3:49:02<5:16:53, 2.64s/it] 41%|████▏ | 5099/12313 [3:49:05<5:21:01, 2.67s/it] {'loss': 0.4381, 'grad_norm': 6.2763287339806535, 'learning_rate': 3.3026255918909267e-06, 'epoch': 0.41} 41%|████▏ | 5099/12313 [3:49:05<5:21:01, 2.67s/it] 41%|████▏ | 5100/12313 [3:49:07<5:29:00, 2.74s/it] {'loss': 0.7504, 'grad_norm': 3.6575950544234708, 'learning_rate': 3.302002755048359e-06, 'epoch': 0.41} 41%|████▏ | 5100/12313 [3:49:07<5:29:00, 2.74s/it] 41%|████▏ | 5101/12313 [3:49:11<5:42:01, 2.85s/it] {'loss': 0.3803, 'grad_norm': 4.90381745333394, 'learning_rate': 3.3013798627114457e-06, 'epoch': 0.41} 41%|████▏ | 5101/12313 [3:49:11<5:42:01, 2.85s/it] 41%|████▏ | 5102/12313 [3:49:13<5:40:05, 2.83s/it] {'loss': 0.4982, 'grad_norm': 4.885461667100054, 'learning_rate': 3.300756914923287e-06, 'epoch': 0.41} 41%|████▏ | 5102/12313 [3:49:13<5:40:05, 2.83s/it] 41%|████▏ | 5103/12313 [3:49:16<5:34:05, 2.78s/it] {'loss': 0.6321, 'grad_norm': 4.517776876693077, 'learning_rate': 3.3001339117269883e-06, 'epoch': 0.41} 41%|████▏ | 5103/12313 [3:49:16<5:34:05, 2.78s/it] 41%|████▏ | 5104/12313 [3:49:19<5:24:53, 2.70s/it] {'loss': 0.4652, 'grad_norm': 4.594992279285769, 'learning_rate': 3.2995108531656566e-06, 'epoch': 0.41} 
41%|████▏ | 5104/12313 [3:49:19<5:24:53, 2.70s/it] 41%|████▏ | 5105/12313 [3:49:21<5:24:02, 2.70s/it] {'loss': 0.5871, 'grad_norm': 4.353302582897617, 'learning_rate': 3.298887739282406e-06, 'epoch': 0.41} 41%|████▏ | 5105/12313 [3:49:21<5:24:02, 2.70s/it] 41%|████▏ | 5106/12313 [3:49:24<5:20:25, 2.67s/it] {'loss': 0.7833, 'grad_norm': 8.458802751943896, 'learning_rate': 3.298264570120351e-06, 'epoch': 0.41} 41%|████▏ | 5106/12313 [3:49:24<5:20:25, 2.67s/it] 41%|████▏ | 5107/12313 [3:49:26<5:17:05, 2.64s/it] {'loss': 0.5037, 'grad_norm': 3.170384706895752, 'learning_rate': 3.297641345722613e-06, 'epoch': 0.41} 41%|████▏ | 5107/12313 [3:49:26<5:17:05, 2.64s/it] 41%|████▏ | 5108/12313 [3:49:29<5:19:26, 2.66s/it] {'loss': 0.6279, 'grad_norm': 2.9676348563178094, 'learning_rate': 3.2970180661323155e-06, 'epoch': 0.41} 41%|████▏ | 5108/12313 [3:49:29<5:19:26, 2.66s/it] 41%|████▏ | 5109/12313 [3:49:32<5:19:05, 2.66s/it] {'loss': 0.559, 'grad_norm': 4.616218599976382, 'learning_rate': 3.2963947313925857e-06, 'epoch': 0.41} 41%|████▏ | 5109/12313 [3:49:32<5:19:05, 2.66s/it] 42%|████▏ | 5110/12313 [3:49:34<5:22:04, 2.68s/it] {'loss': 0.5066, 'grad_norm': 6.559911673941787, 'learning_rate': 3.295771341546555e-06, 'epoch': 0.42} 42%|████▏ | 5110/12313 [3:49:34<5:22:04, 2.68s/it] 42%|████▏ | 5111/12313 [3:49:37<5:19:18, 2.66s/it] {'loss': 0.4724, 'grad_norm': 4.702456652104927, 'learning_rate': 3.2951478966373602e-06, 'epoch': 0.42} 42%|████▏ | 5111/12313 [3:49:37<5:19:18, 2.66s/it] 42%|████▏ | 5112/12313 [3:49:40<5:20:19, 2.67s/it] {'loss': 0.5304, 'grad_norm': 6.625773908660462, 'learning_rate': 3.2945243967081386e-06, 'epoch': 0.42} 42%|████▏ | 5112/12313 [3:49:40<5:20:19, 2.67s/it] 42%|████▏ | 5113/12313 [3:49:43<5:27:59, 2.73s/it] {'loss': 0.5455, 'grad_norm': 4.794677100016037, 'learning_rate': 3.2939008418020334e-06, 'epoch': 0.42} 42%|████▏ | 5113/12313 [3:49:43<5:27:59, 2.73s/it] 42%|████▏ | 5114/12313 [3:49:46<5:40:05, 2.83s/it] {'loss': 0.4838, 'grad_norm': 
4.228847169637584, 'learning_rate': 3.293277231962192e-06, 'epoch': 0.42} 42%|████▏ | 5114/12313 [3:49:46<5:40:05, 2.83s/it] 42%|████▏ | 5115/12313 [3:49:48<5:31:56, 2.77s/it] {'loss': 0.5851, 'grad_norm': 5.780889261663269, 'learning_rate': 3.292653567231765e-06, 'epoch': 0.42} 42%|████▏ | 5115/12313 [3:49:48<5:31:56, 2.77s/it] 42%|████▏ | 5116/12313 [3:49:51<5:39:45, 2.83s/it] {'loss': 0.4035, 'grad_norm': 3.5767019478443514, 'learning_rate': 3.2920298476539047e-06, 'epoch': 0.42} 42%|████▏ | 5116/12313 [3:49:51<5:39:45, 2.83s/it] 42%|████▏ | 5117/12313 [3:49:54<5:37:29, 2.81s/it] {'loss': 0.4834, 'grad_norm': 3.173589475313961, 'learning_rate': 3.2914060732717725e-06, 'epoch': 0.42} 42%|████▏ | 5117/12313 [3:49:54<5:37:29, 2.81s/it] 42%|████▏ | 5118/12313 [3:49:57<5:32:27, 2.77s/it] {'loss': 0.4122, 'grad_norm': 5.702832664803554, 'learning_rate': 3.290782244128527e-06, 'epoch': 0.42} 42%|████▏ | 5118/12313 [3:49:57<5:32:27, 2.77s/it] 42%|████▏ | 5119/12313 [3:50:00<5:37:04, 2.81s/it] {'loss': 0.7049, 'grad_norm': 4.350245123381758, 'learning_rate': 3.290158360267336e-06, 'epoch': 0.42} 42%|████▏ | 5119/12313 [3:50:00<5:37:04, 2.81s/it] 42%|████▏ | 5120/12313 [3:50:02<5:35:31, 2.80s/it] {'loss': 0.5299, 'grad_norm': 3.6272992204161887, 'learning_rate': 3.2895344217313683e-06, 'epoch': 0.42} 42%|████▏ | 5120/12313 [3:50:02<5:35:31, 2.80s/it] 42%|████▏ | 5121/12313 [3:50:05<5:35:08, 2.80s/it] {'loss': 0.5795, 'grad_norm': 3.9213077791760966, 'learning_rate': 3.2889104285637967e-06, 'epoch': 0.42} 42%|████▏ | 5121/12313 [3:50:05<5:35:08, 2.80s/it] 42%|████▏ | 5122/12313 [3:50:08<5:31:31, 2.77s/it] {'loss': 0.5066, 'grad_norm': 4.304699561537021, 'learning_rate': 3.2882863808077993e-06, 'epoch': 0.42} 42%|████▏ | 5122/12313 [3:50:08<5:31:31, 2.77s/it] 42%|████▏ | 5123/12313 [3:50:11<5:30:40, 2.76s/it] {'loss': 0.4601, 'grad_norm': 13.4372391747002, 'learning_rate': 3.287662278506556e-06, 'epoch': 0.42} 42%|████▏ | 5123/12313 [3:50:11<5:30:40, 2.76s/it] 42%|████▏ | 
5124/12313 [3:50:13<5:19:32, 2.67s/it] {'loss': 0.4771, 'grad_norm': 8.076930536538704, 'learning_rate': 3.2870381217032522e-06, 'epoch': 0.42} 42%|████▏ | 5124/12313 [3:50:13<5:19:32, 2.67s/it] 42%|████▏ | 5125/12313 [3:50:16<5:20:14, 2.67s/it] {'loss': 0.3819, 'grad_norm': 5.712865061327913, 'learning_rate': 3.2864139104410753e-06, 'epoch': 0.42} 42%|████▏ | 5125/12313 [3:50:16<5:20:14, 2.67s/it] 42%|████▏ | 5126/12313 [3:50:18<5:17:34, 2.65s/it] {'loss': 0.5816, 'grad_norm': 8.244135425876427, 'learning_rate': 3.2857896447632174e-06, 'epoch': 0.42} 42%|████▏ | 5126/12313 [3:50:18<5:17:34, 2.65s/it] 42%|████▏ | 5127/12313 [3:50:21<5:28:11, 2.74s/it] {'loss': 0.5813, 'grad_norm': 3.9304755927131785, 'learning_rate': 3.2851653247128755e-06, 'epoch': 0.42} 42%|████▏ | 5127/12313 [3:50:21<5:28:11, 2.74s/it] 42%|████▏ | 5128/12313 [3:50:24<5:21:38, 2.69s/it] {'loss': 0.7026, 'grad_norm': 3.476794981739987, 'learning_rate': 3.2845409503332488e-06, 'epoch': 0.42} 42%|████▏ | 5128/12313 [3:50:24<5:21:38, 2.69s/it] 42%|████▏ | 5129/12313 [3:50:27<5:29:07, 2.75s/it] {'loss': 0.3883, 'grad_norm': 6.988919037400744, 'learning_rate': 3.2839165216675396e-06, 'epoch': 0.42} 42%|████▏ | 5129/12313 [3:50:27<5:29:07, 2.75s/it] 42%|████▏ | 5130/12313 [3:50:30<5:28:03, 2.74s/it] {'loss': 0.6782, 'grad_norm': 9.043393482730245, 'learning_rate': 3.283292038758956e-06, 'epoch': 0.42} 42%|████▏ | 5130/12313 [3:50:30<5:28:03, 2.74s/it] 42%|████▏ | 5131/12313 [3:50:32<5:35:56, 2.81s/it] {'loss': 0.3717, 'grad_norm': 9.047180698274353, 'learning_rate': 3.2826675016507094e-06, 'epoch': 0.42} 42%|████▏ | 5131/12313 [3:50:33<5:35:56, 2.81s/it] 42%|████▏ | 5132/12313 [3:50:35<5:28:20, 2.74s/it] {'loss': 0.5285, 'grad_norm': 4.352324808412342, 'learning_rate': 3.2820429103860133e-06, 'epoch': 0.42} 42%|████▏ | 5132/12313 [3:50:35<5:28:20, 2.74s/it] 42%|████▏ | 5133/12313 [3:50:38<5:25:13, 2.72s/it] {'loss': 0.571, 'grad_norm': 4.3428722591294155, 'learning_rate': 3.281418265008087e-06, 'epoch': 
0.42} 42%|████▏ | 5133/12313 [3:50:38<5:25:13, 2.72s/it] 42%|████▏ | 5134/12313 [3:50:41<5:29:53, 2.76s/it] {'loss': 0.6332, 'grad_norm': 4.529875570698177, 'learning_rate': 3.280793565560153e-06, 'epoch': 0.42} 42%|████▏ | 5134/12313 [3:50:41<5:29:53, 2.76s/it] 42%|████▏ | 5135/12313 [3:50:44<5:37:01, 2.82s/it] {'loss': 0.4528, 'grad_norm': 10.119978089195198, 'learning_rate': 3.280168812085436e-06, 'epoch': 0.42} 42%|████▏ | 5135/12313 [3:50:44<5:37:01, 2.82s/it] 42%|████▏ | 5136/12313 [3:50:46<5:31:59, 2.78s/it] {'loss': 0.6027, 'grad_norm': 3.973556997891654, 'learning_rate': 3.279544004627166e-06, 'epoch': 0.42} 42%|████▏ | 5136/12313 [3:50:46<5:31:59, 2.78s/it] 42%|████▏ | 5137/12313 [3:50:49<5:19:51, 2.67s/it] {'loss': 0.5312, 'grad_norm': 3.7856125466570814, 'learning_rate': 3.2789191432285767e-06, 'epoch': 0.42} 42%|████▏ | 5137/12313 [3:50:49<5:19:51, 2.67s/it] 42%|████▏ | 5138/12313 [3:50:52<5:49:20, 2.92s/it] {'loss': 0.4928, 'grad_norm': 3.897046711276253, 'learning_rate': 3.278294227932905e-06, 'epoch': 0.42} 42%|████▏ | 5138/12313 [3:50:52<5:49:20, 2.92s/it] 42%|████▏ | 5139/12313 [3:50:55<6:01:09, 3.02s/it] {'loss': 0.4981, 'grad_norm': 3.9807540904239715, 'learning_rate': 3.277669258783391e-06, 'epoch': 0.42} 42%|████▏ | 5139/12313 [3:50:55<6:01:09, 3.02s/it] 42%|████▏ | 5140/12313 [3:50:58<5:49:09, 2.92s/it] {'loss': 0.5247, 'grad_norm': 4.193176888500685, 'learning_rate': 3.277044235823281e-06, 'epoch': 0.42} 42%|████▏ | 5140/12313 [3:50:58<5:49:09, 2.92s/it] 42%|████▏ | 5141/12313 [3:51:01<5:36:29, 2.81s/it] {'loss': 0.5928, 'grad_norm': 4.102295880640859, 'learning_rate': 3.2764191590958234e-06, 'epoch': 0.42} 42%|████▏ | 5141/12313 [3:51:01<5:36:29, 2.81s/it] 42%|████▏ | 5142/12313 [3:51:03<5:29:01, 2.75s/it] {'loss': 0.4061, 'grad_norm': 4.4166185861935, 'learning_rate': 3.2757940286442676e-06, 'epoch': 0.42} 42%|████▏ | 5142/12313 [3:51:03<5:29:01, 2.75s/it] 42%|████▏ | 5143/12313 [3:51:06<5:22:22, 2.70s/it] {'loss': 0.5939, 'grad_norm': 
6.205963915441601, 'learning_rate': 3.2751688445118705e-06, 'epoch': 0.42} 42%|████▏ | 5143/12313 [3:51:06<5:22:22, 2.70s/it] 42%|████▏ | 5144/12313 [3:51:08<5:14:07, 2.63s/it] {'loss': 0.5445, 'grad_norm': 9.375588558168431, 'learning_rate': 3.2745436067418934e-06, 'epoch': 0.42} 42%|████▏ | 5144/12313 [3:51:08<5:14:07, 2.63s/it] 42%|████▏ | 5145/12313 [3:51:11<5:23:26, 2.71s/it] {'loss': 0.3809, 'grad_norm': 7.231124759113749, 'learning_rate': 3.2739183153775964e-06, 'epoch': 0.42} 42%|████▏ | 5145/12313 [3:51:11<5:23:26, 2.71s/it] 42%|████▏ | 5146/12313 [3:51:14<5:29:39, 2.76s/it] {'loss': 0.4421, 'grad_norm': 5.36109001138212, 'learning_rate': 3.2732929704622485e-06, 'epoch': 0.42} 42%|████▏ | 5146/12313 [3:51:14<5:29:39, 2.76s/it] 42%|████▏ | 5147/12313 [3:51:17<5:31:16, 2.77s/it] {'loss': 0.4925, 'grad_norm': 5.1614659053586545, 'learning_rate': 3.2726675720391203e-06, 'epoch': 0.42} 42%|████▏ | 5147/12313 [3:51:17<5:31:16, 2.77s/it] 42%|████▏ | 5148/12313 [3:51:20<5:30:53, 2.77s/it] {'loss': 0.7233, 'grad_norm': 5.095668465921769, 'learning_rate': 3.272042120151485e-06, 'epoch': 0.42} 42%|████▏ | 5148/12313 [3:51:20<5:30:53, 2.77s/it] 42%|████▏ | 5149/12313 [3:51:22<5:27:19, 2.74s/it] {'loss': 0.4635, 'grad_norm': 6.655912818912801, 'learning_rate': 3.2714166148426204e-06, 'epoch': 0.42} 42%|████▏ | 5149/12313 [3:51:22<5:27:19, 2.74s/it] 42%|████▏ | 5150/12313 [3:51:25<5:21:47, 2.70s/it] {'loss': 0.5281, 'grad_norm': 5.621140760979749, 'learning_rate': 3.27079105615581e-06, 'epoch': 0.42} 42%|████▏ | 5150/12313 [3:51:25<5:21:47, 2.70s/it] 42%|████▏ | 5151/12313 [3:51:28<5:20:18, 2.68s/it] {'loss': 0.5175, 'grad_norm': 5.0469957881501255, 'learning_rate': 3.2701654441343365e-06, 'epoch': 0.42} 42%|████▏ | 5151/12313 [3:51:28<5:20:18, 2.68s/it] 42%|████▏ | 5152/12313 [3:51:30<5:21:22, 2.69s/it] {'loss': 0.4824, 'grad_norm': 4.710792152616251, 'learning_rate': 3.269539778821491e-06, 'epoch': 0.42} 42%|████▏ | 5152/12313 [3:51:30<5:21:22, 2.69s/it] 42%|████▏ | 
5153/12313 [3:51:33<5:25:21, 2.73s/it] {'loss': 0.6031, 'grad_norm': 5.248614874618813, 'learning_rate': 3.268914060260565e-06, 'epoch': 0.42} 42%|████▏ | 5153/12313 [3:51:33<5:25:21, 2.73s/it] 42%|████▏ | 5154/12313 [3:51:36<5:13:42, 2.63s/it] {'loss': 0.445, 'grad_norm': 3.7091517078378096, 'learning_rate': 3.2682882884948557e-06, 'epoch': 0.42} 42%|████▏ | 5154/12313 [3:51:36<5:13:42, 2.63s/it] 42%|████▏ | 5155/12313 [3:51:39<5:26:48, 2.74s/it] {'loss': 0.4225, 'grad_norm': 7.473085780555477, 'learning_rate': 3.2676624635676637e-06, 'epoch': 0.42} 42%|████▏ | 5155/12313 [3:51:39<5:26:48, 2.74s/it] 42%|████▏ | 5156/12313 [3:51:41<5:25:13, 2.73s/it] {'loss': 0.3815, 'grad_norm': 5.984937157684743, 'learning_rate': 3.267036585522291e-06, 'epoch': 0.42} 42%|████▏ | 5156/12313 [3:51:41<5:25:13, 2.73s/it] 42%|████▏ | 5157/12313 [3:51:44<5:22:17, 2.70s/it] {'loss': 0.4595, 'grad_norm': 7.910654251052096, 'learning_rate': 3.2664106544020464e-06, 'epoch': 0.42} 42%|████▏ | 5157/12313 [3:51:44<5:22:17, 2.70s/it] 42%|████▏ | 5158/12313 [3:51:46<5:17:46, 2.66s/it] {'loss': 0.5105, 'grad_norm': 5.447690662144098, 'learning_rate': 3.2657846702502404e-06, 'epoch': 0.42} 42%|████▏ | 5158/12313 [3:51:46<5:17:46, 2.66s/it] 42%|████▏ | 5159/12313 [3:51:49<5:17:16, 2.66s/it] {'loss': 0.4271, 'grad_norm': 6.61183045497076, 'learning_rate': 3.2651586331101887e-06, 'epoch': 0.42} 42%|████▏ | 5159/12313 [3:51:49<5:17:16, 2.66s/it] 42%|████▏ | 5160/12313 [3:51:52<5:19:07, 2.68s/it] {'loss': 0.5066, 'grad_norm': 6.227375988300761, 'learning_rate': 3.2645325430252096e-06, 'epoch': 0.42} 42%|████▏ | 5160/12313 [3:51:52<5:19:07, 2.68s/it] 42%|████▏ | 5161/12313 [3:51:54<5:10:34, 2.61s/it] {'loss': 0.5637, 'grad_norm': 3.5873779321139714, 'learning_rate': 3.2639064000386236e-06, 'epoch': 0.42} 42%|████▏ | 5161/12313 [3:51:54<5:10:34, 2.61s/it] 42%|████▏ | 5162/12313 [3:51:57<5:20:38, 2.69s/it] {'loss': 0.4754, 'grad_norm': 3.2262674778585376, 'learning_rate': 3.2632802041937574e-06, 'epoch': 
0.42} 42%|████▏ | 5162/12313 [3:51:57<5:20:38, 2.69s/it] 42%|████▏ | 5163/12313 [3:52:00<5:17:07, 2.66s/it] {'loss': 0.4554, 'grad_norm': 4.976278018847007, 'learning_rate': 3.262653955533942e-06, 'epoch': 0.42} 42%|████▏ | 5163/12313 [3:52:00<5:17:07, 2.66s/it] 42%|████▏ | 5164/12313 [3:52:03<5:34:25, 2.81s/it] {'loss': 0.5582, 'grad_norm': 7.7409127238618245, 'learning_rate': 3.262027654102508e-06, 'epoch': 0.42} 42%|████▏ | 5164/12313 [3:52:03<5:34:25, 2.81s/it] 42%|████▏ | 5165/12313 [3:52:05<5:27:06, 2.75s/it] {'loss': 0.4712, 'grad_norm': 5.01210383268172, 'learning_rate': 3.2614012999427934e-06, 'epoch': 0.42} 42%|████▏ | 5165/12313 [3:52:05<5:27:06, 2.75s/it] 42%|████▏ | 5166/12313 [3:52:08<5:23:29, 2.72s/it] {'loss': 0.7056, 'grad_norm': 4.527219081383319, 'learning_rate': 3.26077489309814e-06, 'epoch': 0.42} 42%|████▏ | 5166/12313 [3:52:08<5:23:29, 2.72s/it] 42%|████▏ | 5167/12313 [3:52:11<5:35:23, 2.82s/it] {'loss': 0.6324, 'grad_norm': 4.790861705461241, 'learning_rate': 3.2601484336118887e-06, 'epoch': 0.42} 42%|████▏ | 5167/12313 [3:52:11<5:35:23, 2.82s/it] 42%|████▏ | 5168/12313 [3:52:14<5:22:32, 2.71s/it] {'loss': 0.4605, 'grad_norm': 5.6189045494809875, 'learning_rate': 3.2595219215273895e-06, 'epoch': 0.42} 42%|████▏ | 5168/12313 [3:52:14<5:22:32, 2.71s/it] 42%|████▏ | 5169/12313 [3:52:16<5:12:04, 2.62s/it] {'loss': 0.4653, 'grad_norm': 3.3926840995979206, 'learning_rate': 3.258895356887993e-06, 'epoch': 0.42} 42%|████▏ | 5169/12313 [3:52:16<5:12:04, 2.62s/it] 42%|████▏ | 5170/12313 [3:52:19<5:13:24, 2.63s/it] {'loss': 0.3422, 'grad_norm': 4.7133012677803405, 'learning_rate': 3.2582687397370538e-06, 'epoch': 0.42} 42%|████▏ | 5170/12313 [3:52:19<5:13:24, 2.63s/it] 42%|████▏ | 5171/12313 [3:52:21<5:08:18, 2.59s/it] {'loss': 0.4689, 'grad_norm': 6.7508413558662355, 'learning_rate': 3.257642070117931e-06, 'epoch': 0.42} 42%|████▏ | 5171/12313 [3:52:21<5:08:18, 2.59s/it] 42%|████▏ | 5172/12313 [3:52:24<5:06:53, 2.58s/it] {'loss': 0.5847, 'grad_norm': 
7.457776695287912, 'learning_rate': 3.2570153480739867e-06, 'epoch': 0.42} 42%|████▏ | 5172/12313 [3:52:24<5:06:53, 2.58s/it] 42%|████▏ | 5173/12313 [3:52:26<5:07:20, 2.58s/it] {'loss': 0.5858, 'grad_norm': 5.26972034775471, 'learning_rate': 3.2563885736485873e-06, 'epoch': 0.42} 42%|████▏ | 5173/12313 [3:52:26<5:07:20, 2.58s/it] 42%|████▏ | 5174/12313 [3:52:29<5:17:19, 2.67s/it] {'loss': 0.437, 'grad_norm': 2.8327813340648325, 'learning_rate': 3.255761746885101e-06, 'epoch': 0.42} 42%|████▏ | 5174/12313 [3:52:29<5:17:19, 2.67s/it] 42%|████▏ | 5175/12313 [3:52:32<5:22:00, 2.71s/it] {'loss': 0.6866, 'grad_norm': 4.515606181218866, 'learning_rate': 3.2551348678269023e-06, 'epoch': 0.42} 42%|████▏ | 5175/12313 [3:52:32<5:22:00, 2.71s/it] 42%|████▏ | 5176/12313 [3:52:35<5:24:57, 2.73s/it] {'loss': 0.4457, 'grad_norm': 4.52820690426378, 'learning_rate': 3.2545079365173672e-06, 'epoch': 0.42} 42%|████▏ | 5176/12313 [3:52:35<5:24:57, 2.73s/it] 42%|████▏ | 5177/12313 [3:52:39<6:03:47, 3.06s/it] {'loss': 0.501, 'grad_norm': 5.8094460720614505, 'learning_rate': 3.253880952999876e-06, 'epoch': 0.42} 42%|████▏ | 5177/12313 [3:52:39<6:03:47, 3.06s/it] 42%|████▏ | 5178/12313 [3:52:41<5:51:54, 2.96s/it] {'loss': 0.6308, 'grad_norm': 9.872169395018048, 'learning_rate': 3.2532539173178125e-06, 'epoch': 0.42} 42%|████▏ | 5178/12313 [3:52:41<5:51:54, 2.96s/it] 42%|████▏ | 5179/12313 [3:52:44<5:41:12, 2.87s/it] {'loss': 0.5709, 'grad_norm': 8.660617050371519, 'learning_rate': 3.2526268295145647e-06, 'epoch': 0.42} 42%|████▏ | 5179/12313 [3:52:44<5:41:12, 2.87s/it] 42%|████▏ | 5180/12313 [3:52:47<5:30:15, 2.78s/it] {'loss': 0.4353, 'grad_norm': 4.736443287957975, 'learning_rate': 3.251999689633523e-06, 'epoch': 0.42} 42%|████▏ | 5180/12313 [3:52:47<5:30:15, 2.78s/it] 42%|████▏ | 5181/12313 [3:52:49<5:28:07, 2.76s/it] {'loss': 0.623, 'grad_norm': 10.700216416621961, 'learning_rate': 3.2513724977180828e-06, 'epoch': 0.42} 42%|████▏ | 5181/12313 [3:52:49<5:28:07, 2.76s/it] 42%|████▏ | 
5182/12313 [3:52:52<5:24:19, 2.73s/it] {'loss': 0.4766, 'grad_norm': 11.331063237198684, 'learning_rate': 3.250745253811643e-06, 'epoch': 0.42} 42%|████▏ | 5182/12313 [3:52:52<5:24:19, 2.73s/it] 42%|████▏ | 5183/12313 [3:52:55<5:28:22, 2.76s/it] {'loss': 0.5879, 'grad_norm': 4.634568286716608, 'learning_rate': 3.250117957957604e-06, 'epoch': 0.42} 42%|████▏ | 5183/12313 [3:52:55<5:28:22, 2.76s/it] 42%|████▏ | 5184/12313 [3:52:58<5:31:29, 2.79s/it] {'loss': 0.4788, 'grad_norm': 4.031580348842282, 'learning_rate': 3.249490610199373e-06, 'epoch': 0.42} 42%|████▏ | 5184/12313 [3:52:58<5:31:29, 2.79s/it] 42%|████▏ | 5185/12313 [3:53:00<5:27:00, 2.75s/it] {'loss': 0.4319, 'grad_norm': 5.071428970099418, 'learning_rate': 3.248863210580358e-06, 'epoch': 0.42} 42%|████▏ | 5185/12313 [3:53:00<5:27:00, 2.75s/it] 42%|████▏ | 5186/12313 [3:53:03<5:16:40, 2.67s/it] {'loss': 0.4285, 'grad_norm': 4.271403990802303, 'learning_rate': 3.248235759143972e-06, 'epoch': 0.42} 42%|████▏ | 5186/12313 [3:53:03<5:16:40, 2.67s/it] 42%|████▏ | 5187/12313 [3:53:05<5:14:27, 2.65s/it] {'loss': 0.5508, 'grad_norm': 5.620600198973997, 'learning_rate': 3.247608255933632e-06, 'epoch': 0.42} 42%|████▏ | 5187/12313 [3:53:05<5:14:27, 2.65s/it] 42%|████▏ | 5188/12313 [3:53:08<5:16:02, 2.66s/it] {'loss': 0.5465, 'grad_norm': 3.5656379095963366, 'learning_rate': 3.2469807009927568e-06, 'epoch': 0.42} 42%|████▏ | 5188/12313 [3:53:08<5:16:02, 2.66s/it] 42%|████▏ | 5189/12313 [3:53:11<5:22:38, 2.72s/it] {'loss': 0.6901, 'grad_norm': 4.267563683154685, 'learning_rate': 3.2463530943647708e-06, 'epoch': 0.42} 42%|████▏ | 5189/12313 [3:53:11<5:22:38, 2.72s/it] 42%|████▏ | 5190/12313 [3:53:14<5:25:49, 2.74s/it] {'loss': 0.6524, 'grad_norm': 6.968445424375631, 'learning_rate': 3.2457254360931013e-06, 'epoch': 0.42} 42%|████▏ | 5190/12313 [3:53:14<5:25:49, 2.74s/it] 42%|████▏ | 5191/12313 [3:53:16<5:19:47, 2.69s/it] {'loss': 0.4577, 'grad_norm': 6.404940362152012, 'learning_rate': 3.245097726221177e-06, 'epoch': 
0.42} 42%|████▏ | 5191/12313 [3:53:16<5:19:47, 2.69s/it] 42%|████▏ | 5192/12313 [3:53:19<5:20:22, 2.70s/it] {'loss': 0.487, 'grad_norm': 10.422891227755217, 'learning_rate': 3.244469964792434e-06, 'epoch': 0.42} 42%|████▏ | 5192/12313 [3:53:19<5:20:22, 2.70s/it] 42%|████▏ | 5193/12313 [3:53:22<5:30:56, 2.79s/it] {'loss': 0.5221, 'grad_norm': 4.235778167208473, 'learning_rate': 3.24384215185031e-06, 'epoch': 0.42} 42%|████▏ | 5193/12313 [3:53:22<5:30:56, 2.79s/it] 42%|████▏ | 5194/12313 [3:53:25<5:24:22, 2.73s/it] {'loss': 0.5772, 'grad_norm': 5.314122103787476, 'learning_rate': 3.2432142874382442e-06, 'epoch': 0.42} 42%|████▏ | 5194/12313 [3:53:25<5:24:22, 2.73s/it] 42%|████▏ | 5195/12313 [3:53:27<5:19:42, 2.69s/it] {'loss': 0.6579, 'grad_norm': 34.115592182605965, 'learning_rate': 3.2425863715996852e-06, 'epoch': 0.42} 42%|████▏ | 5195/12313 [3:53:27<5:19:42, 2.69s/it] 42%|████▏ | 5196/12313 [3:53:30<5:18:01, 2.68s/it] {'loss': 0.5213, 'grad_norm': 5.189198561105788, 'learning_rate': 3.241958404378078e-06, 'epoch': 0.42} 42%|████▏ | 5196/12313 [3:53:30<5:18:01, 2.68s/it] 42%|████▏ | 5197/12313 [3:53:32<5:14:10, 2.65s/it] {'loss': 0.6707, 'grad_norm': 4.933122374718706, 'learning_rate': 3.2413303858168767e-06, 'epoch': 0.42} 42%|████▏ | 5197/12313 [3:53:32<5:14:10, 2.65s/it] 42%|████▏ | 5198/12313 [3:53:35<5:23:28, 2.73s/it] {'loss': 0.4882, 'grad_norm': 4.33962971377441, 'learning_rate': 3.2407023159595356e-06, 'epoch': 0.42} 42%|████▏ | 5198/12313 [3:53:35<5:23:28, 2.73s/it] 42%|████▏ | 5199/12313 [3:53:38<5:24:24, 2.74s/it] {'loss': 0.383, 'grad_norm': 6.399904760111607, 'learning_rate': 3.2400741948495146e-06, 'epoch': 0.42} 42%|████▏ | 5199/12313 [3:53:38<5:24:24, 2.74s/it] 42%|████▏ | 5200/12313 [3:53:41<5:25:27, 2.75s/it] {'loss': 0.6851, 'grad_norm': 11.60358668566535, 'learning_rate': 3.239446022530276e-06, 'epoch': 0.42} 42%|████▏ | 5200/12313 [3:53:41<5:25:27, 2.75s/it] 42%|████▏ | 5201/12313 [3:53:43<5:17:29, 2.68s/it] {'loss': 0.5153, 'grad_norm': 
10.376518506368765, 'learning_rate': 3.2388177990452863e-06, 'epoch': 0.42} 42%|████▏ | 5201/12313 [3:53:43<5:17:29, 2.68s/it] 42%|████▏ | 5202/12313 [3:53:46<5:16:20, 2.67s/it] {'loss': 0.4698, 'grad_norm': 5.087706195580212, 'learning_rate': 3.2381895244380146e-06, 'epoch': 0.42} 42%|████▏ | 5202/12313 [3:53:46<5:16:20, 2.67s/it] 42%|████▏ | 5203/12313 [3:53:49<5:23:12, 2.73s/it] {'loss': 0.5198, 'grad_norm': 3.447914836981935, 'learning_rate': 3.237561198751935e-06, 'epoch': 0.42} 42%|████▏ | 5203/12313 [3:53:49<5:23:12, 2.73s/it] 42%|████▏ | 5204/12313 [3:53:52<5:20:36, 2.71s/it] {'loss': 0.5854, 'grad_norm': 7.767514821519397, 'learning_rate': 3.2369328220305242e-06, 'epoch': 0.42} 42%|████▏ | 5204/12313 [3:53:52<5:20:36, 2.71s/it] 42%|████▏ | 5205/12313 [3:53:54<5:09:39, 2.61s/it] {'loss': 0.4495, 'grad_norm': 7.5527718662260295, 'learning_rate': 3.2363043943172616e-06, 'epoch': 0.42} 42%|████▏ | 5205/12313 [3:53:54<5:09:39, 2.61s/it] 42%|████▏ | 5206/12313 [3:53:57<5:12:14, 2.64s/it] {'loss': 0.4103, 'grad_norm': 6.408787873539771, 'learning_rate': 3.235675915655633e-06, 'epoch': 0.42} 42%|████▏ | 5206/12313 [3:53:57<5:12:14, 2.64s/it] 42%|████▏ | 5207/12313 [3:53:59<5:07:49, 2.60s/it] {'loss': 0.4829, 'grad_norm': 8.10134326274207, 'learning_rate': 3.235047386089123e-06, 'epoch': 0.42} 42%|████▏ | 5207/12313 [3:53:59<5:07:49, 2.60s/it] 42%|████▏ | 5208/12313 [3:54:02<5:07:40, 2.60s/it] {'loss': 0.5541, 'grad_norm': 4.1969050205146425, 'learning_rate': 3.2344188056612247e-06, 'epoch': 0.42} 42%|████▏ | 5208/12313 [3:54:02<5:07:40, 2.60s/it] 42%|████▏ | 5209/12313 [3:54:04<5:06:46, 2.59s/it] {'loss': 0.4104, 'grad_norm': 4.954025585754433, 'learning_rate': 3.233790174415432e-06, 'epoch': 0.42} 42%|████▏ | 5209/12313 [3:54:04<5:06:46, 2.59s/it] 42%|████▏ | 5210/12313 [3:54:07<5:06:48, 2.59s/it] {'loss': 0.6211, 'grad_norm': 5.191981721617316, 'learning_rate': 3.2331614923952424e-06, 'epoch': 0.42} 42%|████▏ | 5210/12313 [3:54:07<5:06:48, 2.59s/it] 42%|████▏ | 
5211/12313 [3:54:10<5:07:35, 2.60s/it] {'loss': 0.5638, 'grad_norm': 10.46456809201404, 'learning_rate': 3.232532759644158e-06, 'epoch': 0.42} 42%|████▏ | 5211/12313 [3:54:10<5:07:35, 2.60s/it] 42%|████▏ | 5212/12313 [3:54:12<5:19:43, 2.70s/it] {'loss': 0.4139, 'grad_norm': 4.916424280031669, 'learning_rate': 3.231903976205684e-06, 'epoch': 0.42} 42%|████▏ | 5212/12313 [3:54:12<5:19:43, 2.70s/it] 42%|████▏ | 5213/12313 [3:54:15<5:20:30, 2.71s/it] {'loss': 0.3899, 'grad_norm': 4.247004575276243, 'learning_rate': 3.231275142123328e-06, 'epoch': 0.42} 42%|████▏ | 5213/12313 [3:54:15<5:20:30, 2.71s/it] 42%|████▏ | 5214/12313 [3:54:18<5:14:20, 2.66s/it] {'loss': 0.697, 'grad_norm': 4.899674792955789, 'learning_rate': 3.2306462574406024e-06, 'epoch': 0.42} 42%|████▏ | 5214/12313 [3:54:18<5:14:20, 2.66s/it] 42%|████▏ | 5215/12313 [3:54:20<5:13:30, 2.65s/it] {'loss': 0.6492, 'grad_norm': 4.934275124900704, 'learning_rate': 3.2300173222010225e-06, 'epoch': 0.42} 42%|████▏ | 5215/12313 [3:54:20<5:13:30, 2.65s/it] 42%|████▏ | 5216/12313 [3:54:23<5:18:59, 2.70s/it] {'loss': 0.5192, 'grad_norm': 4.326952640941383, 'learning_rate': 3.229388336448107e-06, 'epoch': 0.42} 42%|████▏ | 5216/12313 [3:54:23<5:18:59, 2.70s/it] 42%|████▏ | 5217/12313 [3:54:26<5:14:04, 2.66s/it] {'loss': 0.5203, 'grad_norm': 6.478214535298061, 'learning_rate': 3.22875930022538e-06, 'epoch': 0.42} 42%|████▏ | 5217/12313 [3:54:26<5:14:04, 2.66s/it] 42%|████▏ | 5218/12313 [3:54:29<5:23:18, 2.73s/it] {'loss': 0.5004, 'grad_norm': 3.7343533496893393, 'learning_rate': 3.2281302135763655e-06, 'epoch': 0.42} 42%|████▏ | 5218/12313 [3:54:29<5:23:18, 2.73s/it] 42%|████▏ | 5219/12313 [3:54:31<5:18:59, 2.70s/it] {'loss': 0.5008, 'grad_norm': 4.189623891419984, 'learning_rate': 3.227501076544594e-06, 'epoch': 0.42} 42%|████▏ | 5219/12313 [3:54:31<5:18:59, 2.70s/it] 42%|████▏ | 5220/12313 [3:54:34<5:11:08, 2.63s/it] {'loss': 0.4428, 'grad_norm': 9.421806026431668, 'learning_rate': 3.2268718891735985e-06, 'epoch': 0.42} 
42%|████▏ | 5220/12313 [3:54:34<5:11:08, 2.63s/it] 42%|████▏ | 5221/12313 [3:54:36<5:10:12, 2.62s/it] {'loss': 0.5658, 'grad_norm': 8.411484589237242, 'learning_rate': 3.2262426515069144e-06, 'epoch': 0.42} 42%|████▏ | 5221/12313 [3:54:36<5:10:12, 2.62s/it] 42%|████▏ | 5222/12313 [3:54:39<5:07:36, 2.60s/it] {'loss': 0.5688, 'grad_norm': 5.809866223556941, 'learning_rate': 3.225613363588084e-06, 'epoch': 0.42} 42%|████▏ | 5222/12313 [3:54:39<5:07:36, 2.60s/it] 42%|████▏ | 5223/12313 [3:54:41<5:07:04, 2.60s/it] {'loss': 0.5421, 'grad_norm': 4.3592044879636465, 'learning_rate': 3.2249840254606474e-06, 'epoch': 0.42} 42%|████▏ | 5223/12313 [3:54:41<5:07:04, 2.60s/it] 42%|████▏ | 5224/12313 [3:54:44<5:20:52, 2.72s/it] {'loss': 0.5438, 'grad_norm': 2.8417643428931147, 'learning_rate': 3.2243546371681535e-06, 'epoch': 0.42} 42%|████▏ | 5224/12313 [3:54:44<5:20:52, 2.72s/it] 42%|████▏ | 5225/12313 [3:54:47<5:25:09, 2.75s/it] {'loss': 0.669, 'grad_norm': 4.321132502798912, 'learning_rate': 3.2237251987541535e-06, 'epoch': 0.42} 42%|████▏ | 5225/12313 [3:54:47<5:25:09, 2.75s/it] 42%|████▏ | 5226/12313 [3:54:50<5:34:50, 2.83s/it] {'loss': 0.5083, 'grad_norm': 17.616950980213982, 'learning_rate': 3.223095710262199e-06, 'epoch': 0.42} 42%|████▏ | 5226/12313 [3:54:50<5:34:50, 2.83s/it] 42%|████▏ | 5227/12313 [3:54:53<5:31:15, 2.80s/it] {'loss': 0.5919, 'grad_norm': 6.679913112655923, 'learning_rate': 3.2224661717358474e-06, 'epoch': 0.42} 42%|████▏ | 5227/12313 [3:54:53<5:31:15, 2.80s/it] 42%|████▏ | 5228/12313 [3:54:56<5:28:45, 2.78s/it] {'loss': 0.376, 'grad_norm': 4.47571261524808, 'learning_rate': 3.221836583218662e-06, 'epoch': 0.42} 42%|████▏ | 5228/12313 [3:54:56<5:28:45, 2.78s/it] 42%|████▏ | 5229/12313 [3:54:59<5:32:21, 2.82s/it] {'loss': 0.4275, 'grad_norm': 8.501783709365114, 'learning_rate': 3.221206944754205e-06, 'epoch': 0.42} 42%|████▏ | 5229/12313 [3:54:59<5:32:21, 2.82s/it] 42%|████▏ | 5230/12313 [3:55:01<5:23:18, 2.74s/it] {'loss': 0.6864, 'grad_norm': 
4.892245082709042, 'learning_rate': 3.220577256386043e-06, 'epoch': 0.42} 42%|████▏ | 5230/12313 [3:55:01<5:23:18, 2.74s/it] 42%|████▏ | 5231/12313 [3:55:04<5:18:09, 2.70s/it] {'loss': 0.5241, 'grad_norm': 5.054086864480611, 'learning_rate': 3.21994751815775e-06, 'epoch': 0.42} 42%|████▏ | 5231/12313 [3:55:04<5:18:09, 2.70s/it] 42%|████▏ | 5232/12313 [3:55:07<5:19:28, 2.71s/it] {'loss': 0.5298, 'grad_norm': 3.2768063978983095, 'learning_rate': 3.2193177301128985e-06, 'epoch': 0.42} 42%|████▏ | 5232/12313 [3:55:07<5:19:28, 2.71s/it] 42%|████▏ | 5233/12313 [3:55:10<5:28:23, 2.78s/it] {'loss': 0.6465, 'grad_norm': 4.649050909544704, 'learning_rate': 3.2186878922950672e-06, 'epoch': 0.42} 42%|████▏ | 5233/12313 [3:55:10<5:28:23, 2.78s/it] 43%|████▎ | 5234/12313 [3:55:12<5:24:49, 2.75s/it] {'loss': 0.5813, 'grad_norm': 4.194611280947634, 'learning_rate': 3.218058004747837e-06, 'epoch': 0.43} 43%|████▎ | 5234/12313 [3:55:12<5:24:49, 2.75s/it] 43%|████▎ | 5235/12313 [3:55:15<5:19:27, 2.71s/it] {'loss': 0.444, 'grad_norm': 12.035010001320948, 'learning_rate': 3.2174280675147933e-06, 'epoch': 0.43} 43%|████▎ | 5235/12313 [3:55:15<5:19:27, 2.71s/it] 43%|████▎ | 5236/12313 [3:55:18<5:23:34, 2.74s/it] {'loss': 0.4574, 'grad_norm': 3.9279173401772285, 'learning_rate': 3.2167980806395244e-06, 'epoch': 0.43} 43%|████▎ | 5236/12313 [3:55:18<5:23:34, 2.74s/it] 43%|████▎ | 5237/12313 [3:55:20<5:13:51, 2.66s/it] {'loss': 0.4997, 'grad_norm': 3.760359692561372, 'learning_rate': 3.216168044165622e-06, 'epoch': 0.43} 43%|████▎ | 5237/12313 [3:55:20<5:13:51, 2.66s/it] 43%|████▎ | 5238/12313 [3:55:23<5:05:45, 2.59s/it] {'loss': 0.5207, 'grad_norm': 6.915480718748514, 'learning_rate': 3.215537958136681e-06, 'epoch': 0.43} 43%|████▎ | 5238/12313 [3:55:23<5:05:45, 2.59s/it] 43%|████▎ | 5239/12313 [3:55:25<5:03:21, 2.57s/it] {'loss': 0.4996, 'grad_norm': 9.393926456415134, 'learning_rate': 3.2149078225963e-06, 'epoch': 0.43} 43%|████▎ | 5239/12313 [3:55:25<5:03:21, 2.57s/it] 43%|████▎ | 
5240/12313 [3:55:27<4:57:10, 2.52s/it] {'loss': 0.6179, 'grad_norm': 5.607193609498181, 'learning_rate': 3.2142776375880814e-06, 'epoch': 0.43} 43%|████▎ | 5240/12313 [3:55:27<4:57:10, 2.52s/it] 43%|████▎ | 5241/12313 [3:55:30<4:55:01, 2.50s/it] {'loss': 0.4407, 'grad_norm': 7.777688950913407, 'learning_rate': 3.213647403155631e-06, 'epoch': 0.43} 43%|████▎ | 5241/12313 [3:55:30<4:55:01, 2.50s/it] 43%|████▎ | 5242/12313 [3:55:33<4:58:28, 2.53s/it] {'loss': 0.4342, 'grad_norm': 5.406198429944911, 'learning_rate': 3.213017119342557e-06, 'epoch': 0.43} 43%|████▎ | 5242/12313 [3:55:33<4:58:28, 2.53s/it] 43%|████▎ | 5243/12313 [3:55:35<5:05:42, 2.59s/it] {'loss': 0.547, 'grad_norm': 4.332500644815715, 'learning_rate': 3.2123867861924705e-06, 'epoch': 0.43} 43%|████▎ | 5243/12313 [3:55:35<5:05:42, 2.59s/it] 43%|████▎ | 5244/12313 [3:55:38<5:14:47, 2.67s/it] {'loss': 0.6803, 'grad_norm': 4.158064015085327, 'learning_rate': 3.211756403748991e-06, 'epoch': 0.43} 43%|████▎ | 5244/12313 [3:55:38<5:14:47, 2.67s/it] 43%|████▎ | 5245/12313 [3:55:41<5:15:54, 2.68s/it] {'loss': 0.3712, 'grad_norm': 4.670204213436716, 'learning_rate': 3.211125972055734e-06, 'epoch': 0.43} 43%|████▎ | 5245/12313 [3:55:41<5:15:54, 2.68s/it] 43%|████▎ | 5246/12313 [3:55:44<5:27:18, 2.78s/it] {'loss': 0.6592, 'grad_norm': 3.1072025340370835, 'learning_rate': 3.210495491156323e-06, 'epoch': 0.43} 43%|████▎ | 5246/12313 [3:55:44<5:27:18, 2.78s/it] 43%|████▎ | 5247/12313 [3:55:47<5:23:46, 2.75s/it] {'loss': 0.3973, 'grad_norm': 3.845276286103103, 'learning_rate': 3.2098649610943855e-06, 'epoch': 0.43} 43%|████▎ | 5247/12313 [3:55:47<5:23:46, 2.75s/it] 43%|████▎ | 5248/12313 [3:55:49<5:21:12, 2.73s/it] {'loss': 0.4361, 'grad_norm': 5.263686826495678, 'learning_rate': 3.2092343819135485e-06, 'epoch': 0.43} 43%|████▎ | 5248/12313 [3:55:49<5:21:12, 2.73s/it] 43%|████▎ | 5249/12313 [3:55:52<5:32:20, 2.82s/it] {'loss': 0.5642, 'grad_norm': 4.706614514686621, 'learning_rate': 3.2086037536574467e-06, 'epoch': 
0.43} 43%|████▎ | 5249/12313 [3:55:52<5:32:20, 2.82s/it] 43%|████▎ | 5250/12313 [3:55:55<5:27:57, 2.79s/it] {'loss': 0.4473, 'grad_norm': 7.9955278197924455, 'learning_rate': 3.207973076369715e-06, 'epoch': 0.43} 43%|████▎ | 5250/12313 [3:55:55<5:27:57, 2.79s/it] 43%|████▎ | 5251/12313 [3:55:58<5:21:47, 2.73s/it] {'loss': 0.4853, 'grad_norm': 5.6591134035265505, 'learning_rate': 3.2073423500939926e-06, 'epoch': 0.43} 43%|████▎ | 5251/12313 [3:55:58<5:21:47, 2.73s/it] 43%|████▎ | 5252/12313 [3:56:00<5:18:44, 2.71s/it] {'loss': 0.462, 'grad_norm': 8.612516192884062, 'learning_rate': 3.206711574873924e-06, 'epoch': 0.43} 43%|████▎ | 5252/12313 [3:56:00<5:18:44, 2.71s/it] 43%|████▎ | 5253/12313 [3:56:03<5:19:53, 2.72s/it] {'loss': 0.5559, 'grad_norm': 5.102810963488392, 'learning_rate': 3.2060807507531545e-06, 'epoch': 0.43} 43%|████▎ | 5253/12313 [3:56:03<5:19:53, 2.72s/it] 43%|████▎ | 5254/12313 [3:56:06<5:18:39, 2.71s/it] {'loss': 0.5183, 'grad_norm': 7.341129074246249, 'learning_rate': 3.2054498777753335e-06, 'epoch': 0.43} 43%|████▎ | 5254/12313 [3:56:06<5:18:39, 2.71s/it] 43%|████▎ | 5255/12313 [3:56:08<5:16:24, 2.69s/it] {'loss': 0.4557, 'grad_norm': 5.00922975998508, 'learning_rate': 3.204818955984115e-06, 'epoch': 0.43} 43%|████▎ | 5255/12313 [3:56:08<5:16:24, 2.69s/it] 43%|████▎ | 5256/12313 [3:56:11<5:10:58, 2.64s/it] {'loss': 0.6202, 'grad_norm': 3.0225137055666087, 'learning_rate': 3.2041879854231545e-06, 'epoch': 0.43} 43%|████▎ | 5256/12313 [3:56:11<5:10:58, 2.64s/it] 43%|████▎ | 5257/12313 [3:56:13<5:07:26, 2.61s/it] {'loss': 0.5674, 'grad_norm': 5.862930940720101, 'learning_rate': 3.203556966136113e-06, 'epoch': 0.43} 43%|████▎ | 5257/12313 [3:56:13<5:07:26, 2.61s/it] 43%|████▎ | 5258/12313 [3:56:16<5:02:20, 2.57s/it] {'loss': 0.4603, 'grad_norm': 8.760401643981206, 'learning_rate': 3.202925898166652e-06, 'epoch': 0.43} 43%|████▎ | 5258/12313 [3:56:16<5:02:20, 2.57s/it] 43%|████▎ | 5259/12313 [3:56:19<5:06:06, 2.60s/it] {'loss': 0.4481, 'grad_norm': 
7.198009661677629, 'learning_rate': 3.2022947815584393e-06, 'epoch': 0.43} 43%|████▎ | 5259/12313 [3:56:19<5:06:06, 2.60s/it] 43%|████▎ | 5260/12313 [3:56:21<5:11:58, 2.65s/it] {'loss': 0.4721, 'grad_norm': 4.513288458419195, 'learning_rate': 3.2016636163551456e-06, 'epoch': 0.43} 43%|████▎ | 5260/12313 [3:56:21<5:11:58, 2.65s/it] 43%|████▎ | 5261/12313 [3:56:24<5:30:06, 2.81s/it] {'loss': 0.5153, 'grad_norm': 4.556711076123282, 'learning_rate': 3.2010324026004425e-06, 'epoch': 0.43} 43%|████▎ | 5261/12313 [3:56:24<5:30:06, 2.81s/it] 43%|████▎ | 5262/12313 [3:56:27<5:23:57, 2.76s/it] {'loss': 0.5405, 'grad_norm': 4.403874321167366, 'learning_rate': 3.200401140338007e-06, 'epoch': 0.43} 43%|████▎ | 5262/12313 [3:56:27<5:23:57, 2.76s/it] 43%|████▎ | 5263/12313 [3:56:30<5:13:47, 2.67s/it] {'loss': 0.5812, 'grad_norm': 15.464524857508628, 'learning_rate': 3.1997698296115192e-06, 'epoch': 0.43} 43%|████▎ | 5263/12313 [3:56:30<5:13:47, 2.67s/it] 43%|████▎ | 5264/12313 [3:56:32<5:04:33, 2.59s/it] {'loss': 0.4366, 'grad_norm': 6.763078104210325, 'learning_rate': 3.1991384704646632e-06, 'epoch': 0.43} 43%|████▎ | 5264/12313 [3:56:32<5:04:33, 2.59s/it] 43%|████▎ | 5265/12313 [3:56:35<5:04:37, 2.59s/it] {'loss': 0.7049, 'grad_norm': 3.8083736623019377, 'learning_rate': 3.198507062941125e-06, 'epoch': 0.43} 43%|████▎ | 5265/12313 [3:56:35<5:04:37, 2.59s/it] 43%|████▎ | 5266/12313 [3:56:37<5:08:42, 2.63s/it] {'loss': 0.5536, 'grad_norm': 4.411432581791518, 'learning_rate': 3.197875607084595e-06, 'epoch': 0.43} 43%|████▎ | 5266/12313 [3:56:37<5:08:42, 2.63s/it] 43%|████▎ | 5267/12313 [3:56:40<5:09:55, 2.64s/it] {'loss': 0.7173, 'grad_norm': 5.209356808212503, 'learning_rate': 3.1972441029387664e-06, 'epoch': 0.43} 43%|████▎ | 5267/12313 [3:56:40<5:09:55, 2.64s/it] 43%|████▎ | 5268/12313 [3:56:43<5:12:13, 2.66s/it] {'loss': 0.5126, 'grad_norm': 4.422812878298527, 'learning_rate': 3.196612550547336e-06, 'epoch': 0.43} 43%|████▎ | 5268/12313 [3:56:43<5:12:13, 2.66s/it] 43%|████▎ | 
5269/12313 [3:56:45<5:17:03, 2.70s/it] {'loss': 0.4163, 'grad_norm': 4.063375030795528, 'learning_rate': 3.1959809499540033e-06, 'epoch': 0.43} 43%|████▎ | 5269/12313 [3:56:45<5:17:03, 2.70s/it] 43%|████▎ | 5270/12313 [3:56:48<5:25:25, 2.77s/it] {'loss': 0.46, 'grad_norm': 6.439580078200473, 'learning_rate': 3.1953493012024728e-06, 'epoch': 0.43} 43%|████▎ | 5270/12313 [3:56:48<5:25:25, 2.77s/it] 43%|████▎ | 5271/12313 [3:56:51<5:17:39, 2.71s/it] {'loss': 0.5559, 'grad_norm': 6.644696648989759, 'learning_rate': 3.1947176043364512e-06, 'epoch': 0.43} 43%|████▎ | 5271/12313 [3:56:51<5:17:39, 2.71s/it] 43%|████▎ | 5272/12313 [3:56:54<5:14:36, 2.68s/it] {'loss': 0.4542, 'grad_norm': 9.625730715608832, 'learning_rate': 3.194085859399647e-06, 'epoch': 0.43} 43%|████▎ | 5272/12313 [3:56:54<5:14:36, 2.68s/it] 43%|████▎ | 5273/12313 [3:56:56<5:15:45, 2.69s/it] {'loss': 0.3875, 'grad_norm': 6.80723990351529, 'learning_rate': 3.1934540664357756e-06, 'epoch': 0.43} 43%|████▎ | 5273/12313 [3:56:56<5:15:45, 2.69s/it] 43%|████▎ | 5274/12313 [3:56:59<5:33:05, 2.84s/it] {'loss': 0.4263, 'grad_norm': 4.063505294236836, 'learning_rate': 3.1928222254885527e-06, 'epoch': 0.43} 43%|████▎ | 5274/12313 [3:56:59<5:33:05, 2.84s/it] 43%|████▎ | 5275/12313 [3:57:02<5:19:59, 2.73s/it] {'loss': 0.601, 'grad_norm': 5.254426921833707, 'learning_rate': 3.192190336601698e-06, 'epoch': 0.43} 43%|████▎ | 5275/12313 [3:57:02<5:19:59, 2.73s/it] 43%|████▎ | 5276/12313 [3:57:05<5:15:02, 2.69s/it] {'loss': 0.8393, 'grad_norm': 3.3439818153250274, 'learning_rate': 3.1915583998189365e-06, 'epoch': 0.43} 43%|████▎ | 5276/12313 [3:57:05<5:15:02, 2.69s/it] 43%|████▎ | 5277/12313 [3:57:07<5:10:18, 2.65s/it] {'loss': 0.4905, 'grad_norm': 4.423002542739268, 'learning_rate': 3.190926415183993e-06, 'epoch': 0.43} 43%|████▎ | 5277/12313 [3:57:07<5:10:18, 2.65s/it] 43%|████▎ | 5278/12313 [3:57:10<5:12:38, 2.67s/it] {'loss': 0.4598, 'grad_norm': 4.969141698478439, 'learning_rate': 3.190294382740598e-06, 'epoch': 0.43} 
43%|████▎ | 5278/12313 [3:57:10<5:12:38, 2.67s/it] 43%|████▎ | 5279/12313 [3:57:13<5:14:44, 2.68s/it] {'loss': 0.5027, 'grad_norm': 5.329263080541811, 'learning_rate': 3.189662302532486e-06, 'epoch': 0.43} 43%|████▎ | 5279/12313 [3:57:13<5:14:44, 2.68s/it] 43%|████▎ | 5280/12313 [3:57:15<5:18:45, 2.72s/it] {'loss': 0.4451, 'grad_norm': 8.465008482689347, 'learning_rate': 3.1890301746033914e-06, 'epoch': 0.43} 43%|████▎ | 5280/12313 [3:57:15<5:18:45, 2.72s/it] 43%|████▎ | 5281/12313 [3:57:18<5:16:27, 2.70s/it] {'loss': 0.5379, 'grad_norm': 4.288959841432594, 'learning_rate': 3.188397998997056e-06, 'epoch': 0.43} 43%|████▎ | 5281/12313 [3:57:18<5:16:27, 2.70s/it] 43%|████▎ | 5282/12313 [3:57:21<5:20:33, 2.74s/it] {'loss': 0.3864, 'grad_norm': 4.307272162236529, 'learning_rate': 3.1877657757572223e-06, 'epoch': 0.43} 43%|████▎ | 5282/12313 [3:57:21<5:20:33, 2.74s/it] 43%|████▎ | 5283/12313 [3:57:23<5:12:51, 2.67s/it] {'loss': 0.5744, 'grad_norm': 4.853723061541425, 'learning_rate': 3.187133504927637e-06, 'epoch': 0.43} 43%|████▎ | 5283/12313 [3:57:23<5:12:51, 2.67s/it] 43%|████▎ | 5284/12313 [3:57:26<5:21:04, 2.74s/it] {'loss': 0.4017, 'grad_norm': 9.888573881930817, 'learning_rate': 3.18650118655205e-06, 'epoch': 0.43} 43%|████▎ | 5284/12313 [3:57:26<5:21:04, 2.74s/it] 43%|████▎ | 5285/12313 [3:57:29<5:17:44, 2.71s/it] {'loss': 0.4613, 'grad_norm': 6.195263958719191, 'learning_rate': 3.1858688206742135e-06, 'epoch': 0.43} 43%|████▎ | 5285/12313 [3:57:29<5:17:44, 2.71s/it] 43%|████▎ | 5286/12313 [3:57:31<5:09:44, 2.64s/it] {'loss': 0.5304, 'grad_norm': 5.993567524504076, 'learning_rate': 3.1852364073378845e-06, 'epoch': 0.43} 43%|████▎ | 5286/12313 [3:57:31<5:09:44, 2.64s/it] 43%|████▎ | 5287/12313 [3:57:34<5:07:08, 2.62s/it] {'loss': 0.4809, 'grad_norm': 6.246122494535498, 'learning_rate': 3.1846039465868233e-06, 'epoch': 0.43} 43%|████▎ | 5287/12313 [3:57:34<5:07:08, 2.62s/it] 43%|████▎ | 5288/12313 [3:57:37<5:10:57, 2.66s/it] {'loss': 0.416, 'grad_norm': 
11.344295485158227, 'learning_rate': 3.1839714384647914e-06, 'epoch': 0.43} 43%|████▎ | 5288/12313 [3:57:37<5:10:57, 2.66s/it] 43%|████▎ | 5289/12313 [3:57:40<5:18:28, 2.72s/it] {'loss': 0.5286, 'grad_norm': 6.789926972632632, 'learning_rate': 3.1833388830155564e-06, 'epoch': 0.43} 43%|████▎ | 5289/12313 [3:57:40<5:18:28, 2.72s/it] 43%|████▎ | 5290/12313 [3:57:42<5:18:04, 2.72s/it] {'loss': 0.6368, 'grad_norm': 4.731313984743315, 'learning_rate': 3.1827062802828878e-06, 'epoch': 0.43} 43%|████▎ | 5290/12313 [3:57:42<5:18:04, 2.72s/it] 43%|████▎ | 5291/12313 [3:57:45<5:10:02, 2.65s/it] {'loss': 0.4566, 'grad_norm': 8.422362176653303, 'learning_rate': 3.182073630310557e-06, 'epoch': 0.43} 43%|████▎ | 5291/12313 [3:57:45<5:10:02, 2.65s/it] 43%|████▎ | 5292/12313 [3:57:47<5:11:07, 2.66s/it] {'loss': 0.7081, 'grad_norm': 5.811200383485644, 'learning_rate': 3.18144093314234e-06, 'epoch': 0.43} 43%|████▎ | 5292/12313 [3:57:47<5:11:07, 2.66s/it] 43%|████▎ | 5293/12313 [3:57:50<5:09:27, 2.64s/it] {'loss': 0.6889, 'grad_norm': 3.464910361407899, 'learning_rate': 3.180808188822019e-06, 'epoch': 0.43} 43%|████▎ | 5293/12313 [3:57:50<5:09:27, 2.64s/it] 43%|████▎ | 5294/12313 [3:57:53<5:07:32, 2.63s/it] {'loss': 0.6863, 'grad_norm': 3.8106177663295115, 'learning_rate': 3.180175397393373e-06, 'epoch': 0.43} 43%|████▎ | 5294/12313 [3:57:53<5:07:32, 2.63s/it] 43%|████▎ | 5295/12313 [3:57:55<5:09:07, 2.64s/it] {'loss': 0.5488, 'grad_norm': 8.071444985141785, 'learning_rate': 3.1795425589001896e-06, 'epoch': 0.43} 43%|████▎ | 5295/12313 [3:57:55<5:09:07, 2.64s/it] 43%|████▎ | 5296/12313 [3:57:58<5:08:31, 2.64s/it] {'loss': 0.5592, 'grad_norm': 4.430290323012749, 'learning_rate': 3.178909673386257e-06, 'epoch': 0.43} 43%|████▎ | 5296/12313 [3:57:58<5:08:31, 2.64s/it] 43%|████▎ | 5297/12313 [3:58:01<5:12:49, 2.68s/it] {'loss': 0.5002, 'grad_norm': 4.5330502482145505, 'learning_rate': 3.178276740895369e-06, 'epoch': 0.43} 43%|████▎ | 5297/12313 [3:58:01<5:12:49, 2.68s/it] 43%|████▎ | 
5298/12313 [3:58:03<5:15:29, 2.70s/it] {'loss': 0.547, 'grad_norm': 7.279026055668376, 'learning_rate': 3.1776437614713197e-06, 'epoch': 0.43} 43%|████▎ | 5298/12313 [3:58:03<5:15:29, 2.70s/it] 43%|████▎ | 5299/12313 [3:58:06<5:07:17, 2.63s/it] {'loss': 0.6157, 'grad_norm': 5.6210852302334855, 'learning_rate': 3.177010735157909e-06, 'epoch': 0.43} 43%|████▎ | 5299/12313 [3:58:06<5:07:17, 2.63s/it] 43%|████▎ | 5300/12313 [3:58:09<5:20:04, 2.74s/it] {'loss': 0.5323, 'grad_norm': 6.112959552581171, 'learning_rate': 3.1763776619989377e-06, 'epoch': 0.43} 43%|████▎ | 5300/12313 [3:58:09<5:20:04, 2.74s/it] 43%|████▎ | 5301/12313 [3:58:11<5:08:51, 2.64s/it] {'loss': 0.525, 'grad_norm': 4.4197983016582745, 'learning_rate': 3.175744542038212e-06, 'epoch': 0.43} 43%|████▎ | 5301/12313 [3:58:11<5:08:51, 2.64s/it] 43%|████▎ | 5302/12313 [3:58:14<5:04:34, 2.61s/it] {'loss': 0.5436, 'grad_norm': 5.4564596091826, 'learning_rate': 3.175111375319541e-06, 'epoch': 0.43} 43%|████▎ | 5302/12313 [3:58:14<5:04:34, 2.61s/it] 43%|████▎ | 5303/12313 [3:58:16<5:03:48, 2.60s/it] {'loss': 0.4957, 'grad_norm': 4.9689391465007615, 'learning_rate': 3.174478161886736e-06, 'epoch': 0.43} 43%|████▎ | 5303/12313 [3:58:16<5:03:48, 2.60s/it] 43%|████▎ | 5304/12313 [3:58:19<5:00:58, 2.58s/it] {'loss': 0.4726, 'grad_norm': 9.04638437397121, 'learning_rate': 3.1738449017836102e-06, 'epoch': 0.43} 43%|████▎ | 5304/12313 [3:58:19<5:00:58, 2.58s/it] 43%|████▎ | 5305/12313 [3:58:21<4:57:31, 2.55s/it] {'loss': 0.4154, 'grad_norm': 6.518271460980143, 'learning_rate': 3.173211595053985e-06, 'epoch': 0.43} 43%|████▎ | 5305/12313 [3:58:21<4:57:31, 2.55s/it] 43%|████▎ | 5306/12313 [3:58:24<5:00:38, 2.57s/it] {'loss': 0.6605, 'grad_norm': 9.139018032286032, 'learning_rate': 3.17257824174168e-06, 'epoch': 0.43} 43%|████▎ | 5306/12313 [3:58:24<5:00:38, 2.57s/it] 43%|████▎ | 5307/12313 [3:58:27<5:03:30, 2.60s/it] {'loss': 0.5479, 'grad_norm': 5.893703080064373, 'learning_rate': 3.17194484189052e-06, 'epoch': 0.43} 
43%|████▎ | 5307/12313 [3:58:27<5:03:30, 2.60s/it] 43%|████▎ | 5308/12313 [3:58:29<5:06:18, 2.62s/it] {'loss': 0.5071, 'grad_norm': 6.561881133428007, 'learning_rate': 3.171311395544333e-06, 'epoch': 0.43} 43%|████▎ | 5308/12313 [3:58:29<5:06:18, 2.62s/it] 43%|████▎ | 5309/12313 [3:58:32<5:09:17, 2.65s/it] {'loss': 0.4559, 'grad_norm': 5.394616296616023, 'learning_rate': 3.170677902746951e-06, 'epoch': 0.43} 43%|████▎ | 5309/12313 [3:58:32<5:09:17, 2.65s/it] 43%|████▎ | 5310/12313 [3:58:34<4:59:55, 2.57s/it] {'loss': 0.4429, 'grad_norm': 6.001462406111083, 'learning_rate': 3.170044363542207e-06, 'epoch': 0.43} 43%|████▎ | 5310/12313 [3:58:34<4:59:55, 2.57s/it] 43%|████▎ | 5311/12313 [3:58:37<5:04:21, 2.61s/it] {'loss': 0.4939, 'grad_norm': 5.400528463179634, 'learning_rate': 3.1694107779739394e-06, 'epoch': 0.43} 43%|████▎ | 5311/12313 [3:58:37<5:04:21, 2.61s/it] 43%|████▎ | 5312/12313 [3:58:40<5:20:55, 2.75s/it] {'loss': 0.5207, 'grad_norm': 3.4971169628242347, 'learning_rate': 3.1687771460859886e-06, 'epoch': 0.43} 43%|████▎ | 5312/12313 [3:58:40<5:20:55, 2.75s/it] 43%|████▎ | 5313/12313 [3:58:43<5:15:56, 2.71s/it] {'loss': 0.537, 'grad_norm': 5.801443009456409, 'learning_rate': 3.168143467922199e-06, 'epoch': 0.43} 43%|████▎ | 5313/12313 [3:58:43<5:15:56, 2.71s/it] 43%|████▎ | 5314/12313 [3:58:45<5:04:56, 2.61s/it] {'loss': 0.6002, 'grad_norm': 3.877634512463603, 'learning_rate': 3.1675097435264175e-06, 'epoch': 0.43} 43%|████▎ | 5314/12313 [3:58:45<5:04:56, 2.61s/it] 43%|████▎ | 5315/12313 [3:58:48<5:08:42, 2.65s/it] {'loss': 0.5651, 'grad_norm': 4.436235293442774, 'learning_rate': 3.166875972942494e-06, 'epoch': 0.43} 43%|████▎ | 5315/12313 [3:58:48<5:08:42, 2.65s/it] 43%|████▎ | 5316/12313 [3:58:51<5:16:33, 2.71s/it] {'loss': 0.5255, 'grad_norm': 3.8720642376323857, 'learning_rate': 3.166242156214283e-06, 'epoch': 0.43} 43%|████▎ | 5316/12313 [3:58:51<5:16:33, 2.71s/it] 43%|████▎ | 5317/12313 [3:58:53<5:11:38, 2.67s/it] {'loss': 0.4563, 'grad_norm': 
5.096619815050037, 'learning_rate': 3.1656082933856415e-06, 'epoch': 0.43} 43%|████▎ | 5317/12313 [3:58:53<5:11:38, 2.67s/it] 43%|████▎ | 5318/12313 [3:58:56<5:10:38, 2.66s/it] {'loss': 0.4742, 'grad_norm': 4.129271114889403, 'learning_rate': 3.1649743845004275e-06, 'epoch': 0.43} 43%|████▎ | 5318/12313 [3:58:56<5:10:38, 2.66s/it] 43%|████▎ | 5319/12313 [3:58:59<5:06:37, 2.63s/it] {'loss': 0.5313, 'grad_norm': 5.369554351025192, 'learning_rate': 3.164340429602506e-06, 'epoch': 0.43} 43%|████▎ | 5319/12313 [3:58:59<5:06:37, 2.63s/it] 43%|████▎ | 5320/12313 [3:59:02<5:26:37, 2.80s/it] {'loss': 0.427, 'grad_norm': 5.6537665065078615, 'learning_rate': 3.1637064287357433e-06, 'epoch': 0.43} 43%|████▎ | 5320/12313 [3:59:02<5:26:37, 2.80s/it] 43%|████▎ | 5321/12313 [3:59:05<5:21:56, 2.76s/it] {'loss': 0.4343, 'grad_norm': 5.909754479859293, 'learning_rate': 3.1630723819440075e-06, 'epoch': 0.43} 43%|████▎ | 5321/12313 [3:59:05<5:21:56, 2.76s/it] 43%|████▎ | 5322/12313 [3:59:07<5:18:30, 2.73s/it] {'loss': 0.4722, 'grad_norm': 9.806257811882773, 'learning_rate': 3.1624382892711724e-06, 'epoch': 0.43} 43%|████▎ | 5322/12313 [3:59:07<5:18:30, 2.73s/it] 43%|████▎ | 5323/12313 [3:59:10<5:14:02, 2.70s/it] {'loss': 0.5959, 'grad_norm': 9.73741516646952, 'learning_rate': 3.161804150761114e-06, 'epoch': 0.43} 43%|████▎ | 5323/12313 [3:59:10<5:14:02, 2.70s/it] 43%|████▎ | 5324/12313 [3:59:13<5:23:48, 2.78s/it] {'loss': 0.5979, 'grad_norm': 4.940223216087869, 'learning_rate': 3.16116996645771e-06, 'epoch': 0.43} 43%|████▎ | 5324/12313 [3:59:13<5:23:48, 2.78s/it] 43%|████▎ | 5325/12313 [3:59:15<5:19:22, 2.74s/it] {'loss': 0.4802, 'grad_norm': 16.649964044548284, 'learning_rate': 3.1605357364048446e-06, 'epoch': 0.43} 43%|████▎ | 5325/12313 [3:59:15<5:19:22, 2.74s/it] 43%|████▎ | 5326/12313 [3:59:18<5:17:29, 2.73s/it] {'loss': 0.5291, 'grad_norm': 7.719840433898037, 'learning_rate': 3.159901460646401e-06, 'epoch': 0.43} 43%|████▎ | 5326/12313 [3:59:18<5:17:29, 2.73s/it] 43%|████▎ | 
5327/12313 [3:59:21<5:15:40, 2.71s/it] {'loss': 0.4709, 'grad_norm': 5.612098859952399, 'learning_rate': 3.15926713922627e-06, 'epoch': 0.43} 43%|████▎ | 5327/12313 [3:59:21<5:15:40, 2.71s/it] 43%|████▎ | 5328/12313 [3:59:23<5:10:13, 2.66s/it] {'loss': 0.4699, 'grad_norm': 15.165120351544296, 'learning_rate': 3.1586327721883416e-06, 'epoch': 0.43} 43%|████▎ | 5328/12313 [3:59:23<5:10:13, 2.66s/it] 43%|████▎ | 5329/12313 [3:59:26<5:08:24, 2.65s/it] {'loss': 0.5246, 'grad_norm': 7.373093805345169, 'learning_rate': 3.1579983595765107e-06, 'epoch': 0.43} 43%|████▎ | 5329/12313 [3:59:26<5:08:24, 2.65s/it] 43%|████▎ | 5330/12313 [3:59:29<5:12:48, 2.69s/it] {'loss': 0.413, 'grad_norm': 4.232276270918911, 'learning_rate': 3.1573639014346756e-06, 'epoch': 0.43} 43%|████▎ | 5330/12313 [3:59:29<5:12:48, 2.69s/it] 43%|████▎ | 5331/12313 [3:59:32<5:16:05, 2.72s/it] {'loss': 0.4798, 'grad_norm': 5.475928131532231, 'learning_rate': 3.1567293978067383e-06, 'epoch': 0.43} 43%|████▎ | 5331/12313 [3:59:32<5:16:05, 2.72s/it] 43%|████▎ | 5332/12313 [3:59:34<5:14:25, 2.70s/it] {'loss': 0.5221, 'grad_norm': 5.319170320864578, 'learning_rate': 3.1560948487366016e-06, 'epoch': 0.43} 43%|████▎ | 5332/12313 [3:59:34<5:14:25, 2.70s/it] 43%|████▎ | 5333/12313 [3:59:37<5:09:56, 2.66s/it] {'loss': 0.5395, 'grad_norm': 4.7595654317237654, 'learning_rate': 3.1554602542681746e-06, 'epoch': 0.43} 43%|████▎ | 5333/12313 [3:59:37<5:09:56, 2.66s/it] 43%|████▎ | 5334/12313 [3:59:39<5:11:34, 2.68s/it] {'loss': 0.5755, 'grad_norm': 5.163449251799242, 'learning_rate': 3.154825614445366e-06, 'epoch': 0.43} 43%|████▎ | 5334/12313 [3:59:39<5:11:34, 2.68s/it] 43%|████▎ | 5335/12313 [3:59:42<5:03:21, 2.61s/it] {'loss': 0.4156, 'grad_norm': 4.2774589507058085, 'learning_rate': 3.154190929312091e-06, 'epoch': 0.43} 43%|████▎ | 5335/12313 [3:59:42<5:03:21, 2.61s/it] 43%|████▎ | 5336/12313 [3:59:44<4:57:24, 2.56s/it] {'loss': 0.3532, 'grad_norm': 6.636509210502926, 'learning_rate': 3.1535561989122667e-06, 'epoch': 
0.43} 43%|████▎ | 5336/12313 [3:59:44<4:57:24, 2.56s/it] 43%|████▎ | 5337/12313 [3:59:47<5:04:43, 2.62s/it] {'loss': 0.5302, 'grad_norm': 4.679244393970052, 'learning_rate': 3.152921423289811e-06, 'epoch': 0.43} 43%|████▎ | 5337/12313 [3:59:47<5:04:43, 2.62s/it] 43%|████▎ | 5338/12313 [3:59:50<5:01:45, 2.60s/it] {'loss': 0.5156, 'grad_norm': 4.600315390689705, 'learning_rate': 3.1522866024886497e-06, 'epoch': 0.43} 43%|████▎ | 5338/12313 [3:59:50<5:01:45, 2.60s/it] 43%|████▎ | 5339/12313 [3:59:52<4:59:04, 2.57s/it] {'loss': 0.6254, 'grad_norm': 7.387419218755155, 'learning_rate': 3.1516517365527064e-06, 'epoch': 0.43} 43%|████▎ | 5339/12313 [3:59:52<4:59:04, 2.57s/it] 43%|████▎ | 5340/12313 [3:59:55<5:00:32, 2.59s/it] {'loss': 0.4804, 'grad_norm': 8.038125663875935, 'learning_rate': 3.151016825525912e-06, 'epoch': 0.43} 43%|████▎ | 5340/12313 [3:59:55<5:00:32, 2.59s/it] 43%|████▎ | 5341/12313 [3:59:57<5:01:07, 2.59s/it] {'loss': 0.5997, 'grad_norm': 5.664820632074068, 'learning_rate': 3.1503818694521993e-06, 'epoch': 0.43} 43%|████▎ | 5341/12313 [3:59:57<5:01:07, 2.59s/it] 43%|████▎ | 5342/12313 [4:00:00<5:02:37, 2.60s/it] {'loss': 0.5103, 'grad_norm': 5.420884590250101, 'learning_rate': 3.1497468683755027e-06, 'epoch': 0.43} 43%|████▎ | 5342/12313 [4:00:00<5:02:37, 2.60s/it] 43%|████▎ | 5343/12313 [4:00:03<5:02:03, 2.60s/it] {'loss': 0.6567, 'grad_norm': 4.344293142047334, 'learning_rate': 3.1491118223397622e-06, 'epoch': 0.43} 43%|████▎ | 5343/12313 [4:00:03<5:02:03, 2.60s/it] 43%|████▎ | 5344/12313 [4:00:05<5:03:47, 2.62s/it] {'loss': 0.5217, 'grad_norm': 3.3509931709068184, 'learning_rate': 3.1484767313889186e-06, 'epoch': 0.43} 43%|████▎ | 5344/12313 [4:00:05<5:03:47, 2.62s/it] 43%|████▎ | 5345/12313 [4:00:08<5:08:24, 2.66s/it] {'loss': 0.5403, 'grad_norm': 6.46488127239872, 'learning_rate': 3.1478415955669174e-06, 'epoch': 0.43} 43%|████▎ | 5345/12313 [4:00:08<5:08:24, 2.66s/it] 43%|████▎ | 5346/12313 [4:00:10<4:59:03, 2.58s/it] {'loss': 0.613, 'grad_norm': 
28.243274098376233, 'learning_rate': 3.1472064149177063e-06, 'epoch': 0.43} 43%|████▎ | 5346/12313 [4:00:10<4:59:03, 2.58s/it] 43%|████▎ | 5347/12313 [4:00:13<4:52:54, 2.52s/it] {'loss': 0.5199, 'grad_norm': 6.961851378871304, 'learning_rate': 3.1465711894852364e-06, 'epoch': 0.43} 43%|████▎ | 5347/12313 [4:00:13<4:52:54, 2.52s/it] 43%|████▎ | 5348/12313 [4:00:16<5:00:22, 2.59s/it] {'loss': 0.6373, 'grad_norm': 4.438081889262291, 'learning_rate': 3.145935919313462e-06, 'epoch': 0.43} 43%|████▎ | 5348/12313 [4:00:16<5:00:22, 2.59s/it] 43%|████▎ | 5349/12313 [4:00:18<4:55:02, 2.54s/it] {'loss': 0.5597, 'grad_norm': 4.323565064610984, 'learning_rate': 3.1453006044463417e-06, 'epoch': 0.43} 43%|████▎ | 5349/12313 [4:00:18<4:55:02, 2.54s/it] 43%|████▎ | 5350/12313 [4:00:21<4:58:18, 2.57s/it] {'loss': 0.5194, 'grad_norm': 3.0858340228520826, 'learning_rate': 3.144665244927833e-06, 'epoch': 0.43} 43%|████▎ | 5350/12313 [4:00:21<4:58:18, 2.57s/it] 43%|████▎ | 5351/12313 [4:00:23<5:01:52, 2.60s/it] {'loss': 0.7601, 'grad_norm': 4.877755892925608, 'learning_rate': 3.144029840801902e-06, 'epoch': 0.43} 43%|████▎ | 5351/12313 [4:00:23<5:01:52, 2.60s/it] 43%|████▎ | 5352/12313 [4:00:26<5:06:14, 2.64s/it] {'loss': 0.5054, 'grad_norm': 4.213210244527702, 'learning_rate': 3.1433943921125154e-06, 'epoch': 0.43} 43%|████▎ | 5352/12313 [4:00:26<5:06:14, 2.64s/it] 43%|████▎ | 5353/12313 [4:00:29<5:02:38, 2.61s/it] {'loss': 0.8653, 'grad_norm': 4.009304731887304, 'learning_rate': 3.1427588989036406e-06, 'epoch': 0.43} 43%|████▎ | 5353/12313 [4:00:29<5:02:38, 2.61s/it] 43%|████▎ | 5354/12313 [4:00:31<5:01:19, 2.60s/it] {'loss': 0.5943, 'grad_norm': 6.671797591935914, 'learning_rate': 3.1421233612192527e-06, 'epoch': 0.43} 43%|████▎ | 5354/12313 [4:00:31<5:01:19, 2.60s/it] 43%|████▎ | 5355/12313 [4:00:34<5:05:18, 2.63s/it] {'loss': 0.5151, 'grad_norm': 6.798919847330098, 'learning_rate': 3.1414877791033267e-06, 'epoch': 0.43} 43%|████▎ | 5355/12313 [4:00:34<5:05:18, 2.63s/it] 43%|████▎ | 
5356/12313 [4:00:36<5:02:55, 2.61s/it] {'loss': 0.4387, 'grad_norm': 5.3539572195583975, 'learning_rate': 3.1408521525998403e-06, 'epoch': 0.43} 43%|████▎ | 5356/12313 [4:00:36<5:02:55, 2.61s/it] 44%|████▎ | 5357/12313 [4:00:39<5:05:50, 2.64s/it] {'loss': 0.3866, 'grad_norm': 4.194017130183013, 'learning_rate': 3.1402164817527776e-06, 'epoch': 0.44} 44%|████▎ | 5357/12313 [4:00:39<5:05:50, 2.64s/it] 44%|████▎ | 5358/12313 [4:00:42<5:03:30, 2.62s/it] {'loss': 0.6504, 'grad_norm': 3.6695150698980354, 'learning_rate': 3.1395807666061223e-06, 'epoch': 0.44} 44%|████▎ | 5358/12313 [4:00:42<5:03:30, 2.62s/it] 44%|████▎ | 5359/12313 [4:00:44<5:06:30, 2.64s/it] {'loss': 0.4588, 'grad_norm': 4.641952406086023, 'learning_rate': 3.138945007203863e-06, 'epoch': 0.44} 44%|████▎ | 5359/12313 [4:00:44<5:06:30, 2.64s/it] 44%|████▎ | 5360/12313 [4:00:47<5:03:44, 2.62s/it] {'loss': 0.5878, 'grad_norm': 10.931096401774482, 'learning_rate': 3.1383092035899903e-06, 'epoch': 0.44} 44%|████▎ | 5360/12313 [4:00:47<5:03:44, 2.62s/it] 44%|████▎ | 5361/12313 [4:00:50<5:19:02, 2.75s/it] {'loss': 0.3661, 'grad_norm': 5.762955444790604, 'learning_rate': 3.1376733558084994e-06, 'epoch': 0.44} 44%|████▎ | 5361/12313 [4:00:50<5:19:02, 2.75s/it] 44%|████▎ | 5362/12313 [4:00:53<5:10:15, 2.68s/it] {'loss': 0.5696, 'grad_norm': 4.461787974097396, 'learning_rate': 3.1370374639033876e-06, 'epoch': 0.44} 44%|████▎ | 5362/12313 [4:00:53<5:10:15, 2.68s/it] 44%|████▎ | 5363/12313 [4:00:55<5:03:26, 2.62s/it] {'loss': 0.6584, 'grad_norm': 3.0791414260530683, 'learning_rate': 3.1364015279186537e-06, 'epoch': 0.44} 44%|████▎ | 5363/12313 [4:00:55<5:03:26, 2.62s/it] 44%|████▎ | 5364/12313 [4:00:58<5:03:10, 2.62s/it] {'loss': 0.4157, 'grad_norm': 5.145988507189497, 'learning_rate': 3.1357655478983028e-06, 'epoch': 0.44} 44%|████▎ | 5364/12313 [4:00:58<5:03:10, 2.62s/it] 44%|████▎ | 5365/12313 [4:01:00<5:03:02, 2.62s/it] {'loss': 0.6098, 'grad_norm': 4.936564393357792, 'learning_rate': 3.135129523886341e-06, 
'epoch': 0.44} 44%|████▎ | 5365/12313 [4:01:00<5:03:02, 2.62s/it] 44%|████▎ | 5366/12313 [4:01:03<5:02:49, 2.62s/it] {'loss': 0.4316, 'grad_norm': 5.83447257075552, 'learning_rate': 3.1344934559267763e-06, 'epoch': 0.44} 44%|████▎ | 5366/12313 [4:01:03<5:02:49, 2.62s/it] 44%|████▎ | 5367/12313 [4:01:06<5:05:59, 2.64s/it] {'loss': 0.591, 'grad_norm': 4.5475608942051196, 'learning_rate': 3.1338573440636232e-06, 'epoch': 0.44} 44%|████▎ | 5367/12313 [4:01:06<5:05:59, 2.64s/it] 44%|████▎ | 5368/12313 [4:01:08<5:04:44, 2.63s/it] {'loss': 0.5388, 'grad_norm': 6.710109555046167, 'learning_rate': 3.133221188340897e-06, 'epoch': 0.44} 44%|████▎ | 5368/12313 [4:01:08<5:04:44, 2.63s/it] 44%|████▎ | 5369/12313 [4:01:11<5:05:53, 2.64s/it] {'loss': 0.5046, 'grad_norm': 6.706321579158362, 'learning_rate': 3.132584988802615e-06, 'epoch': 0.44} 44%|████▎ | 5369/12313 [4:01:11<5:05:53, 2.64s/it] 44%|████▎ | 5370/12313 [4:01:14<5:08:41, 2.67s/it] {'loss': 0.5206, 'grad_norm': 5.443757710169265, 'learning_rate': 3.1319487454928005e-06, 'epoch': 0.44} 44%|████▎ | 5370/12313 [4:01:14<5:08:41, 2.67s/it] 44%|████▎ | 5371/12313 [4:01:16<5:07:29, 2.66s/it] {'loss': 0.601, 'grad_norm': 43.84951492925826, 'learning_rate': 3.1313124584554772e-06, 'epoch': 0.44} 44%|████▎ | 5371/12313 [4:01:16<5:07:29, 2.66s/it] 44%|████▎ | 5372/12313 [4:01:19<5:16:05, 2.73s/it] {'loss': 0.5649, 'grad_norm': 2.965181185591869, 'learning_rate': 3.130676127734673e-06, 'epoch': 0.44} 44%|████▎ | 5372/12313 [4:01:19<5:16:05, 2.73s/it] 44%|████▎ | 5373/12313 [4:01:22<5:13:24, 2.71s/it] {'loss': 0.5637, 'grad_norm': 4.0688538065516955, 'learning_rate': 3.1300397533744176e-06, 'epoch': 0.44} 44%|████▎ | 5373/12313 [4:01:22<5:13:24, 2.71s/it] 44%|████▎ | 5374/12313 [4:01:24<5:10:24, 2.68s/it] {'loss': 0.6039, 'grad_norm': 4.285947029576996, 'learning_rate': 3.129403335418747e-06, 'epoch': 0.44} 44%|████▎ | 5374/12313 [4:01:24<5:10:24, 2.68s/it] 44%|████▎ | 5375/12313 [4:01:27<5:07:28, 2.66s/it] {'loss': 0.504, 
'grad_norm': 9.05418272699839, 'learning_rate': 3.128766873911696e-06, 'epoch': 0.44} 44%|████▎ | 5375/12313 [4:01:27<5:07:28, 2.66s/it] 44%|████▎ | 5376/12313 [4:01:30<5:03:14, 2.62s/it] {'loss': 0.437, 'grad_norm': 7.150334825744675, 'learning_rate': 3.1281303688973054e-06, 'epoch': 0.44} 44%|████▎ | 5376/12313 [4:01:30<5:03:14, 2.62s/it] 44%|████▎ | 5377/12313 [4:01:32<5:09:50, 2.68s/it] {'loss': 0.4809, 'grad_norm': 4.999179068937834, 'learning_rate': 3.127493820419617e-06, 'epoch': 0.44} 44%|████▎ | 5377/12313 [4:01:32<5:09:50, 2.68s/it] 44%|████▎ | 5378/12313 [4:01:35<5:00:43, 2.60s/it] {'loss': 0.4894, 'grad_norm': 4.8473879562154565, 'learning_rate': 3.1268572285226773e-06, 'epoch': 0.44} 44%|████▎ | 5378/12313 [4:01:35<5:00:43, 2.60s/it] 44%|████▎ | 5379/12313 [4:01:37<5:00:22, 2.60s/it] {'loss': 0.4363, 'grad_norm': 6.929506260339618, 'learning_rate': 3.1262205932505353e-06, 'epoch': 0.44} 44%|████▎ | 5379/12313 [4:01:37<5:00:22, 2.60s/it] 44%|████▎ | 5380/12313 [4:01:40<5:15:10, 2.73s/it] {'loss': 0.5205, 'grad_norm': 5.5615771450834846, 'learning_rate': 3.125583914647242e-06, 'epoch': 0.44} 44%|████▎ | 5380/12313 [4:01:40<5:15:10, 2.73s/it] 44%|████▎ | 5381/12313 [4:01:43<5:09:01, 2.67s/it] {'loss': 0.6615, 'grad_norm': 3.3462484960680072, 'learning_rate': 3.124947192756853e-06, 'epoch': 0.44} 44%|████▎ | 5381/12313 [4:01:43<5:09:01, 2.67s/it] 44%|████▎ | 5382/12313 [4:01:45<5:03:36, 2.63s/it] {'loss': 0.4979, 'grad_norm': 6.634956155249173, 'learning_rate': 3.124310427623426e-06, 'epoch': 0.44} 44%|████▎ | 5382/12313 [4:01:45<5:03:36, 2.63s/it] 44%|████▎ | 5383/12313 [4:01:48<4:59:22, 2.59s/it] {'loss': 0.5231, 'grad_norm': 5.3106857326366095, 'learning_rate': 3.123673619291021e-06, 'epoch': 0.44} 44%|████▎ | 5383/12313 [4:01:48<4:59:22, 2.59s/it] 44%|████▎ | 5384/12313 [4:01:51<4:59:07, 2.59s/it] {'loss': 0.6155, 'grad_norm': 4.925146872849586, 'learning_rate': 3.123036767803703e-06, 'epoch': 0.44} 44%|████▎ | 5384/12313 [4:01:51<4:59:07, 2.59s/it] 
44%|████▎ | 5385/12313 [4:01:53<5:08:47, 2.67s/it] {'loss': 0.4055, 'grad_norm': 6.351273194254498, 'learning_rate': 3.122399873205538e-06, 'epoch': 0.44} 44%|████▎ | 5385/12313 [4:01:53<5:08:47, 2.67s/it] 44%|████▎ | 5386/12313 [4:01:56<5:10:46, 2.69s/it] {'loss': 0.4508, 'grad_norm': 7.439635575652295, 'learning_rate': 3.121762935540595e-06, 'epoch': 0.44} 44%|████▎ | 5386/12313 [4:01:56<5:10:46, 2.69s/it] 44%|████▍ | 5387/12313 [4:01:59<5:02:30, 2.62s/it] {'loss': 0.4207, 'grad_norm': 11.221500508022611, 'learning_rate': 3.121125954852948e-06, 'epoch': 0.44} 44%|████▍ | 5387/12313 [4:01:59<5:02:30, 2.62s/it] 44%|████▍ | 5388/12313 [4:02:01<5:01:57, 2.62s/it] {'loss': 0.4755, 'grad_norm': 15.28900713021598, 'learning_rate': 3.120488931186672e-06, 'epoch': 0.44} 44%|████▍ | 5388/12313 [4:02:01<5:01:57, 2.62s/it] 44%|████▍ | 5389/12313 [4:02:04<5:02:02, 2.62s/it] {'loss': 0.4798, 'grad_norm': 3.903889410795801, 'learning_rate': 3.1198518645858455e-06, 'epoch': 0.44} 44%|████▍ | 5389/12313 [4:02:04<5:02:02, 2.62s/it] 44%|████▍ | 5390/12313 [4:02:06<5:04:05, 2.64s/it] {'loss': 0.6514, 'grad_norm': 6.674485525301077, 'learning_rate': 3.1192147550945517e-06, 'epoch': 0.44} 44%|████▍ | 5390/12313 [4:02:06<5:04:05, 2.64s/it] 44%|████▍ | 5391/12313 [4:02:09<5:04:00, 2.64s/it] {'loss': 0.4016, 'grad_norm': 14.258332144198288, 'learning_rate': 3.118577602756873e-06, 'epoch': 0.44} 44%|████▍ | 5391/12313 [4:02:09<5:04:00, 2.64s/it] 44%|████▍ | 5392/12313 [4:02:12<5:03:19, 2.63s/it] {'loss': 0.5567, 'grad_norm': 4.922640216869197, 'learning_rate': 3.1179404076168983e-06, 'epoch': 0.44} 44%|████▍ | 5392/12313 [4:02:12<5:03:19, 2.63s/it] 44%|████▍ | 5393/12313 [4:02:14<5:07:08, 2.66s/it] {'loss': 0.6444, 'grad_norm': 5.597787362812228, 'learning_rate': 3.1173031697187178e-06, 'epoch': 0.44} 44%|████▍ | 5393/12313 [4:02:14<5:07:08, 2.66s/it] 44%|████▍ | 5394/12313 [4:02:18<5:20:38, 2.78s/it] {'loss': 0.3967, 'grad_norm': 3.8553056656955143, 'learning_rate': 
3.116665889106425e-06, 'epoch': 0.44} 44%|████▍ | 5394/12313 [4:02:18<5:20:38, 2.78s/it] 44%|████▍ | 5395/12313 [4:02:20<5:15:58, 2.74s/it] {'loss': 0.4896, 'grad_norm': 3.517771835764705, 'learning_rate': 3.1160285658241157e-06, 'epoch': 0.44} 44%|████▍ | 5395/12313 [4:02:20<5:15:58, 2.74s/it] 44%|████▍ | 5396/12313 [4:02:23<5:11:39, 2.70s/it] {'loss': 0.6903, 'grad_norm': 3.6077248593647715, 'learning_rate': 3.11539119991589e-06, 'epoch': 0.44} 44%|████▍ | 5396/12313 [4:02:23<5:11:39, 2.70s/it] 44%|████▍ | 5397/12313 [4:02:25<5:09:09, 2.68s/it] {'loss': 0.602, 'grad_norm': 3.6023017655336056, 'learning_rate': 3.1147537914258513e-06, 'epoch': 0.44} 44%|████▍ | 5397/12313 [4:02:25<5:09:09, 2.68s/it] 44%|████▍ | 5398/12313 [4:02:28<5:00:24, 2.61s/it] {'loss': 0.4444, 'grad_norm': 5.836048987707051, 'learning_rate': 3.1141163403981033e-06, 'epoch': 0.44} 44%|████▍ | 5398/12313 [4:02:28<5:00:24, 2.61s/it] 44%|████▍ | 5399/12313 [4:02:31<5:06:19, 2.66s/it] {'loss': 0.6145, 'grad_norm': 4.215159805640962, 'learning_rate': 3.113478846876754e-06, 'epoch': 0.44} 44%|████▍ | 5399/12313 [4:02:31<5:06:19, 2.66s/it] 44%|████▍ | 5400/12313 [4:02:33<5:04:38, 2.64s/it] {'loss': 0.5145, 'grad_norm': 7.095922534739042, 'learning_rate': 3.1128413109059164e-06, 'epoch': 0.44} 44%|████▍ | 5400/12313 [4:02:33<5:04:38, 2.64s/it] 44%|████▍ | 5401/12313 [4:02:36<5:11:52, 2.71s/it] {'loss': 0.6229, 'grad_norm': 3.5538230920359077, 'learning_rate': 3.1122037325297027e-06, 'epoch': 0.44} 44%|████▍ | 5401/12313 [4:02:36<5:11:52, 2.71s/it] 44%|████▍ | 5402/12313 [4:02:39<5:04:35, 2.64s/it] {'loss': 0.5481, 'grad_norm': 5.0267826629365455, 'learning_rate': 3.1115661117922307e-06, 'epoch': 0.44} 44%|████▍ | 5402/12313 [4:02:39<5:04:35, 2.64s/it] 44%|████▍ | 5403/12313 [4:02:42<5:22:38, 2.80s/it] {'loss': 0.5039, 'grad_norm': 3.179251364264237, 'learning_rate': 3.1109284487376213e-06, 'epoch': 0.44} 44%|████▍ | 5403/12313 [4:02:42<5:22:38, 2.80s/it] 44%|████▍ | 5404/12313 [4:02:44<5:14:48, 
2.73s/it] {'loss': 0.5207, 'grad_norm': 3.2478475873654626, 'learning_rate': 3.1102907434099962e-06, 'epoch': 0.44} 44%|████▍ | 5404/12313 [4:02:44<5:14:48, 2.73s/it] 44%|████▍ | 5405/12313 [4:02:47<5:14:59, 2.74s/it] {'loss': 0.5009, 'grad_norm': 7.568604033338787, 'learning_rate': 3.1096529958534805e-06, 'epoch': 0.44} 44%|████▍ | 5405/12313 [4:02:47<5:14:59, 2.74s/it] 44%|████▍ | 5406/12313 [4:02:50<5:14:43, 2.73s/it] {'loss': 0.4465, 'grad_norm': 5.352814742876498, 'learning_rate': 3.1090152061122053e-06, 'epoch': 0.44} 44%|████▍ | 5406/12313 [4:02:50<5:14:43, 2.73s/it] 44%|████▍ | 5407/12313 [4:02:53<5:20:33, 2.79s/it] {'loss': 0.5651, 'grad_norm': 6.7191876007979525, 'learning_rate': 3.1083773742303003e-06, 'epoch': 0.44} 44%|████▍ | 5407/12313 [4:02:53<5:20:33, 2.79s/it] 44%|████▍ | 5408/12313 [4:02:56<5:20:19, 2.78s/it] {'loss': 0.5716, 'grad_norm': 4.940449250528321, 'learning_rate': 3.1077395002519013e-06, 'epoch': 0.44} 44%|████▍ | 5408/12313 [4:02:56<5:20:19, 2.78s/it] 44%|████▍ | 5409/12313 [4:02:58<5:16:04, 2.75s/it] {'loss': 0.9121, 'grad_norm': 3.648123205660983, 'learning_rate': 3.1071015842211447e-06, 'epoch': 0.44} 44%|████▍ | 5409/12313 [4:02:58<5:16:04, 2.75s/it] 44%|████▍ | 5410/12313 [4:03:01<5:13:13, 2.72s/it] {'loss': 0.6146, 'grad_norm': 4.953572403071827, 'learning_rate': 3.1064636261821716e-06, 'epoch': 0.44} 44%|████▍ | 5410/12313 [4:03:01<5:13:13, 2.72s/it] 44%|████▍ | 5411/12313 [4:03:04<5:15:27, 2.74s/it] {'loss': 0.3964, 'grad_norm': 6.9246508482882145, 'learning_rate': 3.105825626179126e-06, 'epoch': 0.44} 44%|████▍ | 5411/12313 [4:03:04<5:15:27, 2.74s/it] 44%|████▍ | 5412/12313 [4:03:06<5:17:30, 2.76s/it] {'loss': 0.6518, 'grad_norm': 4.790951487900791, 'learning_rate': 3.1051875842561523e-06, 'epoch': 0.44} 44%|████▍ | 5412/12313 [4:03:06<5:17:30, 2.76s/it] 44%|████▍ | 5413/12313 [4:03:09<5:12:34, 2.72s/it] {'loss': 0.5212, 'grad_norm': 4.736239149088806, 'learning_rate': 3.1045495004574017e-06, 'epoch': 0.44} 44%|████▍ | 
5413/12313 [4:03:09<5:12:34, 2.72s/it] 44%|████▍ | 5414/12313 [4:03:12<5:10:18, 2.70s/it] {'loss': 0.5596, 'grad_norm': 7.540621113454675, 'learning_rate': 3.1039113748270248e-06, 'epoch': 0.44} 44%|████▍ | 5414/12313 [4:03:12<5:10:18, 2.70s/it] 44%|████▍ | 5415/12313 [4:03:14<5:10:44, 2.70s/it] {'loss': 0.4385, 'grad_norm': 3.720380110888026, 'learning_rate': 3.1032732074091765e-06, 'epoch': 0.44} 44%|████▍ | 5415/12313 [4:03:14<5:10:44, 2.70s/it] 44%|████▍ | 5416/12313 [4:03:17<5:04:53, 2.65s/it] {'loss': 0.6445, 'grad_norm': 4.7658804425871235, 'learning_rate': 3.1026349982480153e-06, 'epoch': 0.44} 44%|████▍ | 5416/12313 [4:03:17<5:04:53, 2.65s/it] 44%|████▍ | 5417/12313 [4:03:20<5:10:37, 2.70s/it] {'loss': 0.5791, 'grad_norm': 3.8751042373300963, 'learning_rate': 3.101996747387702e-06, 'epoch': 0.44} 44%|████▍ | 5417/12313 [4:03:20<5:10:37, 2.70s/it] 44%|████▍ | 5418/12313 [4:03:22<5:11:21, 2.71s/it] {'loss': 0.5171, 'grad_norm': 4.852177339352803, 'learning_rate': 3.101358454872399e-06, 'epoch': 0.44} 44%|████▍ | 5418/12313 [4:03:22<5:11:21, 2.71s/it] 44%|████▍ | 5419/12313 [4:03:25<5:08:24, 2.68s/it] {'loss': 0.491, 'grad_norm': 4.223969934330754, 'learning_rate': 3.1007201207462745e-06, 'epoch': 0.44} 44%|████▍ | 5419/12313 [4:03:25<5:08:24, 2.68s/it] 44%|████▍ | 5420/12313 [4:03:28<5:06:19, 2.67s/it] {'loss': 0.4736, 'grad_norm': 6.43559751438046, 'learning_rate': 3.1000817450534964e-06, 'epoch': 0.44} 44%|████▍ | 5420/12313 [4:03:28<5:06:19, 2.67s/it] 44%|████▍ | 5421/12313 [4:03:31<5:16:19, 2.75s/it] {'loss': 0.5335, 'grad_norm': 4.415584725720336, 'learning_rate': 3.0994433278382374e-06, 'epoch': 0.44} 44%|████▍ | 5421/12313 [4:03:31<5:16:19, 2.75s/it] 44%|████▍ | 5422/12313 [4:03:33<5:10:55, 2.71s/it] {'loss': 0.4042, 'grad_norm': 7.044600702022488, 'learning_rate': 3.0988048691446733e-06, 'epoch': 0.44} 44%|████▍ | 5422/12313 [4:03:33<5:10:55, 2.71s/it] 44%|████▍ | 5423/12313 [4:03:36<5:16:12, 2.75s/it] {'loss': 0.5884, 'grad_norm': 5.958216205084174, 
'learning_rate': 3.0981663690169806e-06, 'epoch': 0.44} 44%|████▍ | 5423/12313 [4:03:36<5:16:12, 2.75s/it] 44%|████▍ | 5424/12313 [4:03:39<5:10:29, 2.70s/it] {'loss': 0.4748, 'grad_norm': 7.972188724833584, 'learning_rate': 3.097527827499341e-06, 'epoch': 0.44} 44%|████▍ | 5424/12313 [4:03:39<5:10:29, 2.70s/it] 44%|████▍ | 5425/12313 [4:03:42<5:16:28, 2.76s/it] {'loss': 0.4807, 'grad_norm': 5.3560614787912195, 'learning_rate': 3.0968892446359383e-06, 'epoch': 0.44} 44%|████▍ | 5425/12313 [4:03:42<5:16:28, 2.76s/it] 44%|████▍ | 5426/12313 [4:03:44<5:11:15, 2.71s/it] {'loss': 0.5928, 'grad_norm': 4.339200995867066, 'learning_rate': 3.0962506204709587e-06, 'epoch': 0.44} 44%|████▍ | 5426/12313 [4:03:44<5:11:15, 2.71s/it] 44%|████▍ | 5427/12313 [4:03:47<5:10:32, 2.71s/it] {'loss': 0.4506, 'grad_norm': 5.4395379188651605, 'learning_rate': 3.0956119550485925e-06, 'epoch': 0.44} 44%|████▍ | 5427/12313 [4:03:47<5:10:32, 2.71s/it] 44%|████▍ | 5428/12313 [4:03:49<5:04:42, 2.66s/it] {'loss': 0.4891, 'grad_norm': 6.180414632693948, 'learning_rate': 3.09497324841303e-06, 'epoch': 0.44} 44%|████▍ | 5428/12313 [4:03:49<5:04:42, 2.66s/it] 44%|████▍ | 5429/12313 [4:03:52<4:55:24, 2.57s/it] {'loss': 0.4157, 'grad_norm': 4.088720798041899, 'learning_rate': 3.0943345006084678e-06, 'epoch': 0.44} 44%|████▍ | 5429/12313 [4:03:52<4:55:24, 2.57s/it] 44%|████▍ | 5430/12313 [4:03:55<5:02:45, 2.64s/it] {'loss': 0.4818, 'grad_norm': 4.834350877095814, 'learning_rate': 3.0936957116791048e-06, 'epoch': 0.44} 44%|████▍ | 5430/12313 [4:03:55<5:02:45, 2.64s/it] 44%|████▍ | 5431/12313 [4:03:57<5:03:47, 2.65s/it] {'loss': 0.4463, 'grad_norm': 6.729193046534626, 'learning_rate': 3.0930568816691394e-06, 'epoch': 0.44} 44%|████▍ | 5431/12313 [4:03:57<5:03:47, 2.65s/it] 44%|████▍ | 5432/12313 [4:04:00<5:00:32, 2.62s/it] {'loss': 0.5464, 'grad_norm': 4.907909407785353, 'learning_rate': 3.092418010622777e-06, 'epoch': 0.44} 44%|████▍ | 5432/12313 [4:04:00<5:00:32, 2.62s/it] 44%|████▍ | 5433/12313 
[4:04:03<5:03:56, 2.65s/it] {'loss': 0.5931, 'grad_norm': 3.5985370572492146, 'learning_rate': 3.091779098584224e-06, 'epoch': 0.44} 44%|████▍ | 5433/12313 [4:04:03<5:03:56, 2.65s/it] 44%|████▍ | 5434/12313 [4:04:05<5:03:07, 2.64s/it] {'loss': 0.483, 'grad_norm': 3.530928584827468, 'learning_rate': 3.0911401455976882e-06, 'epoch': 0.44} 44%|████▍ | 5434/12313 [4:04:05<5:03:07, 2.64s/it] 44%|████▍ | 5435/12313 [4:04:08<5:05:56, 2.67s/it] {'loss': 0.4682, 'grad_norm': 4.3804647274752035, 'learning_rate': 3.0905011517073834e-06, 'epoch': 0.44} 44%|████▍ | 5435/12313 [4:04:08<5:05:56, 2.67s/it] 44%|████▍ | 5436/12313 [4:04:11<5:06:31, 2.67s/it] {'loss': 0.5017, 'grad_norm': 8.630610987447541, 'learning_rate': 3.089862116957525e-06, 'epoch': 0.44} 44%|████▍ | 5436/12313 [4:04:11<5:06:31, 2.67s/it] 44%|████▍ | 5437/12313 [4:04:13<5:10:03, 2.71s/it] {'loss': 0.4949, 'grad_norm': 4.490184934510654, 'learning_rate': 3.089223041392329e-06, 'epoch': 0.44} 44%|████▍ | 5437/12313 [4:04:13<5:10:03, 2.71s/it] 44%|████▍ | 5438/12313 [4:04:16<5:08:18, 2.69s/it] {'loss': 0.6344, 'grad_norm': 4.853522689003282, 'learning_rate': 3.0885839250560172e-06, 'epoch': 0.44} 44%|████▍ | 5438/12313 [4:04:16<5:08:18, 2.69s/it] 44%|████▍ | 5439/12313 [4:04:19<5:16:18, 2.76s/it] {'loss': 0.5162, 'grad_norm': 7.016095469900614, 'learning_rate': 3.087944767992813e-06, 'epoch': 0.44} 44%|████▍ | 5439/12313 [4:04:19<5:16:18, 2.76s/it] 44%|████▍ | 5440/12313 [4:04:22<5:10:06, 2.71s/it] {'loss': 0.6458, 'grad_norm': 5.994420464445846, 'learning_rate': 3.0873055702469416e-06, 'epoch': 0.44} 44%|████▍ | 5440/12313 [4:04:22<5:10:06, 2.71s/it] 44%|████▍ | 5441/12313 [4:04:24<5:10:18, 2.71s/it] {'loss': 0.4059, 'grad_norm': 4.376884747350049, 'learning_rate': 3.086666331862634e-06, 'epoch': 0.44} 44%|████▍ | 5441/12313 [4:04:24<5:10:18, 2.71s/it] 44%|████▍ | 5442/12313 [4:04:27<5:22:38, 2.82s/it] {'loss': 0.4641, 'grad_norm': 5.692559176862132, 'learning_rate': 3.0860270528841208e-06, 'epoch': 0.44} 
44%|████▍ | 5442/12313 [4:04:27<5:22:38, 2.82s/it] 44%|████▍ | 5443/12313 [4:04:30<5:17:01, 2.77s/it] {'loss': 0.5179, 'grad_norm': 4.8077643907663665, 'learning_rate': 3.085387733355637e-06, 'epoch': 0.44} 44%|████▍ | 5443/12313 [4:04:30<5:17:01, 2.77s/it] 44%|████▍ | 5444/12313 [4:04:33<5:24:09, 2.83s/it] {'loss': 0.6234, 'grad_norm': 5.021716165337083, 'learning_rate': 3.08474837332142e-06, 'epoch': 0.44} 44%|████▍ | 5444/12313 [4:04:33<5:24:09, 2.83s/it] 44%|████▍ | 5445/12313 [4:04:36<5:21:33, 2.81s/it] {'loss': 0.5355, 'grad_norm': 5.826683036908708, 'learning_rate': 3.0841089728257108e-06, 'epoch': 0.44} 44%|████▍ | 5445/12313 [4:04:36<5:21:33, 2.81s/it] 44%|████▍ | 5446/12313 [4:04:38<5:17:46, 2.78s/it] {'loss': 0.5641, 'grad_norm': 4.024099529077247, 'learning_rate': 3.0834695319127516e-06, 'epoch': 0.44} 44%|████▍ | 5446/12313 [4:04:38<5:17:46, 2.78s/it] 44%|████▍ | 5447/12313 [4:04:41<5:18:34, 2.78s/it] {'loss': 0.4901, 'grad_norm': 6.876907227052113, 'learning_rate': 3.082830050626789e-06, 'epoch': 0.44} 44%|████▍ | 5447/12313 [4:04:41<5:18:34, 2.78s/it] 44%|████▍ | 5448/12313 [4:04:44<5:32:02, 2.90s/it] {'loss': 0.3913, 'grad_norm': 5.144105316785434, 'learning_rate': 3.0821905290120712e-06, 'epoch': 0.44} 44%|████▍ | 5448/12313 [4:04:44<5:32:02, 2.90s/it] 44%|████▍ | 5449/12313 [4:04:47<5:30:26, 2.89s/it] {'loss': 0.4229, 'grad_norm': 3.975940531276462, 'learning_rate': 3.0815509671128506e-06, 'epoch': 0.44} 44%|████▍ | 5449/12313 [4:04:47<5:30:26, 2.89s/it] 44%|████▍ | 5450/12313 [4:04:50<5:24:48, 2.84s/it] {'loss': 0.5113, 'grad_norm': 5.375633378244454, 'learning_rate': 3.0809113649733803e-06, 'epoch': 0.44} 44%|████▍ | 5450/12313 [4:04:50<5:24:48, 2.84s/it] 44%|████▍ | 5451/12313 [4:04:53<5:19:46, 2.80s/it] {'loss': 0.5075, 'grad_norm': 12.651478454129343, 'learning_rate': 3.0802717226379175e-06, 'epoch': 0.44} 44%|████▍ | 5451/12313 [4:04:53<5:19:46, 2.80s/it] 44%|████▍ | 5452/12313 [4:04:55<5:14:16, 2.75s/it] {'loss': 0.5238, 'grad_norm': 
4.831174765479994, 'learning_rate': 3.079632040150724e-06, 'epoch': 0.44} 44%|████▍ | 5452/12313 [4:04:55<5:14:16, 2.75s/it] 44%|████▍ | 5453/12313 [4:04:58<5:03:03, 2.65s/it] {'loss': 0.5808, 'grad_norm': 5.064193661858532, 'learning_rate': 3.07899231755606e-06, 'epoch': 0.44} 44%|████▍ | 5453/12313 [4:04:58<5:03:03, 2.65s/it] 44%|████▍ | 5454/12313 [4:05:01<5:06:55, 2.68s/it] {'loss': 0.4835, 'grad_norm': 5.352945242031246, 'learning_rate': 3.0783525548981917e-06, 'epoch': 0.44} 44%|████▍ | 5454/12313 [4:05:01<5:06:55, 2.68s/it] 44%|████▍ | 5455/12313 [4:05:03<5:08:29, 2.70s/it] {'loss': 0.4251, 'grad_norm': 4.965261328987609, 'learning_rate': 3.077712752221388e-06, 'epoch': 0.44} 44%|████▍ | 5455/12313 [4:05:03<5:08:29, 2.70s/it] 44%|████▍ | 5456/12313 [4:05:06<5:08:46, 2.70s/it] {'loss': 0.6029, 'grad_norm': 4.9254182519569465, 'learning_rate': 3.0770729095699194e-06, 'epoch': 0.44} 44%|████▍ | 5456/12313 [4:05:06<5:08:46, 2.70s/it] 44%|████▍ | 5457/12313 [4:05:09<5:19:46, 2.80s/it] {'loss': 0.5718, 'grad_norm': 3.5617209008138992, 'learning_rate': 3.0764330269880593e-06, 'epoch': 0.44} 44%|████▍ | 5457/12313 [4:05:09<5:19:46, 2.80s/it] 44%|████▍ | 5458/12313 [4:05:12<5:19:34, 2.80s/it] {'loss': 0.6286, 'grad_norm': 4.355025485119006, 'learning_rate': 3.0757931045200844e-06, 'epoch': 0.44} 44%|████▍ | 5458/12313 [4:05:12<5:19:34, 2.80s/it] 44%|████▍ | 5459/12313 [4:05:14<5:13:16, 2.74s/it] {'loss': 0.5216, 'grad_norm': 6.239139168135333, 'learning_rate': 3.075153142210274e-06, 'epoch': 0.44} 44%|████▍ | 5459/12313 [4:05:14<5:13:16, 2.74s/it] 44%|████▍ | 5460/12313 [4:05:17<5:11:55, 2.73s/it] {'loss': 0.5841, 'grad_norm': 4.784319073015132, 'learning_rate': 3.0745131401029105e-06, 'epoch': 0.44} 44%|████▍ | 5460/12313 [4:05:17<5:11:55, 2.73s/it] 44%|████▍ | 5461/12313 [4:05:20<5:15:04, 2.76s/it] {'loss': 0.6327, 'grad_norm': 4.463298659345465, 'learning_rate': 3.073873098242278e-06, 'epoch': 0.44} 44%|████▍ | 5461/12313 [4:05:20<5:15:04, 2.76s/it] 44%|████▍ | 
5462/12313 [4:05:23<5:13:55, 2.75s/it] {'loss': 0.5043, 'grad_norm': 6.8742335301264585, 'learning_rate': 3.0732330166726644e-06, 'epoch': 0.44} 44%|████▍ | 5462/12313 [4:05:23<5:13:55, 2.75s/it] 44%|████▍ | 5463/12313 [4:05:25<5:07:01, 2.69s/it] {'loss': 0.5506, 'grad_norm': 3.366707977466441, 'learning_rate': 3.07259289543836e-06, 'epoch': 0.44} 44%|████▍ | 5463/12313 [4:05:25<5:07:01, 2.69s/it] 44%|████▍ | 5464/12313 [4:05:28<5:09:49, 2.71s/it] {'loss': 0.4197, 'grad_norm': 4.220402613518687, 'learning_rate': 3.0719527345836568e-06, 'epoch': 0.44} 44%|████▍ | 5464/12313 [4:05:28<5:09:49, 2.71s/it] 44%|████▍ | 5465/12313 [4:05:31<5:12:48, 2.74s/it] {'loss': 0.4006, 'grad_norm': 3.555095408073302, 'learning_rate': 3.0713125341528527e-06, 'epoch': 0.44} 44%|████▍ | 5465/12313 [4:05:31<5:12:48, 2.74s/it] 44%|████▍ | 5466/12313 [4:05:33<5:09:02, 2.71s/it] {'loss': 0.5243, 'grad_norm': 5.959143884494173, 'learning_rate': 3.0706722941902438e-06, 'epoch': 0.44} 44%|████▍ | 5466/12313 [4:05:33<5:09:02, 2.71s/it] 44%|████▍ | 5467/12313 [4:05:36<5:03:52, 2.66s/it] {'loss': 0.5004, 'grad_norm': 3.7731524034106707, 'learning_rate': 3.0700320147401324e-06, 'epoch': 0.44} 44%|████▍ | 5467/12313 [4:05:36<5:03:52, 2.66s/it] 44%|████▍ | 5468/12313 [4:05:39<5:05:27, 2.68s/it] {'loss': 0.5797, 'grad_norm': 4.782660700197082, 'learning_rate': 3.0693916958468236e-06, 'epoch': 0.44} 44%|████▍ | 5468/12313 [4:05:39<5:05:27, 2.68s/it] 44%|████▍ | 5469/12313 [4:05:41<5:04:42, 2.67s/it] {'loss': 0.614, 'grad_norm': 4.37229758317183, 'learning_rate': 3.0687513375546216e-06, 'epoch': 0.44} 44%|████▍ | 5469/12313 [4:05:41<5:04:42, 2.67s/it] 44%|████▍ | 5470/12313 [4:05:44<5:08:49, 2.71s/it] {'loss': 0.7196, 'grad_norm': 2.828225215491486, 'learning_rate': 3.0681109399078375e-06, 'epoch': 0.44} 44%|████▍ | 5470/12313 [4:05:44<5:08:49, 2.71s/it] 44%|████▍ | 5471/12313 [4:05:47<5:06:27, 2.69s/it] {'loss': 0.476, 'grad_norm': 6.309177247959491, 'learning_rate': 3.0674705029507833e-06, 'epoch': 
0.44} 44%|████▍ | 5471/12313 [4:05:47<5:06:27, 2.69s/it] 44%|████▍ | 5472/12313 [4:05:49<5:04:42, 2.67s/it] {'loss': 0.6375, 'grad_norm': 4.103834477568316, 'learning_rate': 3.0668300267277735e-06, 'epoch': 0.44} 44%|████▍ | 5472/12313 [4:05:49<5:04:42, 2.67s/it] 44%|████▍ | 5473/12313 [4:05:52<5:07:28, 2.70s/it] {'loss': 0.4653, 'grad_norm': 6.093953734178574, 'learning_rate': 3.066189511283126e-06, 'epoch': 0.44} 44%|████▍ | 5473/12313 [4:05:52<5:07:28, 2.70s/it] 44%|████▍ | 5474/12313 [4:05:55<5:04:29, 2.67s/it] {'loss': 0.6781, 'grad_norm': 5.225914425304106, 'learning_rate': 3.0655489566611603e-06, 'epoch': 0.44} 44%|████▍ | 5474/12313 [4:05:55<5:04:29, 2.67s/it] 44%|████▍ | 5475/12313 [4:05:57<5:00:32, 2.64s/it] {'loss': 0.5596, 'grad_norm': 5.8819081687868335, 'learning_rate': 3.0649083629062e-06, 'epoch': 0.44} 44%|████▍ | 5475/12313 [4:05:57<5:00:32, 2.64s/it] 44%|████▍ | 5476/12313 [4:06:00<4:55:59, 2.60s/it] {'loss': 0.3823, 'grad_norm': 7.739981641220584, 'learning_rate': 3.0642677300625704e-06, 'epoch': 0.44} 44%|████▍ | 5476/12313 [4:06:00<4:55:59, 2.60s/it] 44%|████▍ | 5477/12313 [4:06:02<4:56:20, 2.60s/it] {'loss': 0.4458, 'grad_norm': 5.131722265818073, 'learning_rate': 3.063627058174601e-06, 'epoch': 0.44} 44%|████▍ | 5477/12313 [4:06:02<4:56:20, 2.60s/it] 44%|████▍ | 5478/12313 [4:06:05<5:02:38, 2.66s/it] {'loss': 0.7122, 'grad_norm': 3.2411651551006364, 'learning_rate': 3.062986347286622e-06, 'epoch': 0.44} 44%|████▍ | 5478/12313 [4:06:05<5:02:38, 2.66s/it] 44%|████▍ | 5479/12313 [4:06:08<5:02:28, 2.66s/it] {'loss': 0.4025, 'grad_norm': 4.225618329584794, 'learning_rate': 3.0623455974429677e-06, 'epoch': 0.44} 44%|████▍ | 5479/12313 [4:06:08<5:02:28, 2.66s/it] 45%|████▍ | 5480/12313 [4:06:11<5:03:25, 2.66s/it] {'loss': 0.6129, 'grad_norm': 7.665112766366524, 'learning_rate': 3.061704808687973e-06, 'epoch': 0.45} 45%|████▍ | 5480/12313 [4:06:11<5:03:25, 2.66s/it] 45%|████▍ | 5481/12313 [4:06:13<4:55:35, 2.60s/it] {'loss': 0.5051, 'grad_norm': 
9.07304807101099, 'learning_rate': 3.061063981065979e-06, 'epoch': 0.45} 45%|████▍ | 5481/12313 [4:06:13<4:55:35, 2.60s/it] 45%|████▍ | 5482/12313 [4:06:16<4:59:29, 2.63s/it] {'loss': 0.6264, 'grad_norm': 4.554173085391561, 'learning_rate': 3.0604231146213276e-06, 'epoch': 0.45} 45%|████▍ | 5482/12313 [4:06:16<4:59:29, 2.63s/it] 45%|████▍ | 5483/12313 [4:06:18<4:59:05, 2.63s/it] {'loss': 0.5022, 'grad_norm': 3.5414231796303515, 'learning_rate': 3.0597822093983614e-06, 'epoch': 0.45} 45%|████▍ | 5483/12313 [4:06:18<4:59:05, 2.63s/it] 45%|████▍ | 5484/12313 [4:06:21<5:00:03, 2.64s/it] {'loss': 0.6486, 'grad_norm': 8.230686535869143, 'learning_rate': 3.0591412654414297e-06, 'epoch': 0.45} 45%|████▍ | 5484/12313 [4:06:21<5:00:03, 2.64s/it] 45%|████▍ | 5485/12313 [4:06:24<4:59:50, 2.63s/it] {'loss': 0.5926, 'grad_norm': 5.579450710250304, 'learning_rate': 3.058500282794882e-06, 'epoch': 0.45} 45%|████▍ | 5485/12313 [4:06:24<4:59:50, 2.63s/it] 45%|████▍ | 5486/12313 [4:06:26<5:00:11, 2.64s/it] {'loss': 0.3853, 'grad_norm': 6.0408658738191425, 'learning_rate': 3.0578592615030693e-06, 'epoch': 0.45} 45%|████▍ | 5486/12313 [4:06:26<5:00:11, 2.64s/it] 45%|████▍ | 5487/12313 [4:06:29<5:01:22, 2.65s/it] {'loss': 0.4482, 'grad_norm': 4.852372309573421, 'learning_rate': 3.057218201610349e-06, 'epoch': 0.45} 45%|████▍ | 5487/12313 [4:06:29<5:01:22, 2.65s/it] 45%|████▍ | 5488/12313 [4:06:32<5:03:48, 2.67s/it] {'loss': 0.5611, 'grad_norm': 5.300082736246172, 'learning_rate': 3.056577103161078e-06, 'epoch': 0.45} 45%|████▍ | 5488/12313 [4:06:32<5:03:48, 2.67s/it] 45%|████▍ | 5489/12313 [4:06:35<5:14:38, 2.77s/it] {'loss': 0.505, 'grad_norm': 5.314374108815902, 'learning_rate': 3.055935966199617e-06, 'epoch': 0.45} 45%|████▍ | 5489/12313 [4:06:35<5:14:38, 2.77s/it] 45%|████▍ | 5490/12313 [4:06:38<5:18:09, 2.80s/it] {'loss': 0.5165, 'grad_norm': 3.462874288033993, 'learning_rate': 3.0552947907703296e-06, 'epoch': 0.45} 45%|████▍ | 5490/12313 [4:06:38<5:18:09, 2.80s/it] 45%|████▍ | 
5491/12313 [4:06:40<5:13:34, 2.76s/it] {'loss': 0.3758, 'grad_norm': 5.895017788703498, 'learning_rate': 3.054653576917581e-06, 'epoch': 0.45} 45%|████▍ | 5491/12313 [4:06:40<5:13:34, 2.76s/it] 45%|████▍ | 5492/12313 [4:06:43<5:16:25, 2.78s/it] {'loss': 0.574, 'grad_norm': 3.720905933238603, 'learning_rate': 3.054012324685742e-06, 'epoch': 0.45} 45%|████▍ | 5492/12313 [4:06:43<5:16:25, 2.78s/it] 45%|████▍ | 5493/12313 [4:06:46<5:09:57, 2.73s/it] {'loss': 0.6723, 'grad_norm': 6.446011438422685, 'learning_rate': 3.05337103411918e-06, 'epoch': 0.45} 45%|████▍ | 5493/12313 [4:06:46<5:09:57, 2.73s/it] 45%|████▍ | 5494/12313 [4:06:48<5:11:03, 2.74s/it] {'loss': 0.5515, 'grad_norm': 4.27621399758535, 'learning_rate': 3.0527297052622724e-06, 'epoch': 0.45} 45%|████▍ | 5494/12313 [4:06:48<5:11:03, 2.74s/it] 45%|████▍ | 5495/12313 [4:06:51<5:07:01, 2.70s/it] {'loss': 0.4223, 'grad_norm': 14.21365870200448, 'learning_rate': 3.0520883381593945e-06, 'epoch': 0.45} 45%|████▍ | 5495/12313 [4:06:51<5:07:01, 2.70s/it] 45%|████▍ | 5496/12313 [4:06:54<5:03:19, 2.67s/it] {'loss': 0.7102, 'grad_norm': 3.727959140755334, 'learning_rate': 3.0514469328549244e-06, 'epoch': 0.45} 45%|████▍ | 5496/12313 [4:06:54<5:03:19, 2.67s/it] 45%|████▍ | 5497/12313 [4:06:56<4:59:28, 2.64s/it] {'loss': 0.6996, 'grad_norm': 4.155096776178402, 'learning_rate': 3.050805489393246e-06, 'epoch': 0.45} 45%|████▍ | 5497/12313 [4:06:56<4:59:28, 2.64s/it] 45%|████▍ | 5498/12313 [4:06:59<4:58:29, 2.63s/it] {'loss': 0.6374, 'grad_norm': 3.376388149465253, 'learning_rate': 3.0501640078187433e-06, 'epoch': 0.45} 45%|████▍ | 5498/12313 [4:06:59<4:58:29, 2.63s/it] 45%|████▍ | 5499/12313 [4:07:01<4:52:55, 2.58s/it] {'loss': 0.5527, 'grad_norm': 6.291374977431427, 'learning_rate': 3.049522488175802e-06, 'epoch': 0.45} 45%|████▍ | 5499/12313 [4:07:01<4:52:55, 2.58s/it] 45%|████▍ | 5500/12313 [4:07:04<4:57:02, 2.62s/it] {'loss': 0.4501, 'grad_norm': 6.926029198647835, 'learning_rate': 3.048880930508813e-06, 'epoch': 0.45} 
45%|████▍ | 5500/12313 [4:07:04<4:57:02, 2.62s/it] 45%|████▍ | 5501/12313 [4:07:07<5:00:12, 2.64s/it] {'loss': 0.4361, 'grad_norm': 8.251812163196625, 'learning_rate': 3.0482393348621686e-06, 'epoch': 0.45} 45%|████▍ | 5501/12313 [4:07:07<5:00:12, 2.64s/it] 45%|████▍ | 5502/12313 [4:07:09<4:51:44, 2.57s/it] {'loss': 0.4884, 'grad_norm': 4.312548651784504, 'learning_rate': 3.0475977012802636e-06, 'epoch': 0.45} 45%|████▍ | 5502/12313 [4:07:09<4:51:44, 2.57s/it] 45%|████▍ | 5503/12313 [4:07:12<4:57:19, 2.62s/it] {'loss': 0.4457, 'grad_norm': 8.316308066336365, 'learning_rate': 3.0469560298074963e-06, 'epoch': 0.45} 45%|████▍ | 5503/12313 [4:07:12<4:57:19, 2.62s/it] 45%|████▍ | 5504/12313 [4:07:14<5:00:09, 2.64s/it] {'loss': 0.5825, 'grad_norm': 5.919440004233395, 'learning_rate': 3.046314320488266e-06, 'epoch': 0.45} 45%|████▍ | 5504/12313 [4:07:14<5:00:09, 2.64s/it] 45%|████▍ | 5505/12313 [4:07:17<4:57:42, 2.62s/it] {'loss': 0.7033, 'grad_norm': 3.520589949942169, 'learning_rate': 3.045672573366976e-06, 'epoch': 0.45} 45%|████▍ | 5505/12313 [4:07:17<4:57:42, 2.62s/it] 45%|████▍ | 5506/12313 [4:07:20<5:00:32, 2.65s/it] {'loss': 0.5824, 'grad_norm': 5.061834295396976, 'learning_rate': 3.045030788488032e-06, 'epoch': 0.45} 45%|████▍ | 5506/12313 [4:07:20<5:00:32, 2.65s/it] 45%|████▍ | 5507/12313 [4:07:22<4:59:55, 2.64s/it] {'loss': 0.5358, 'grad_norm': 5.285749057766182, 'learning_rate': 3.0443889658958425e-06, 'epoch': 0.45} 45%|████▍ | 5507/12313 [4:07:22<4:59:55, 2.64s/it] 45%|████▍ | 5508/12313 [4:07:25<5:00:01, 2.65s/it] {'loss': 0.4972, 'grad_norm': 6.668648467416671, 'learning_rate': 3.043747105634817e-06, 'epoch': 0.45} 45%|████▍ | 5508/12313 [4:07:25<5:00:01, 2.65s/it] 45%|████▍ | 5509/12313 [4:07:28<5:05:56, 2.70s/it] {'loss': 0.7238, 'grad_norm': 5.315769012630028, 'learning_rate': 3.0431052077493693e-06, 'epoch': 0.45} 45%|████▍ | 5509/12313 [4:07:28<5:05:56, 2.70s/it] 45%|████▍ | 5510/12313 [4:07:31<5:04:43, 2.69s/it] {'loss': 0.7628, 'grad_norm': 
4.114487698288283, 'learning_rate': 3.0424632722839164e-06, 'epoch': 0.45} 45%|████▍ | 5510/12313 [4:07:31<5:04:43, 2.69s/it] 45%|████▍ | 5511/12313 [4:07:33<5:01:41, 2.66s/it] {'loss': 0.6014, 'grad_norm': 5.636091284907231, 'learning_rate': 3.041821299282876e-06, 'epoch': 0.45} 45%|████▍ | 5511/12313 [4:07:33<5:01:41, 2.66s/it] 45%|████▍ | 5512/12313 [4:07:36<5:03:24, 2.68s/it] {'loss': 0.4414, 'grad_norm': 5.436008606726467, 'learning_rate': 3.0411792887906684e-06, 'epoch': 0.45} 45%|████▍ | 5512/12313 [4:07:36<5:03:24, 2.68s/it] 45%|████▍ | 5513/12313 [4:07:39<5:13:11, 2.76s/it] {'loss': 0.477, 'grad_norm': 5.179891754796302, 'learning_rate': 3.0405372408517187e-06, 'epoch': 0.45} 45%|████▍ | 5513/12313 [4:07:39<5:13:11, 2.76s/it] 45%|████▍ | 5514/12313 [4:07:42<5:11:52, 2.75s/it] {'loss': 0.6141, 'grad_norm': 2.443055617371502, 'learning_rate': 3.0398951555104528e-06, 'epoch': 0.45} 45%|████▍ | 5514/12313 [4:07:42<5:11:52, 2.75s/it] 45%|████▍ | 5515/12313 [4:07:45<5:19:52, 2.82s/it] {'loss': 0.4829, 'grad_norm': 3.1073744907433745, 'learning_rate': 3.0392530328112997e-06, 'epoch': 0.45} 45%|████▍ | 5515/12313 [4:07:45<5:19:52, 2.82s/it] 45%|████▍ | 5516/12313 [4:07:47<5:13:09, 2.76s/it] {'loss': 0.4541, 'grad_norm': 10.2090255345531, 'learning_rate': 3.0386108727986903e-06, 'epoch': 0.45} 45%|████▍ | 5516/12313 [4:07:47<5:13:09, 2.76s/it] 45%|████▍ | 5517/12313 [4:07:50<5:12:05, 2.76s/it] {'loss': 0.4629, 'grad_norm': 5.848825993848949, 'learning_rate': 3.037968675517059e-06, 'epoch': 0.45} 45%|████▍ | 5517/12313 [4:07:50<5:12:05, 2.76s/it] 45%|████▍ | 5518/12313 [4:07:52<5:02:10, 2.67s/it] {'loss': 0.512, 'grad_norm': 5.418590209098287, 'learning_rate': 3.0373264410108422e-06, 'epoch': 0.45} 45%|████▍ | 5518/12313 [4:07:52<5:02:10, 2.67s/it] 45%|████▍ | 5519/12313 [4:07:55<5:03:49, 2.68s/it] {'loss': 0.4333, 'grad_norm': 4.498788774198856, 'learning_rate': 3.03668416932448e-06, 'epoch': 0.45} 45%|████▍ | 5519/12313 [4:07:55<5:03:49, 2.68s/it] 45%|████▍ | 
5520/12313 [4:07:58<4:56:35, 2.62s/it] {'loss': 0.4415, 'grad_norm': 7.415061980682369, 'learning_rate': 3.0360418605024134e-06, 'epoch': 0.45} 45%|████▍ | 5520/12313 [4:07:58<4:56:35, 2.62s/it] 45%|████▍ | 5521/12313 [4:08:01<5:09:02, 2.73s/it] {'loss': 0.6796, 'grad_norm': 4.0616647064171465, 'learning_rate': 3.0353995145890868e-06, 'epoch': 0.45} 45%|████▍ | 5521/12313 [4:08:01<5:09:02, 2.73s/it] 45%|████▍ | 5522/12313 [4:08:03<5:10:21, 2.74s/it] {'loss': 0.4574, 'grad_norm': 5.786539246564089, 'learning_rate': 3.0347571316289476e-06, 'epoch': 0.45} 45%|████▍ | 5522/12313 [4:08:03<5:10:21, 2.74s/it] 45%|████▍ | 5523/12313 [4:08:06<5:12:32, 2.76s/it] {'loss': 0.5367, 'grad_norm': 4.2099399908062445, 'learning_rate': 3.0341147116664455e-06, 'epoch': 0.45} 45%|████▍ | 5523/12313 [4:08:06<5:12:32, 2.76s/it] 45%|████▍ | 5524/12313 [4:08:09<5:05:43, 2.70s/it] {'loss': 0.374, 'grad_norm': 4.538423749900379, 'learning_rate': 3.0334722547460317e-06, 'epoch': 0.45} 45%|████▍ | 5524/12313 [4:08:09<5:05:43, 2.70s/it] 45%|████▍ | 5525/12313 [4:08:11<5:05:05, 2.70s/it] {'loss': 0.4963, 'grad_norm': 3.5387099780140137, 'learning_rate': 3.032829760912161e-06, 'epoch': 0.45} 45%|████▍ | 5525/12313 [4:08:11<5:05:05, 2.70s/it] 45%|████▍ | 5526/12313 [4:08:14<5:04:10, 2.69s/it] {'loss': 0.5552, 'grad_norm': 5.798358992841778, 'learning_rate': 3.032187230209291e-06, 'epoch': 0.45} 45%|████▍ | 5526/12313 [4:08:14<5:04:10, 2.69s/it] 45%|████▍ | 5527/12313 [4:08:17<5:10:56, 2.75s/it] {'loss': 0.5549, 'grad_norm': 4.773816159502634, 'learning_rate': 3.0315446626818816e-06, 'epoch': 0.45} 45%|████▍ | 5527/12313 [4:08:17<5:10:56, 2.75s/it] 45%|████▍ | 5528/12313 [4:08:20<5:06:56, 2.71s/it] {'loss': 0.5485, 'grad_norm': 4.150570620974484, 'learning_rate': 3.030902058374394e-06, 'epoch': 0.45} 45%|████▍ | 5528/12313 [4:08:20<5:06:56, 2.71s/it] 45%|████▍ | 5529/12313 [4:08:22<5:08:08, 2.73s/it] {'loss': 0.4232, 'grad_norm': 5.126013922801945, 'learning_rate': 3.0302594173312937e-06, 'epoch': 
0.45} 45%|████▍ | 5529/12313 [4:08:22<5:08:08, 2.73s/it] 45%|████▍ | 5530/12313 [4:08:25<5:07:36, 2.72s/it] {'loss': 0.5323, 'grad_norm': 3.268316271468154, 'learning_rate': 3.0296167395970494e-06, 'epoch': 0.45} 45%|████▍ | 5530/12313 [4:08:25<5:07:36, 2.72s/it] 45%|████▍ | 5531/12313 [4:08:28<5:05:01, 2.70s/it] {'loss': 0.6075, 'grad_norm': 5.730548961152452, 'learning_rate': 3.0289740252161288e-06, 'epoch': 0.45} 45%|████▍ | 5531/12313 [4:08:28<5:05:01, 2.70s/it] 45%|████▍ | 5532/12313 [4:08:30<5:03:57, 2.69s/it] {'loss': 0.5684, 'grad_norm': 7.380827782342893, 'learning_rate': 3.0283312742330044e-06, 'epoch': 0.45} 45%|████▍ | 5532/12313 [4:08:30<5:03:57, 2.69s/it] 45%|████▍ | 5533/12313 [4:08:33<5:18:42, 2.82s/it] {'loss': 0.6726, 'grad_norm': 3.0968839249512983, 'learning_rate': 3.027688486692153e-06, 'epoch': 0.45} 45%|████▍ | 5533/12313 [4:08:33<5:18:42, 2.82s/it] 45%|████▍ | 5534/12313 [4:08:36<5:04:08, 2.69s/it] {'loss': 0.5246, 'grad_norm': 3.139400489756557, 'learning_rate': 3.027045662638051e-06, 'epoch': 0.45} 45%|████▍ | 5534/12313 [4:08:36<5:04:08, 2.69s/it] 45%|████▍ | 5535/12313 [4:08:39<5:05:29, 2.70s/it] {'loss': 0.4677, 'grad_norm': 5.977877338701136, 'learning_rate': 3.026402802115178e-06, 'epoch': 0.45} 45%|████▍ | 5535/12313 [4:08:39<5:05:29, 2.70s/it] 45%|████▍ | 5536/12313 [4:08:41<5:10:38, 2.75s/it] {'loss': 0.4622, 'grad_norm': 4.58739972834906, 'learning_rate': 3.0257599051680175e-06, 'epoch': 0.45} 45%|████▍ | 5536/12313 [4:08:41<5:10:38, 2.75s/it] 45%|████▍ | 5537/12313 [4:08:44<5:11:12, 2.76s/it] {'loss': 0.6765, 'grad_norm': 6.184504437794764, 'learning_rate': 3.025116971841054e-06, 'epoch': 0.45} 45%|████▍ | 5537/12313 [4:08:44<5:11:12, 2.76s/it] 45%|████▍ | 5538/12313 [4:08:47<5:05:57, 2.71s/it] {'loss': 0.5273, 'grad_norm': 8.256944628820975, 'learning_rate': 3.0244740021787756e-06, 'epoch': 0.45} 45%|████▍ | 5538/12313 [4:08:47<5:05:57, 2.71s/it] 45%|████▍ | 5539/12313 [4:08:50<5:06:58, 2.72s/it] {'loss': 0.5202, 'grad_norm': 
5.127185634205651, 'learning_rate': 3.023830996225671e-06, 'epoch': 0.45} 45%|████▍ | 5539/12313 [4:08:50<5:06:58, 2.72s/it] 45%|████▍ | 5540/12313 [4:08:52<5:13:29, 2.78s/it] {'loss': 0.588, 'grad_norm': 5.783243249616182, 'learning_rate': 3.023187954026234e-06, 'epoch': 0.45} 45%|████▍ | 5540/12313 [4:08:52<5:13:29, 2.78s/it] 45%|████▌ | 5541/12313 [4:08:55<5:18:42, 2.82s/it] {'loss': 0.5103, 'grad_norm': 11.669014789188214, 'learning_rate': 3.0225448756249605e-06, 'epoch': 0.45} 45%|████▌ | 5541/12313 [4:08:55<5:18:42, 2.82s/it] 45%|████▌ | 5542/12313 [4:08:58<5:13:10, 2.78s/it] {'loss': 0.6313, 'grad_norm': 12.31667513372858, 'learning_rate': 3.0219017610663466e-06, 'epoch': 0.45} 45%|████▌ | 5542/12313 [4:08:58<5:13:10, 2.78s/it] 45%|████▌ | 5543/12313 [4:09:01<5:07:51, 2.73s/it] {'loss': 0.5984, 'grad_norm': 4.705587679569699, 'learning_rate': 3.0212586103948933e-06, 'epoch': 0.45} 45%|████▌ | 5543/12313 [4:09:01<5:07:51, 2.73s/it] 45%|████▌ | 5544/12313 [4:09:03<5:02:41, 2.68s/it] {'loss': 0.4608, 'grad_norm': 7.657296950948716, 'learning_rate': 3.020615423655102e-06, 'epoch': 0.45} 45%|████▌ | 5544/12313 [4:09:03<5:02:41, 2.68s/it] 45%|████▌ | 5545/12313 [4:09:06<5:04:32, 2.70s/it] {'loss': 0.6098, 'grad_norm': 4.260385063187866, 'learning_rate': 3.0199722008914787e-06, 'epoch': 0.45} 45%|████▌ | 5545/12313 [4:09:06<5:04:32, 2.70s/it] 45%|████▌ | 5546/12313 [4:09:09<5:00:24, 2.66s/it] {'loss': 0.3907, 'grad_norm': 5.405032331240298, 'learning_rate': 3.0193289421485317e-06, 'epoch': 0.45} 45%|████▌ | 5546/12313 [4:09:09<5:00:24, 2.66s/it] 45%|████▌ | 5547/12313 [4:09:11<4:55:39, 2.62s/it] {'loss': 0.4768, 'grad_norm': 5.755862088555348, 'learning_rate': 3.0186856474707705e-06, 'epoch': 0.45} 45%|████▌ | 5547/12313 [4:09:11<4:55:39, 2.62s/it] 45%|████▌ | 5548/12313 [4:09:14<4:50:20, 2.58s/it] {'loss': 0.4632, 'grad_norm': 8.389991068275073, 'learning_rate': 3.0180423169027067e-06, 'epoch': 0.45} 45%|████▌ | 5548/12313 [4:09:14<4:50:20, 2.58s/it] 45%|████▌ | 
5549/12313 [4:09:17<5:08:41, 2.74s/it] {'loss': 0.4662, 'grad_norm': 7.221227795632766, 'learning_rate': 3.0173989504888573e-06, 'epoch': 0.45} 45%|████▌ | 5549/12313 [4:09:17<5:08:41, 2.74s/it] 45%|████▌ | 5550/12313 [4:09:19<5:01:46, 2.68s/it] {'loss': 0.6383, 'grad_norm': 3.6157966531181467, 'learning_rate': 3.0167555482737384e-06, 'epoch': 0.45} 45%|████▌ | 5550/12313 [4:09:19<5:01:46, 2.68s/it] 45%|████▌ | 5551/12313 [4:09:22<4:54:59, 2.62s/it] {'loss': 0.3908, 'grad_norm': 5.5152149011394185, 'learning_rate': 3.01611211030187e-06, 'epoch': 0.45} 45%|████▌ | 5551/12313 [4:09:22<4:54:59, 2.62s/it] 45%|████▌ | 5552/12313 [4:09:24<4:56:28, 2.63s/it] {'loss': 0.4462, 'grad_norm': 4.18620086470904, 'learning_rate': 3.0154686366177753e-06, 'epoch': 0.45} 45%|████▌ | 5552/12313 [4:09:24<4:56:28, 2.63s/it] 45%|████▌ | 5553/12313 [4:09:27<5:00:37, 2.67s/it] {'loss': 0.4242, 'grad_norm': 5.069886109587496, 'learning_rate': 3.0148251272659795e-06, 'epoch': 0.45} 45%|████▌ | 5553/12313 [4:09:27<5:00:37, 2.67s/it] 45%|████▌ | 5554/12313 [4:09:30<5:13:53, 2.79s/it] {'loss': 0.5341, 'grad_norm': 4.107158289610819, 'learning_rate': 3.0141815822910094e-06, 'epoch': 0.45} 45%|████▌ | 5554/12313 [4:09:30<5:13:53, 2.79s/it] 45%|████▌ | 5555/12313 [4:09:33<5:09:07, 2.74s/it] {'loss': 0.561, 'grad_norm': 4.955579663758407, 'learning_rate': 3.013538001737395e-06, 'epoch': 0.45} 45%|████▌ | 5555/12313 [4:09:33<5:09:07, 2.74s/it] 45%|████▌ | 5556/12313 [4:09:35<5:06:14, 2.72s/it] {'loss': 0.6114, 'grad_norm': 4.9599952397922635, 'learning_rate': 3.0128943856496686e-06, 'epoch': 0.45} 45%|████▌ | 5556/12313 [4:09:35<5:06:14, 2.72s/it] 45%|████▌ | 5557/12313 [4:09:38<5:05:26, 2.71s/it] {'loss': 0.6247, 'grad_norm': 8.882975709525924, 'learning_rate': 3.0122507340723656e-06, 'epoch': 0.45} 45%|████▌ | 5557/12313 [4:09:38<5:05:26, 2.71s/it] 45%|████▌ | 5558/12313 [4:09:41<5:05:46, 2.72s/it] {'loss': 0.4825, 'grad_norm': 5.7173513452837135, 'learning_rate': 3.011607047050022e-06, 'epoch': 
0.45} 45%|████▌ | 5558/12313 [4:09:41<5:05:46, 2.72s/it] 45%|████▌ | 5559/12313 [4:09:44<5:11:22, 2.77s/it] {'loss': 0.4608, 'grad_norm': 6.378003204165053, 'learning_rate': 3.0109633246271783e-06, 'epoch': 0.45} 45%|████▌ | 5559/12313 [4:09:44<5:11:22, 2.77s/it] 45%|████▌ | 5560/12313 [4:09:47<5:17:02, 2.82s/it] {'loss': 0.4853, 'grad_norm': 8.215078154867541, 'learning_rate': 3.0103195668483787e-06, 'epoch': 0.45} 45%|████▌ | 5560/12313 [4:09:47<5:17:02, 2.82s/it] 45%|████▌ | 5561/12313 [4:09:49<5:11:59, 2.77s/it] {'loss': 0.5338, 'grad_norm': 5.6937027402469305, 'learning_rate': 3.009675773758164e-06, 'epoch': 0.45} 45%|████▌ | 5561/12313 [4:09:49<5:11:59, 2.77s/it] 45%|████▌ | 5562/12313 [4:09:52<5:12:40, 2.78s/it] {'loss': 0.4927, 'grad_norm': 9.181599736624747, 'learning_rate': 3.009031945401084e-06, 'epoch': 0.45} 45%|████▌ | 5562/12313 [4:09:52<5:12:40, 2.78s/it] 45%|████▌ | 5563/12313 [4:09:55<5:09:29, 2.75s/it] {'loss': 0.5629, 'grad_norm': 11.154194461173983, 'learning_rate': 3.008388081821687e-06, 'epoch': 0.45} 45%|████▌ | 5563/12313 [4:09:55<5:09:29, 2.75s/it] 45%|████▌ | 5564/12313 [4:09:57<5:01:26, 2.68s/it] {'loss': 0.6261, 'grad_norm': 8.09877927664861, 'learning_rate': 3.0077441830645256e-06, 'epoch': 0.45} 45%|████▌ | 5564/12313 [4:09:57<5:01:26, 2.68s/it] 45%|████▌ | 5565/12313 [4:10:00<5:16:10, 2.81s/it] {'loss': 0.5261, 'grad_norm': 6.075046325765384, 'learning_rate': 3.0071002491741537e-06, 'epoch': 0.45} 45%|████▌ | 5565/12313 [4:10:00<5:16:10, 2.81s/it] 45%|████▌ | 5566/12313 [4:10:03<5:12:21, 2.78s/it] {'loss': 0.4545, 'grad_norm': 7.918146195582449, 'learning_rate': 3.0064562801951286e-06, 'epoch': 0.45} 45%|████▌ | 5566/12313 [4:10:03<5:12:21, 2.78s/it] 45%|████▌ | 5567/12313 [4:10:06<5:00:12, 2.67s/it] {'loss': 0.5743, 'grad_norm': 4.644487714982916, 'learning_rate': 3.005812276172009e-06, 'epoch': 0.45} 45%|████▌ | 5567/12313 [4:10:06<5:00:12, 2.67s/it] 45%|████▌ | 5568/12313 [4:10:08<4:55:24, 2.63s/it] {'loss': 0.572, 'grad_norm': 
3.5759922555457924, 'learning_rate': 3.005168237149357e-06, 'epoch': 0.45} 45%|████▌ | 5568/12313 [4:10:08<4:55:24, 2.63s/it] 45%|████▌ | 5569/12313 [4:10:11<4:52:54, 2.61s/it] {'loss': 0.5287, 'grad_norm': 6.775625726215614, 'learning_rate': 3.0045241631717366e-06, 'epoch': 0.45} 45%|████▌ | 5569/12313 [4:10:11<4:52:54, 2.61s/it] 45%|████▌ | 5570/12313 [4:10:13<4:43:58, 2.53s/it] {'loss': 0.5277, 'grad_norm': 3.7500419034479457, 'learning_rate': 3.0038800542837137e-06, 'epoch': 0.45} 45%|████▌ | 5570/12313 [4:10:13<4:43:58, 2.53s/it] 45%|████▌ | 5571/12313 [4:10:16<4:57:25, 2.65s/it] {'loss': 0.4724, 'grad_norm': 4.670720788070081, 'learning_rate': 3.003235910529859e-06, 'epoch': 0.45} 45%|████▌ | 5571/12313 [4:10:16<4:57:25, 2.65s/it] 45%|████▌ | 5572/12313 [4:10:19<5:01:35, 2.68s/it] {'loss': 0.6593, 'grad_norm': 7.410732606803197, 'learning_rate': 3.0025917319547417e-06, 'epoch': 0.45} 45%|████▌ | 5572/12313 [4:10:19<5:01:35, 2.68s/it] 45%|████▌ | 5573/12313 [4:10:21<4:56:28, 2.64s/it] {'loss': 0.4562, 'grad_norm': 7.469496813543684, 'learning_rate': 3.001947518602937e-06, 'epoch': 0.45} 45%|████▌ | 5573/12313 [4:10:21<4:56:28, 2.64s/it] 45%|████▌ | 5574/12313 [4:10:24<4:51:30, 2.60s/it] {'loss': 0.5644, 'grad_norm': 4.482461581886912, 'learning_rate': 3.0013032705190196e-06, 'epoch': 0.45} 45%|████▌ | 5574/12313 [4:10:24<4:51:30, 2.60s/it] 45%|████▌ | 5575/12313 [4:10:26<4:55:58, 2.64s/it] {'loss': 0.5426, 'grad_norm': 4.5986439128006, 'learning_rate': 3.00065898774757e-06, 'epoch': 0.45} 45%|████▌ | 5575/12313 [4:10:26<4:55:58, 2.64s/it] 45%|████▌ | 5576/12313 [4:10:29<4:48:30, 2.57s/it] {'loss': 0.5278, 'grad_norm': 4.16197803163674, 'learning_rate': 3.000014670333168e-06, 'epoch': 0.45} 45%|████▌ | 5576/12313 [4:10:29<4:48:30, 2.57s/it] 45%|████▌ | 5577/12313 [4:10:32<5:01:55, 2.69s/it] {'loss': 0.5385, 'grad_norm': 3.5560084963087646, 'learning_rate': 2.9993703183203963e-06, 'epoch': 0.45} 45%|████▌ | 5577/12313 [4:10:32<5:01:55, 2.69s/it] 45%|████▌ | 
5578/12313 [4:10:34<4:56:02, 2.64s/it] {'loss': 0.449, 'grad_norm': 4.260872328467781, 'learning_rate': 2.998725931753842e-06, 'epoch': 0.45} 45%|████▌ | 5578/12313 [4:10:34<4:56:02, 2.64s/it] 45%|████▌ | 5579/12313 [4:10:37<4:50:17, 2.59s/it] {'loss': 0.605, 'grad_norm': 4.8192798552612635, 'learning_rate': 2.9980815106780937e-06, 'epoch': 0.45} 45%|████▌ | 5579/12313 [4:10:37<4:50:17, 2.59s/it] 45%|████▌ | 5580/12313 [4:10:40<5:08:40, 2.75s/it] {'loss': 0.5863, 'grad_norm': 3.949873091272697, 'learning_rate': 2.9974370551377396e-06, 'epoch': 0.45} 45%|████▌ | 5580/12313 [4:10:40<5:08:40, 2.75s/it] 45%|████▌ | 5581/12313 [4:10:43<5:07:51, 2.74s/it] {'loss': 0.4891, 'grad_norm': 8.728645349811258, 'learning_rate': 2.9967925651773745e-06, 'epoch': 0.45} 45%|████▌ | 5581/12313 [4:10:43<5:07:51, 2.74s/it] 45%|████▌ | 5582/12313 [4:10:45<5:08:08, 2.75s/it] {'loss': 0.6374, 'grad_norm': 3.623472869472515, 'learning_rate': 2.9961480408415926e-06, 'epoch': 0.45} 45%|████▌ | 5582/12313 [4:10:45<5:08:08, 2.75s/it] 45%|████▌ | 5583/12313 [4:10:48<5:01:44, 2.69s/it] {'loss': 0.4461, 'grad_norm': 4.651906826052307, 'learning_rate': 2.995503482174993e-06, 'epoch': 0.45} 45%|████▌ | 5583/12313 [4:10:48<5:01:44, 2.69s/it] 45%|████▌ | 5584/12313 [4:10:51<5:00:33, 2.68s/it] {'loss': 0.4862, 'grad_norm': 4.256628817272692, 'learning_rate': 2.9948588892221744e-06, 'epoch': 0.45} 45%|████▌ | 5584/12313 [4:10:51<5:00:33, 2.68s/it] 45%|████▌ | 5585/12313 [4:10:53<5:04:48, 2.72s/it] {'loss': 0.5934, 'grad_norm': 3.3916680603373797, 'learning_rate': 2.9942142620277394e-06, 'epoch': 0.45} 45%|████▌ | 5585/12313 [4:10:53<5:04:48, 2.72s/it] 45%|████▌ | 5586/12313 [4:10:56<4:59:30, 2.67s/it] {'loss': 0.5009, 'grad_norm': 6.635234044978131, 'learning_rate': 2.993569600636293e-06, 'epoch': 0.45} 45%|████▌ | 5586/12313 [4:10:56<4:59:30, 2.67s/it] 45%|████▌ | 5587/12313 [4:10:59<5:02:00, 2.69s/it] {'loss': 0.4523, 'grad_norm': 5.742703156297073, 'learning_rate': 2.9929249050924424e-06, 'epoch': 
0.45} 45%|████▌ | 5587/12313 [4:10:59<5:02:00, 2.69s/it] 45%|████▌ | 5588/12313 [4:11:01<4:57:03, 2.65s/it] {'loss': 0.5456, 'grad_norm': 4.30344798988703, 'learning_rate': 2.992280175440797e-06, 'epoch': 0.45} 45%|████▌ | 5588/12313 [4:11:01<4:57:03, 2.65s/it] 45%|████▌ | 5589/12313 [4:11:04<4:56:20, 2.64s/it] {'loss': 0.4736, 'grad_norm': 4.517980897014619, 'learning_rate': 2.99163541172597e-06, 'epoch': 0.45} 45%|████▌ | 5589/12313 [4:11:04<4:56:20, 2.64s/it] 45%|████▌ | 5590/12313 [4:11:07<4:56:57, 2.65s/it] {'loss': 0.6306, 'grad_norm': 5.224517211850113, 'learning_rate': 2.990990613992573e-06, 'epoch': 0.45} 45%|████▌ | 5590/12313 [4:11:07<4:56:57, 2.65s/it] 45%|████▌ | 5591/12313 [4:11:10<5:04:35, 2.72s/it] {'loss': 0.5577, 'grad_norm': 4.522124261787193, 'learning_rate': 2.990345782285225e-06, 'epoch': 0.45} 45%|████▌ | 5591/12313 [4:11:10<5:04:35, 2.72s/it] 45%|████▌ | 5592/12313 [4:11:12<4:58:23, 2.66s/it] {'loss': 0.4167, 'grad_norm': 4.777313229511017, 'learning_rate': 2.989700916648544e-06, 'epoch': 0.45} 45%|████▌ | 5592/12313 [4:11:12<4:58:23, 2.66s/it] 45%|████▌ | 5593/12313 [4:11:15<4:59:25, 2.67s/it] {'loss': 0.4435, 'grad_norm': 6.094818732275617, 'learning_rate': 2.989056017127151e-06, 'epoch': 0.45} 45%|████▌ | 5593/12313 [4:11:15<4:59:25, 2.67s/it] 45%|████▌ | 5594/12313 [4:11:18<5:04:16, 2.72s/it] {'loss': 0.5327, 'grad_norm': 7.062295268512343, 'learning_rate': 2.988411083765669e-06, 'epoch': 0.45} 45%|████▌ | 5594/12313 [4:11:18<5:04:16, 2.72s/it] 45%|████▌ | 5595/12313 [4:11:20<4:58:08, 2.66s/it] {'loss': 0.7565, 'grad_norm': 5.878383502169189, 'learning_rate': 2.9877661166087265e-06, 'epoch': 0.45} 45%|████▌ | 5595/12313 [4:11:20<4:58:08, 2.66s/it] 45%|████▌ | 5596/12313 [4:11:23<4:55:36, 2.64s/it] {'loss': 0.3842, 'grad_norm': 5.252077496243823, 'learning_rate': 2.9871211157009496e-06, 'epoch': 0.45} 45%|████▌ | 5596/12313 [4:11:23<4:55:36, 2.64s/it] 45%|████▌ | 5597/12313 [4:11:25<4:57:45, 2.66s/it] {'loss': 0.5382, 'grad_norm': 
4.8053252445722014, 'learning_rate': 2.986476081086969e-06, 'epoch': 0.45} 45%|████▌ | 5597/12313 [4:11:25<4:57:45, 2.66s/it] 45%|████▌ | 5598/12313 [4:11:28<5:00:35, 2.69s/it] {'loss': 0.5417, 'grad_norm': 5.29723639288913, 'learning_rate': 2.9858310128114187e-06, 'epoch': 0.45} 45%|████▌ | 5598/12313 [4:11:28<5:00:35, 2.69s/it] 45%|████▌ | 5599/12313 [4:11:31<4:53:24, 2.62s/it] {'loss': 0.54, 'grad_norm': 4.021237021928457, 'learning_rate': 2.9851859109189335e-06, 'epoch': 0.45} 45%|████▌ | 5599/12313 [4:11:31<4:53:24, 2.62s/it] 45%|████▌ | 5600/12313 [4:11:33<4:48:03, 2.57s/it] {'loss': 0.4593, 'grad_norm': 26.465640352460827, 'learning_rate': 2.9845407754541513e-06, 'epoch': 0.45} 45%|████▌ | 5600/12313 [4:11:33<4:48:03, 2.57s/it] 45%|████▌ | 5601/12313 [4:11:36<4:53:20, 2.62s/it] {'loss': 0.547, 'grad_norm': 4.300973423387741, 'learning_rate': 2.9838956064617108e-06, 'epoch': 0.45} 45%|████▌ | 5601/12313 [4:11:36<4:53:20, 2.62s/it] 45%|████▌ | 5602/12313 [4:11:38<4:53:33, 2.62s/it] {'loss': 0.5216, 'grad_norm': 4.972732875615428, 'learning_rate': 2.9832504039862564e-06, 'epoch': 0.45} 45%|████▌ | 5602/12313 [4:11:38<4:53:33, 2.62s/it] 46%|████▌ | 5603/12313 [4:11:42<5:09:52, 2.77s/it] {'loss': 0.5508, 'grad_norm': 3.8151254239492074, 'learning_rate': 2.982605168072431e-06, 'epoch': 0.46} 46%|████▌ | 5603/12313 [4:11:42<5:09:52, 2.77s/it] 46%|████▌ | 5604/12313 [4:11:44<5:05:36, 2.73s/it] {'loss': 0.4318, 'grad_norm': 4.251587340007891, 'learning_rate': 2.981959898764882e-06, 'epoch': 0.46} 46%|████▌ | 5604/12313 [4:11:44<5:05:36, 2.73s/it] 46%|████▌ | 5605/12313 [4:11:47<4:58:57, 2.67s/it] {'loss': 0.5097, 'grad_norm': 4.436180317192108, 'learning_rate': 2.9813145961082594e-06, 'epoch': 0.46} 46%|████▌ | 5605/12313 [4:11:47<4:58:57, 2.67s/it] 46%|████▌ | 5606/12313 [4:11:49<4:50:39, 2.60s/it] {'loss': 0.6497, 'grad_norm': 4.5049604397224785, 'learning_rate': 2.9806692601472143e-06, 'epoch': 0.46} 46%|████▌ | 5606/12313 [4:11:49<4:50:39, 2.60s/it] 46%|████▌ | 
5607/12313 [4:11:52<4:59:56, 2.68s/it] {'loss': 0.6143, 'grad_norm': 4.817837180058034, 'learning_rate': 2.9800238909263994e-06, 'epoch': 0.46} 46%|████▌ | 5607/12313 [4:11:52<4:59:56, 2.68s/it] 46%|████▌ | 5608/12313 [4:11:55<5:10:46, 2.78s/it] {'loss': 0.5063, 'grad_norm': 3.4243930089969785, 'learning_rate': 2.9793784884904733e-06, 'epoch': 0.46} 46%|████▌ | 5608/12313 [4:11:55<5:10:46, 2.78s/it] 46%|████▌ | 5609/12313 [4:11:58<5:02:55, 2.71s/it] {'loss': 0.4992, 'grad_norm': 5.925953178663676, 'learning_rate': 2.9787330528840915e-06, 'epoch': 0.46} 46%|████▌ | 5609/12313 [4:11:58<5:02:55, 2.71s/it] 46%|████▌ | 5610/12313 [4:12:00<4:56:16, 2.65s/it] {'loss': 0.5372, 'grad_norm': 5.497205043778295, 'learning_rate': 2.978087584151915e-06, 'epoch': 0.46} 46%|████▌ | 5610/12313 [4:12:00<4:56:16, 2.65s/it] 46%|████▌ | 5611/12313 [4:12:03<4:52:52, 2.62s/it] {'loss': 0.5619, 'grad_norm': 8.33939404348421, 'learning_rate': 2.9774420823386104e-06, 'epoch': 0.46} 46%|████▌ | 5611/12313 [4:12:03<4:52:52, 2.62s/it] 46%|████▌ | 5612/12313 [4:12:05<4:53:46, 2.63s/it] {'loss': 0.5683, 'grad_norm': 6.195975570900406, 'learning_rate': 2.9767965474888395e-06, 'epoch': 0.46} 46%|████▌ | 5612/12313 [4:12:05<4:53:46, 2.63s/it] 46%|████▌ | 5613/12313 [4:12:08<4:50:23, 2.60s/it] {'loss': 0.3429, 'grad_norm': 14.703831195583096, 'learning_rate': 2.9761509796472697e-06, 'epoch': 0.46} 46%|████▌ | 5613/12313 [4:12:08<4:50:23, 2.60s/it] 46%|████▌ | 5614/12313 [4:12:11<4:58:28, 2.67s/it] {'loss': 0.6409, 'grad_norm': 4.631555776984385, 'learning_rate': 2.975505378858574e-06, 'epoch': 0.46} 46%|████▌ | 5614/12313 [4:12:11<4:58:28, 2.67s/it] 46%|████▌ | 5615/12313 [4:12:13<4:53:20, 2.63s/it] {'loss': 0.769, 'grad_norm': 3.4809174977075803, 'learning_rate': 2.974859745167422e-06, 'epoch': 0.46} 46%|████▌ | 5615/12313 [4:12:13<4:53:20, 2.63s/it] 46%|████▌ | 5616/12313 [4:12:16<4:50:53, 2.61s/it] {'loss': 0.5612, 'grad_norm': 5.098517264596913, 'learning_rate': 2.9742140786184885e-06, 'epoch': 
0.46} 46%|████▌ | 5616/12313 [4:12:16<4:50:53, 2.61s/it] 46%|████▌ | 5617/12313 [4:12:18<4:50:27, 2.60s/it] {'loss': 0.5576, 'grad_norm': 6.00392072759688, 'learning_rate': 2.9735683792564506e-06, 'epoch': 0.46} 46%|████▌ | 5617/12313 [4:12:18<4:50:27, 2.60s/it] 46%|████▌ | 5618/12313 [4:12:21<4:54:44, 2.64s/it] {'loss': 0.6709, 'grad_norm': 7.263834884088008, 'learning_rate': 2.9729226471259877e-06, 'epoch': 0.46} 46%|████▌ | 5618/12313 [4:12:21<4:54:44, 2.64s/it] 46%|████▌ | 5619/12313 [4:12:24<4:57:58, 2.67s/it] {'loss': 0.6453, 'grad_norm': 3.554328673137164, 'learning_rate': 2.9722768822717795e-06, 'epoch': 0.46} 46%|████▌ | 5619/12313 [4:12:24<4:57:58, 2.67s/it] 46%|████▌ | 5620/12313 [4:12:27<5:00:18, 2.69s/it] {'loss': 0.4381, 'grad_norm': 5.778288795426547, 'learning_rate': 2.971631084738511e-06, 'epoch': 0.46} 46%|████▌ | 5620/12313 [4:12:27<5:00:18, 2.69s/it] 46%|████▌ | 5621/12313 [4:12:29<5:00:26, 2.69s/it] {'loss': 0.5124, 'grad_norm': 3.465143813973219, 'learning_rate': 2.9709852545708677e-06, 'epoch': 0.46} 46%|████▌ | 5621/12313 [4:12:29<5:00:26, 2.69s/it] 46%|████▌ | 5622/12313 [4:12:32<5:10:37, 2.79s/it] {'loss': 0.4459, 'grad_norm': 4.670081929190185, 'learning_rate': 2.9703393918135383e-06, 'epoch': 0.46} 46%|████▌ | 5622/12313 [4:12:32<5:10:37, 2.79s/it] 46%|████▌ | 5623/12313 [4:12:35<5:03:27, 2.72s/it] {'loss': 0.4847, 'grad_norm': 4.713194770845547, 'learning_rate': 2.96969349651121e-06, 'epoch': 0.46} 46%|████▌ | 5623/12313 [4:12:35<5:03:27, 2.72s/it] 46%|████▌ | 5624/12313 [4:12:38<5:11:28, 2.79s/it] {'loss': 0.4093, 'grad_norm': 3.630426739892531, 'learning_rate': 2.9690475687085795e-06, 'epoch': 0.46} 46%|████▌ | 5624/12313 [4:12:38<5:11:28, 2.79s/it] 46%|████▌ | 5625/12313 [4:12:40<5:05:05, 2.74s/it] {'loss': 0.6108, 'grad_norm': 5.486371575732508, 'learning_rate': 2.968401608450339e-06, 'epoch': 0.46} 46%|████▌ | 5625/12313 [4:12:40<5:05:05, 2.74s/it] 46%|████▌ | 5626/12313 [4:12:43<5:04:06, 2.73s/it] {'loss': 0.45, 'grad_norm': 
8.575804166016505, 'learning_rate': 2.967755615781186e-06, 'epoch': 0.46} 46%|████▌ | 5626/12313 [4:12:43<5:04:06, 2.73s/it] 46%|████▌ | 5627/12313 [4:12:46<4:55:09, 2.65s/it] {'loss': 0.5099, 'grad_norm': 8.699853217920039, 'learning_rate': 2.9671095907458203e-06, 'epoch': 0.46} 46%|████▌ | 5627/12313 [4:12:46<4:55:09, 2.65s/it] 46%|████▌ | 5628/12313 [4:12:49<5:16:41, 2.84s/it] {'loss': 0.5301, 'grad_norm': 3.801057038330998, 'learning_rate': 2.966463533388943e-06, 'epoch': 0.46} 46%|████▌ | 5628/12313 [4:12:49<5:16:41, 2.84s/it] 46%|████▌ | 5629/12313 [4:12:51<5:07:01, 2.76s/it] {'loss': 0.5236, 'grad_norm': 6.044511448542259, 'learning_rate': 2.9658174437552577e-06, 'epoch': 0.46} 46%|████▌ | 5629/12313 [4:12:51<5:07:01, 2.76s/it] 46%|████▌ | 5630/12313 [4:12:54<5:06:57, 2.76s/it] {'loss': 0.6417, 'grad_norm': 5.875396782939016, 'learning_rate': 2.9651713218894706e-06, 'epoch': 0.46} 46%|████▌ | 5630/12313 [4:12:54<5:06:57, 2.76s/it] 46%|████▌ | 5631/12313 [4:12:57<5:02:25, 2.72s/it] {'loss': 0.5186, 'grad_norm': 4.000366715370684, 'learning_rate': 2.96452516783629e-06, 'epoch': 0.46} 46%|████▌ | 5631/12313 [4:12:57<5:02:25, 2.72s/it] 46%|████▌ | 5632/12313 [4:12:59<5:00:57, 2.70s/it] {'loss': 0.5162, 'grad_norm': 4.69919331836541, 'learning_rate': 2.9638789816404264e-06, 'epoch': 0.46} 46%|████▌ | 5632/12313 [4:12:59<5:00:57, 2.70s/it] 46%|████▌ | 5633/12313 [4:13:02<5:00:02, 2.69s/it] {'loss': 0.4416, 'grad_norm': 7.3603896088494185, 'learning_rate': 2.9632327633465917e-06, 'epoch': 0.46} 46%|████▌ | 5633/12313 [4:13:02<5:00:02, 2.69s/it] 46%|████▌ | 5634/12313 [4:13:05<4:58:22, 2.68s/it] {'loss': 0.5364, 'grad_norm': 7.25776019771969, 'learning_rate': 2.9625865129995023e-06, 'epoch': 0.46} 46%|████▌ | 5634/12313 [4:13:05<4:58:22, 2.68s/it] 46%|████▌ | 5635/12313 [4:13:08<5:01:41, 2.71s/it] {'loss': 0.412, 'grad_norm': 5.174876153569623, 'learning_rate': 2.9619402306438738e-06, 'epoch': 0.46} 46%|████▌ | 5635/12313 [4:13:08<5:01:41, 2.71s/it] 46%|████▌ | 
5636/12313 [4:13:10<5:01:02, 2.71s/it] {'loss': 0.4053, 'grad_norm': 5.4010502742692355, 'learning_rate': 2.9612939163244266e-06, 'epoch': 0.46} 46%|████▌ | 5636/12313 [4:13:10<5:01:02, 2.71s/it] 46%|████▌ | 5637/12313 [4:13:13<5:00:27, 2.70s/it] {'loss': 0.4996, 'grad_norm': 3.835214497106897, 'learning_rate': 2.960647570085881e-06, 'epoch': 0.46} 46%|████▌ | 5637/12313 [4:13:13<5:00:27, 2.70s/it] 46%|████▌ | 5638/12313 [4:13:16<4:58:41, 2.68s/it] {'loss': 0.7144, 'grad_norm': 6.055794052072197, 'learning_rate': 2.960001191972963e-06, 'epoch': 0.46} 46%|████▌ | 5638/12313 [4:13:16<4:58:41, 2.68s/it] 46%|████▌ | 5639/12313 [4:13:18<5:00:39, 2.70s/it] {'loss': 0.6862, 'grad_norm': 4.158855522188884, 'learning_rate': 2.9593547820303954e-06, 'epoch': 0.46} 46%|████▌ | 5639/12313 [4:13:18<5:00:39, 2.70s/it] 46%|████▌ | 5640/12313 [4:13:21<4:56:55, 2.67s/it] {'loss': 0.5555, 'grad_norm': 3.708348870716642, 'learning_rate': 2.958708340302908e-06, 'epoch': 0.46} 46%|████▌ | 5640/12313 [4:13:21<4:56:55, 2.67s/it] 46%|████▌ | 5641/12313 [4:13:24<4:58:36, 2.69s/it] {'loss': 0.4188, 'grad_norm': 5.983420776955866, 'learning_rate': 2.958061866835232e-06, 'epoch': 0.46} 46%|████▌ | 5641/12313 [4:13:24<4:58:36, 2.69s/it] 46%|████▌ | 5642/12313 [4:13:26<4:51:07, 2.62s/it] {'loss': 0.5177, 'grad_norm': 8.29550387074976, 'learning_rate': 2.9574153616720986e-06, 'epoch': 0.46} 46%|████▌ | 5642/12313 [4:13:26<4:51:07, 2.62s/it] 46%|████▌ | 5643/12313 [4:13:29<4:47:30, 2.59s/it] {'loss': 0.5237, 'grad_norm': 5.444764698582431, 'learning_rate': 2.9567688248582436e-06, 'epoch': 0.46} 46%|████▌ | 5643/12313 [4:13:29<4:47:30, 2.59s/it] 46%|████▌ | 5644/12313 [4:13:31<4:47:19, 2.58s/it] {'loss': 0.4966, 'grad_norm': 10.29419590630684, 'learning_rate': 2.956122256438403e-06, 'epoch': 0.46} 46%|████▌ | 5644/12313 [4:13:31<4:47:19, 2.58s/it] 46%|████▌ | 5645/12313 [4:13:34<4:51:36, 2.62s/it] {'loss': 0.5255, 'grad_norm': 6.484747843668005, 'learning_rate': 2.955475656457316e-06, 'epoch': 
0.46} 46%|████▌ | 5645/12313 [4:13:34<4:51:36, 2.62s/it] 46%|████▌ | 5646/12313 [4:13:37<4:55:17, 2.66s/it] {'loss': 0.5482, 'grad_norm': 5.119025976889396, 'learning_rate': 2.9548290249597246e-06, 'epoch': 0.46} 46%|████▌ | 5646/12313 [4:13:37<4:55:17, 2.66s/it] 46%|████▌ | 5647/12313 [4:13:39<4:54:57, 2.65s/it] {'loss': 0.6823, 'grad_norm': 6.402280776009877, 'learning_rate': 2.9541823619903716e-06, 'epoch': 0.46} 46%|████▌ | 5647/12313 [4:13:39<4:54:57, 2.65s/it] 46%|████▌ | 5648/12313 [4:13:42<4:46:13, 2.58s/it] {'loss': 0.4238, 'grad_norm': 7.6633301689603375, 'learning_rate': 2.9535356675940023e-06, 'epoch': 0.46} 46%|████▌ | 5648/12313 [4:13:42<4:46:13, 2.58s/it] 46%|████▌ | 5649/12313 [4:13:44<4:45:26, 2.57s/it] {'loss': 0.6524, 'grad_norm': 4.171651081928115, 'learning_rate': 2.952888941815366e-06, 'epoch': 0.46} 46%|████▌ | 5649/12313 [4:13:44<4:45:26, 2.57s/it] 46%|████▌ | 5650/12313 [4:13:47<4:50:24, 2.62s/it] {'loss': 0.5472, 'grad_norm': 5.133593022664716, 'learning_rate': 2.952242184699211e-06, 'epoch': 0.46} 46%|████▌ | 5650/12313 [4:13:47<4:50:24, 2.62s/it] 46%|████▌ | 5651/12313 [4:13:50<4:52:02, 2.63s/it] {'loss': 0.5447, 'grad_norm': 5.9594049521898205, 'learning_rate': 2.9515953962902914e-06, 'epoch': 0.46} 46%|████▌ | 5651/12313 [4:13:50<4:52:02, 2.63s/it] 46%|████▌ | 5652/12313 [4:13:53<4:58:31, 2.69s/it] {'loss': 0.5729, 'grad_norm': 6.3931523146679785, 'learning_rate': 2.950948576633359e-06, 'epoch': 0.46} 46%|████▌ | 5652/12313 [4:13:53<4:58:31, 2.69s/it] 46%|████▌ | 5653/12313 [4:13:55<4:59:57, 2.70s/it] {'loss': 0.4821, 'grad_norm': 5.8713619633879, 'learning_rate': 2.9503017257731727e-06, 'epoch': 0.46} 46%|████▌ | 5653/12313 [4:13:55<4:59:57, 2.70s/it] 46%|████▌ | 5654/12313 [4:13:58<5:09:42, 2.79s/it] {'loss': 0.4733, 'grad_norm': 3.2931039167346743, 'learning_rate': 2.9496548437544905e-06, 'epoch': 0.46} 46%|████▌ | 5654/12313 [4:13:58<5:09:42, 2.79s/it] 46%|████▌ | 5655/12313 [4:14:01<5:04:32, 2.74s/it] {'loss': 0.3829, 'grad_norm': 
5.1941445369327335, 'learning_rate': 2.9490079306220714e-06, 'epoch': 0.46} 46%|████▌ | 5655/12313 [4:14:01<5:04:32, 2.74s/it] 46%|████▌ | 5656/12313 [4:14:04<5:09:32, 2.79s/it] {'loss': 0.5581, 'grad_norm': 4.360689715599596, 'learning_rate': 2.9483609864206808e-06, 'epoch': 0.46} 46%|████▌ | 5656/12313 [4:14:04<5:09:32, 2.79s/it] 46%|████▌ | 5657/12313 [4:14:07<5:13:32, 2.83s/it] {'loss': 0.4832, 'grad_norm': 7.1909194878489195, 'learning_rate': 2.9477140111950834e-06, 'epoch': 0.46} 46%|████▌ | 5657/12313 [4:14:07<5:13:32, 2.83s/it] 46%|████▌ | 5658/12313 [4:14:09<5:05:18, 2.75s/it] {'loss': 0.458, 'grad_norm': 5.716565178813941, 'learning_rate': 2.947067004990045e-06, 'epoch': 0.46} 46%|████▌ | 5658/12313 [4:14:09<5:05:18, 2.75s/it] 46%|████▌ | 5659/12313 [4:14:12<5:02:30, 2.73s/it] {'loss': 0.5096, 'grad_norm': 5.12640741752529, 'learning_rate': 2.9464199678503364e-06, 'epoch': 0.46} 46%|████▌ | 5659/12313 [4:14:12<5:02:30, 2.73s/it] 46%|████▌ | 5660/12313 [4:14:15<4:59:33, 2.70s/it] {'loss': 0.46, 'grad_norm': 4.73475880030873, 'learning_rate': 2.9457728998207286e-06, 'epoch': 0.46} 46%|████▌ | 5660/12313 [4:14:15<4:59:33, 2.70s/it] 46%|████▌ | 5661/12313 [4:14:17<5:02:50, 2.73s/it] {'loss': 0.4947, 'grad_norm': 5.497100257560182, 'learning_rate': 2.9451258009459947e-06, 'epoch': 0.46} 46%|████▌ | 5661/12313 [4:14:17<5:02:50, 2.73s/it] 46%|████▌ | 5662/12313 [4:14:20<4:59:37, 2.70s/it] {'loss': 0.4633, 'grad_norm': 3.188631639464191, 'learning_rate': 2.9444786712709122e-06, 'epoch': 0.46} 46%|████▌ | 5662/12313 [4:14:20<4:59:37, 2.70s/it] 46%|████▌ | 5663/12313 [4:14:23<4:54:11, 2.65s/it] {'loss': 0.5155, 'grad_norm': 6.625940309443984, 'learning_rate': 2.943831510840257e-06, 'epoch': 0.46} 46%|████▌ | 5663/12313 [4:14:23<4:54:11, 2.65s/it] 46%|████▌ | 5664/12313 [4:14:25<4:56:25, 2.67s/it] {'loss': 0.4657, 'grad_norm': 6.021653211612759, 'learning_rate': 2.9431843196988107e-06, 'epoch': 0.46} 46%|████▌ | 5664/12313 [4:14:25<4:56:25, 2.67s/it] 46%|████▌ | 
5665/12313 [4:14:29<5:18:38, 2.88s/it] {'loss': 0.5745, 'grad_norm': 4.027902151297978, 'learning_rate': 2.942537097891355e-06, 'epoch': 0.46} 46%|████▌ | 5665/12313 [4:14:29<5:18:38, 2.88s/it] 46%|████▌ | 5666/12313 [4:14:31<5:09:58, 2.80s/it] {'loss': 0.7196, 'grad_norm': 4.556130080376428, 'learning_rate': 2.9418898454626744e-06, 'epoch': 0.46} 46%|████▌ | 5666/12313 [4:14:31<5:09:58, 2.80s/it] 46%|████▌ | 5667/12313 [4:14:34<5:04:21, 2.75s/it] {'loss': 0.5629, 'grad_norm': 7.887493386006087, 'learning_rate': 2.9412425624575553e-06, 'epoch': 0.46} 46%|████▌ | 5667/12313 [4:14:34<5:04:21, 2.75s/it] 46%|████▌ | 5668/12313 [4:14:36<4:53:09, 2.65s/it] {'loss': 0.4118, 'grad_norm': 11.085374599142877, 'learning_rate': 2.9405952489207858e-06, 'epoch': 0.46} 46%|████▌ | 5668/12313 [4:14:36<4:53:09, 2.65s/it] 46%|████▌ | 5669/12313 [4:14:39<4:53:25, 2.65s/it] {'loss': 0.4832, 'grad_norm': 5.492625782917754, 'learning_rate': 2.9399479048971567e-06, 'epoch': 0.46} 46%|████▌ | 5669/12313 [4:14:39<4:53:25, 2.65s/it] 46%|████▌ | 5670/12313 [4:14:42<5:01:17, 2.72s/it] {'loss': 0.5736, 'grad_norm': 3.5056589438634216, 'learning_rate': 2.939300530431462e-06, 'epoch': 0.46} 46%|████▌ | 5670/12313 [4:14:42<5:01:17, 2.72s/it] 46%|████▌ | 5671/12313 [4:14:45<5:02:07, 2.73s/it] {'loss': 0.7238, 'grad_norm': 4.465558544811287, 'learning_rate': 2.9386531255684942e-06, 'epoch': 0.46} 46%|████▌ | 5671/12313 [4:14:45<5:02:07, 2.73s/it] 46%|████▌ | 5672/12313 [4:14:47<5:05:50, 2.76s/it] {'loss': 0.5249, 'grad_norm': 48.57316377462239, 'learning_rate': 2.938005690353052e-06, 'epoch': 0.46} 46%|████▌ | 5672/12313 [4:14:47<5:05:50, 2.76s/it] 46%|████▌ | 5673/12313 [4:14:50<5:06:27, 2.77s/it] {'loss': 0.5339, 'grad_norm': 5.242926405538982, 'learning_rate': 2.937358224829935e-06, 'epoch': 0.46} 46%|████▌ | 5673/12313 [4:14:50<5:06:27, 2.77s/it] 46%|████▌ | 5674/12313 [4:14:53<5:04:58, 2.76s/it] {'loss': 0.6889, 'grad_norm': 5.8937374649076695, 'learning_rate': 2.936710729043943e-06, 'epoch': 
0.46} 46%|████▌ | 5674/12313 [4:14:53<5:04:58, 2.76s/it] 46%|████▌ | 5675/12313 [4:14:55<4:56:59, 2.68s/it] {'loss': 0.5676, 'grad_norm': 5.462741459326538, 'learning_rate': 2.936063203039879e-06, 'epoch': 0.46} 46%|████▌ | 5675/12313 [4:14:55<4:56:59, 2.68s/it] 46%|████▌ | 5676/12313 [4:14:58<4:54:09, 2.66s/it] {'loss': 0.526, 'grad_norm': 6.268297125768787, 'learning_rate': 2.93541564686255e-06, 'epoch': 0.46} 46%|████▌ | 5676/12313 [4:14:58<4:54:09, 2.66s/it] 46%|████▌ | 5677/12313 [4:15:00<4:46:48, 2.59s/it] {'loss': 0.4487, 'grad_norm': 3.614454345003604, 'learning_rate': 2.9347680605567624e-06, 'epoch': 0.46} 46%|████▌ | 5677/12313 [4:15:00<4:46:48, 2.59s/it] 46%|████▌ | 5678/12313 [4:15:03<4:45:39, 2.58s/it] {'loss': 0.4531, 'grad_norm': 8.297993020626132, 'learning_rate': 2.9341204441673267e-06, 'epoch': 0.46} 46%|████▌ | 5678/12313 [4:15:03<4:45:39, 2.58s/it] 46%|████▌ | 5679/12313 [4:15:06<4:50:35, 2.63s/it] {'loss': 0.6027, 'grad_norm': 4.364678727568973, 'learning_rate': 2.9334727977390526e-06, 'epoch': 0.46} 46%|████▌ | 5679/12313 [4:15:06<4:50:35, 2.63s/it] 46%|████▌ | 5680/12313 [4:15:09<4:57:31, 2.69s/it] {'loss': 0.5818, 'grad_norm': 8.596844992148577, 'learning_rate': 2.9328251213167557e-06, 'epoch': 0.46} 46%|████▌ | 5680/12313 [4:15:09<4:57:31, 2.69s/it] 46%|████▌ | 5681/12313 [4:15:12<5:06:07, 2.77s/it] {'loss': 0.4268, 'grad_norm': 4.6174846839314005, 'learning_rate': 2.9321774149452507e-06, 'epoch': 0.46} 46%|████▌ | 5681/12313 [4:15:12<5:06:07, 2.77s/it] 46%|████▌ | 5682/12313 [4:15:14<5:03:39, 2.75s/it] {'loss': 0.5332, 'grad_norm': 10.308065553433666, 'learning_rate': 2.9315296786693564e-06, 'epoch': 0.46} 46%|████▌ | 5682/12313 [4:15:14<5:03:39, 2.75s/it] 46%|████▌ | 5683/12313 [4:15:17<5:00:56, 2.72s/it] {'loss': 0.6391, 'grad_norm': 7.029864284315326, 'learning_rate': 2.9308819125338923e-06, 'epoch': 0.46} 46%|████▌ | 5683/12313 [4:15:17<5:00:56, 2.72s/it] 46%|████▌ | 5684/12313 [4:15:19<4:50:38, 2.63s/it] {'loss': 0.7152, 'grad_norm': 
6.141294742654424, 'learning_rate': 2.9302341165836794e-06, 'epoch': 0.46} 46%|████▌ | 5684/12313 [4:15:19<4:50:38, 2.63s/it] 46%|████▌ | 5685/12313 [4:15:22<4:58:29, 2.70s/it] {'loss': 0.5101, 'grad_norm': 5.578226095205441, 'learning_rate': 2.9295862908635436e-06, 'epoch': 0.46} 46%|████▌ | 5685/12313 [4:15:22<4:58:29, 2.70s/it] 46%|████▌ | 5686/12313 [4:15:25<4:55:29, 2.68s/it] {'loss': 0.5236, 'grad_norm': 5.083560208470007, 'learning_rate': 2.92893843541831e-06, 'epoch': 0.46} 46%|████▌ | 5686/12313 [4:15:25<4:55:29, 2.68s/it] 46%|████▌ | 5687/12313 [4:15:28<5:00:39, 2.72s/it] {'loss': 0.6088, 'grad_norm': 3.7370396960027654, 'learning_rate': 2.928290550292806e-06, 'epoch': 0.46} 46%|████▌ | 5687/12313 [4:15:28<5:00:39, 2.72s/it] 46%|████▌ | 5688/12313 [4:15:30<4:57:17, 2.69s/it] {'loss': 0.4895, 'grad_norm': 8.921482639240043, 'learning_rate': 2.9276426355318625e-06, 'epoch': 0.46} 46%|████▌ | 5688/12313 [4:15:30<4:57:17, 2.69s/it] 46%|████▌ | 5689/12313 [4:15:33<4:59:22, 2.71s/it] {'loss': 0.5798, 'grad_norm': 3.2133566714283055, 'learning_rate': 2.9269946911803134e-06, 'epoch': 0.46} 46%|████▌ | 5689/12313 [4:15:33<4:59:22, 2.71s/it] 46%|████▌ | 5690/12313 [4:15:36<5:16:51, 2.87s/it] {'loss': 0.4417, 'grad_norm': 6.204381262030037, 'learning_rate': 2.92634671728299e-06, 'epoch': 0.46} 46%|████▌ | 5690/12313 [4:15:36<5:16:51, 2.87s/it] 46%|████▌ | 5691/12313 [4:15:39<5:06:58, 2.78s/it] {'loss': 0.5512, 'grad_norm': 5.691308583371654, 'learning_rate': 2.9256987138847302e-06, 'epoch': 0.46} 46%|████▌ | 5691/12313 [4:15:39<5:06:58, 2.78s/it] 46%|████▌ | 5692/12313 [4:15:41<4:53:35, 2.66s/it] {'loss': 0.5221, 'grad_norm': 4.319078154150966, 'learning_rate': 2.925050681030373e-06, 'epoch': 0.46} 46%|████▌ | 5692/12313 [4:15:41<4:53:35, 2.66s/it] 46%|████▌ | 5693/12313 [4:15:44<4:53:45, 2.66s/it] {'loss': 0.647, 'grad_norm': 5.04058444742641, 'learning_rate': 2.9244026187647584e-06, 'epoch': 0.46} 46%|████▌ | 5693/12313 [4:15:44<4:53:45, 2.66s/it] 46%|████▌ | 
5694/12313 [4:15:47<4:54:30, 2.67s/it] {'loss': 0.4942, 'grad_norm': 5.620845349439251, 'learning_rate': 2.923754527132728e-06, 'epoch': 0.46} 46%|████▌ | 5694/12313 [4:15:47<4:54:30, 2.67s/it] 46%|████▋ | 5695/12313 [4:15:49<4:45:37, 2.59s/it] {'loss': 0.3824, 'grad_norm': 5.022513745576825, 'learning_rate': 2.9231064061791277e-06, 'epoch': 0.46} 46%|████▋ | 5695/12313 [4:15:49<4:45:37, 2.59s/it] 46%|████▋ | 5696/12313 [4:15:52<4:46:19, 2.60s/it] {'loss': 0.4568, 'grad_norm': 4.5284930571538835, 'learning_rate': 2.922458255948803e-06, 'epoch': 0.46} 46%|████▋ | 5696/12313 [4:15:52<4:46:19, 2.60s/it] 46%|████▋ | 5697/12313 [4:15:54<4:41:35, 2.55s/it] {'loss': 0.6157, 'grad_norm': 4.600857423481818, 'learning_rate': 2.9218100764866025e-06, 'epoch': 0.46} 46%|████▋ | 5697/12313 [4:15:54<4:41:35, 2.55s/it] 46%|████▋ | 5698/12313 [4:15:57<4:54:01, 2.67s/it] {'loss': 0.4392, 'grad_norm': 3.265626698109701, 'learning_rate': 2.9211618678373775e-06, 'epoch': 0.46} 46%|████▋ | 5698/12313 [4:15:57<4:54:01, 2.67s/it] 46%|████▋ | 5699/12313 [4:16:00<4:52:45, 2.66s/it] {'loss': 0.463, 'grad_norm': 4.624889936593594, 'learning_rate': 2.9205136300459803e-06, 'epoch': 0.46} 46%|████▋ | 5699/12313 [4:16:00<4:52:45, 2.66s/it] 46%|████▋ | 5700/12313 [4:16:02<4:53:24, 2.66s/it] {'loss': 0.6943, 'grad_norm': 5.8554370873391655, 'learning_rate': 2.919865363157265e-06, 'epoch': 0.46} 46%|████▋ | 5700/12313 [4:16:02<4:53:24, 2.66s/it] 46%|████▋ | 5701/12313 [4:16:05<4:55:12, 2.68s/it] {'loss': 0.5849, 'grad_norm': 10.564403475889055, 'learning_rate': 2.9192170672160892e-06, 'epoch': 0.46} 46%|████▋ | 5701/12313 [4:16:05<4:55:12, 2.68s/it] 46%|████▋ | 5702/12313 [4:16:08<4:51:53, 2.65s/it] {'loss': 0.5091, 'grad_norm': 4.570004688540801, 'learning_rate': 2.9185687422673103e-06, 'epoch': 0.46} 46%|████▋ | 5702/12313 [4:16:08<4:51:53, 2.65s/it] 46%|████▋ | 5703/12313 [4:16:10<4:57:20, 2.70s/it] {'loss': 0.4578, 'grad_norm': 7.370321832435026, 'learning_rate': 2.917920388355791e-06, 'epoch': 
0.46} 46%|████▋ | 5703/12313 [4:16:10<4:57:20, 2.70s/it] 46%|████▋ | 5704/12313 [4:16:13<4:57:42, 2.70s/it] {'loss': 0.3637, 'grad_norm': 4.488088245228176, 'learning_rate': 2.9172720055263916e-06, 'epoch': 0.46} 46%|████▋ | 5704/12313 [4:16:13<4:57:42, 2.70s/it] 46%|████▋ | 5705/12313 [4:16:16<4:58:03, 2.71s/it] {'loss': 0.4415, 'grad_norm': 3.3278944590303086, 'learning_rate': 2.9166235938239785e-06, 'epoch': 0.46} 46%|████▋ | 5705/12313 [4:16:16<4:58:03, 2.71s/it] 46%|████▋ | 5706/12313 [4:16:18<4:55:44, 2.69s/it] {'loss': 0.4605, 'grad_norm': 25.965744402127452, 'learning_rate': 2.9159751532934165e-06, 'epoch': 0.46} 46%|████▋ | 5706/12313 [4:16:18<4:55:44, 2.69s/it] 46%|████▋ | 5707/12313 [4:16:21<4:48:45, 2.62s/it] {'loss': 0.5707, 'grad_norm': 5.292291969576549, 'learning_rate': 2.9153266839795756e-06, 'epoch': 0.46} 46%|████▋ | 5707/12313 [4:16:21<4:48:45, 2.62s/it] 46%|████▋ | 5708/12313 [4:16:23<4:42:26, 2.57s/it] {'loss': 0.4891, 'grad_norm': 7.540226421886441, 'learning_rate': 2.9146781859273276e-06, 'epoch': 0.46} 46%|████▋ | 5708/12313 [4:16:23<4:42:26, 2.57s/it] 46%|████▋ | 5709/12313 [4:16:26<4:46:22, 2.60s/it] {'loss': 0.5333, 'grad_norm': 3.896646180404308, 'learning_rate': 2.9140296591815425e-06, 'epoch': 0.46} 46%|████▋ | 5709/12313 [4:16:26<4:46:22, 2.60s/it] 46%|████▋ | 5710/12313 [4:16:29<4:49:37, 2.63s/it] {'loss': 0.391, 'grad_norm': 5.700465799524542, 'learning_rate': 2.913381103787097e-06, 'epoch': 0.46} 46%|████▋ | 5710/12313 [4:16:29<4:49:37, 2.63s/it] 46%|████▋ | 5711/12313 [4:16:31<4:52:40, 2.66s/it] {'loss': 0.404, 'grad_norm': 4.798705492265267, 'learning_rate': 2.9127325197888663e-06, 'epoch': 0.46} 46%|████▋ | 5711/12313 [4:16:31<4:52:40, 2.66s/it] 46%|████▋ | 5712/12313 [4:16:34<4:51:52, 2.65s/it] {'loss': 0.7544, 'grad_norm': 3.5471558345306877, 'learning_rate': 2.91208390723173e-06, 'epoch': 0.46} 46%|████▋ | 5712/12313 [4:16:34<4:51:52, 2.65s/it] 46%|████▋ | 5713/12313 [4:16:37<4:56:14, 2.69s/it] {'loss': 0.4801, 'grad_norm': 
6.891850689361664, 'learning_rate': 2.911435266160568e-06, 'epoch': 0.46} 46%|████▋ | 5713/12313 [4:16:37<4:56:14, 2.69s/it] 46%|████▋ | 5714/12313 [4:16:40<5:05:36, 2.78s/it] {'loss': 0.4848, 'grad_norm': 4.753158541053688, 'learning_rate': 2.910786596620263e-06, 'epoch': 0.46} 46%|████▋ | 5714/12313 [4:16:40<5:05:36, 2.78s/it] 46%|████▋ | 5715/12313 [4:16:43<5:05:20, 2.78s/it] {'loss': 0.4833, 'grad_norm': 5.463566847148066, 'learning_rate': 2.9101378986556996e-06, 'epoch': 0.46} 46%|████▋ | 5715/12313 [4:16:43<5:05:20, 2.78s/it] 46%|████▋ | 5716/12313 [4:16:45<4:52:14, 2.66s/it] {'loss': 0.7555, 'grad_norm': 3.542984250968721, 'learning_rate': 2.909489172311765e-06, 'epoch': 0.46} 46%|████▋ | 5716/12313 [4:16:45<4:52:14, 2.66s/it] 46%|████▋ | 5717/12313 [4:16:48<4:52:20, 2.66s/it] {'loss': 0.5655, 'grad_norm': 4.081018198125554, 'learning_rate': 2.9088404176333456e-06, 'epoch': 0.46} 46%|████▋ | 5717/12313 [4:16:48<4:52:20, 2.66s/it] 46%|████▋ | 5718/12313 [4:16:50<4:44:46, 2.59s/it] {'loss': 0.6157, 'grad_norm': 5.0004125127548, 'learning_rate': 2.9081916346653333e-06, 'epoch': 0.46} 46%|████▋ | 5718/12313 [4:16:50<4:44:46, 2.59s/it] 46%|████▋ | 5719/12313 [4:16:53<4:47:36, 2.62s/it] {'loss': 0.404, 'grad_norm': 6.309500154267012, 'learning_rate': 2.9075428234526215e-06, 'epoch': 0.46} 46%|████▋ | 5719/12313 [4:16:53<4:47:36, 2.62s/it] 46%|████▋ | 5720/12313 [4:16:56<4:53:41, 2.67s/it] {'loss': 0.4213, 'grad_norm': 5.199891325095606, 'learning_rate': 2.9068939840401018e-06, 'epoch': 0.46} 46%|████▋ | 5720/12313 [4:16:56<4:53:41, 2.67s/it] 46%|████▋ | 5721/12313 [4:16:58<4:52:34, 2.66s/it] {'loss': 0.5366, 'grad_norm': 3.3927116646573685, 'learning_rate': 2.906245116472672e-06, 'epoch': 0.46} 46%|████▋ | 5721/12313 [4:16:58<4:52:34, 2.66s/it] 46%|████▋ | 5722/12313 [4:17:01<4:52:43, 2.66s/it] {'loss': 0.5221, 'grad_norm': 5.355679239024366, 'learning_rate': 2.905596220795231e-06, 'epoch': 0.46} 46%|████▋ | 5722/12313 [4:17:01<4:52:43, 2.66s/it] 46%|████▋ | 
5723/12313 [4:17:04<4:54:50, 2.68s/it] {'loss': 0.49, 'grad_norm': 4.733947923123343, 'learning_rate': 2.9049472970526777e-06, 'epoch': 0.46} 46%|████▋ | 5723/12313 [4:17:04<4:54:50, 2.68s/it] 46%|████▋ | 5724/12313 [4:17:07<5:03:44, 2.77s/it] {'loss': 0.4846, 'grad_norm': 6.624090170841801, 'learning_rate': 2.904298345289914e-06, 'epoch': 0.46} 46%|████▋ | 5724/12313 [4:17:07<5:03:44, 2.77s/it] 46%|████▋ | 5725/12313 [4:17:09<4:58:55, 2.72s/it] {'loss': 0.4319, 'grad_norm': 5.72885175850521, 'learning_rate': 2.9036493655518456e-06, 'epoch': 0.46} 46%|████▋ | 5725/12313 [4:17:09<4:58:55, 2.72s/it] 47%|████▋ | 5726/12313 [4:17:12<4:56:01, 2.70s/it] {'loss': 0.5975, 'grad_norm': 4.332865010791888, 'learning_rate': 2.9030003578833765e-06, 'epoch': 0.47} 47%|████▋ | 5726/12313 [4:17:12<4:56:01, 2.70s/it] 47%|████▋ | 5727/12313 [4:17:15<4:57:24, 2.71s/it] {'loss': 0.5159, 'grad_norm': 6.313800965150897, 'learning_rate': 2.902351322329416e-06, 'epoch': 0.47} 47%|████▋ | 5727/12313 [4:17:15<4:57:24, 2.71s/it] 47%|████▋ | 5728/12313 [4:17:17<4:58:13, 2.72s/it] {'loss': 0.393, 'grad_norm': 5.227382270145744, 'learning_rate': 2.9017022589348733e-06, 'epoch': 0.47} 47%|████▋ | 5728/12313 [4:17:17<4:58:13, 2.72s/it] 47%|████▋ | 5729/12313 [4:17:20<4:55:55, 2.70s/it] {'loss': 0.52, 'grad_norm': 2.840558709344795, 'learning_rate': 2.9010531677446602e-06, 'epoch': 0.47} 47%|████▋ | 5729/12313 [4:17:20<4:55:55, 2.70s/it] 47%|████▋ | 5730/12313 [4:17:22<4:46:41, 2.61s/it] {'loss': 0.506, 'grad_norm': 5.366773140432362, 'learning_rate': 2.90040404880369e-06, 'epoch': 0.47} 47%|████▋ | 5730/12313 [4:17:22<4:46:41, 2.61s/it] 47%|████▋ | 5731/12313 [4:17:25<4:49:28, 2.64s/it] {'loss': 0.6582, 'grad_norm': 3.2313831572145193, 'learning_rate': 2.8997549021568792e-06, 'epoch': 0.47} 47%|████▋ | 5731/12313 [4:17:25<4:49:28, 2.64s/it] 47%|████▋ | 5732/12313 [4:17:28<4:43:45, 2.59s/it] {'loss': 0.4414, 'grad_norm': 5.926467612756465, 'learning_rate': 2.899105727849145e-06, 'epoch': 0.47} 
47%|████▋ | 5732/12313 [4:17:28<4:43:45, 2.59s/it] 47%|████▋ | 5733/12313 [4:17:30<4:44:04, 2.59s/it] {'loss': 0.54, 'grad_norm': 4.447511140460557, 'learning_rate': 2.898456525925406e-06, 'epoch': 0.47} 47%|████▋ | 5733/12313 [4:17:30<4:44:04, 2.59s/it] 47%|████▋ | 5734/12313 [4:17:33<4:42:11, 2.57s/it] {'loss': 0.4579, 'grad_norm': 5.303134033820951, 'learning_rate': 2.8978072964305848e-06, 'epoch': 0.47} 47%|████▋ | 5734/12313 [4:17:33<4:42:11, 2.57s/it] 47%|████▋ | 5735/12313 [4:17:36<4:49:41, 2.64s/it] {'loss': 0.4434, 'grad_norm': 5.195062102266287, 'learning_rate': 2.8971580394096043e-06, 'epoch': 0.47} 47%|████▋ | 5735/12313 [4:17:36<4:49:41, 2.64s/it] 47%|████▋ | 5736/12313 [4:17:38<4:53:57, 2.68s/it] {'loss': 0.5036, 'grad_norm': 5.8469679642289165, 'learning_rate': 2.896508754907389e-06, 'epoch': 0.47} 47%|████▋ | 5736/12313 [4:17:38<4:53:57, 2.68s/it] 47%|████▋ | 5737/12313 [4:17:41<5:05:37, 2.79s/it] {'loss': 0.483, 'grad_norm': 3.484649611013429, 'learning_rate': 2.8958594429688656e-06, 'epoch': 0.47} 47%|████▋ | 5737/12313 [4:17:41<5:05:37, 2.79s/it] 47%|████▋ | 5738/12313 [4:17:44<5:00:20, 2.74s/it] {'loss': 0.5142, 'grad_norm': 7.276763839964852, 'learning_rate': 2.895210103638966e-06, 'epoch': 0.47} 47%|████▋ | 5738/12313 [4:17:44<5:00:20, 2.74s/it] 47%|████▋ | 5739/12313 [4:17:47<4:58:35, 2.73s/it] {'loss': 0.6067, 'grad_norm': 4.316468383561267, 'learning_rate': 2.894560736962617e-06, 'epoch': 0.47} 47%|████▋ | 5739/12313 [4:17:47<4:58:35, 2.73s/it] 47%|████▋ | 5740/12313 [4:17:49<4:59:47, 2.74s/it] {'loss': 0.7335, 'grad_norm': 3.8962363928693184, 'learning_rate': 2.893911342984754e-06, 'epoch': 0.47} 47%|████▋ | 5740/12313 [4:17:49<4:59:47, 2.74s/it] 47%|████▋ | 5741/12313 [4:17:52<5:01:54, 2.76s/it] {'loss': 0.4862, 'grad_norm': 4.124297053875989, 'learning_rate': 2.89326192175031e-06, 'epoch': 0.47} 47%|████▋ | 5741/12313 [4:17:52<5:01:54, 2.76s/it] 47%|████▋ | 5742/12313 [4:17:55<4:57:35, 2.72s/it] {'loss': 0.3562, 'grad_norm': 
5.37302630861064, 'learning_rate': 2.8926124733042228e-06, 'epoch': 0.47} 47%|████▋ | 5742/12313 [4:17:55<4:57:35, 2.72s/it] 47%|████▋ | 5743/12313 [4:17:58<5:07:43, 2.81s/it] {'loss': 0.7901, 'grad_norm': 5.046203124192214, 'learning_rate': 2.89196299769143e-06, 'epoch': 0.47} 47%|████▋ | 5743/12313 [4:17:58<5:07:43, 2.81s/it] 47%|████▋ | 5744/12313 [4:18:00<4:56:35, 2.71s/it] {'loss': 0.5199, 'grad_norm': 8.053971161102101, 'learning_rate': 2.8913134949568726e-06, 'epoch': 0.47} 47%|████▋ | 5744/12313 [4:18:00<4:56:35, 2.71s/it] 47%|████▋ | 5745/12313 [4:18:03<5:05:37, 2.79s/it] {'loss': 0.5486, 'grad_norm': 3.327749003210284, 'learning_rate': 2.890663965145492e-06, 'epoch': 0.47} 47%|████▋ | 5745/12313 [4:18:03<5:05:37, 2.79s/it] 47%|████▋ | 5746/12313 [4:18:06<5:03:09, 2.77s/it] {'loss': 0.6335, 'grad_norm': 5.0353868986052674, 'learning_rate': 2.890014408302233e-06, 'epoch': 0.47} 47%|████▋ | 5746/12313 [4:18:06<5:03:09, 2.77s/it] 47%|████▋ | 5747/12313 [4:18:09<5:01:41, 2.76s/it] {'loss': 0.4263, 'grad_norm': 6.139502965392141, 'learning_rate': 2.8893648244720406e-06, 'epoch': 0.47} 47%|████▋ | 5747/12313 [4:18:09<5:01:41, 2.76s/it] 47%|████▋ | 5748/12313 [4:18:12<5:02:54, 2.77s/it] {'loss': 0.5657, 'grad_norm': 5.019570855865613, 'learning_rate': 2.8887152136998644e-06, 'epoch': 0.47} 47%|████▋ | 5748/12313 [4:18:12<5:02:54, 2.77s/it] 47%|████▋ | 5749/12313 [4:18:14<5:01:32, 2.76s/it] {'loss': 0.3836, 'grad_norm': 16.504398142409077, 'learning_rate': 2.8880655760306507e-06, 'epoch': 0.47} 47%|████▋ | 5749/12313 [4:18:14<5:01:32, 2.76s/it] 47%|████▋ | 5750/12313 [4:18:17<5:04:34, 2.78s/it] {'loss': 0.6306, 'grad_norm': 5.525848295027518, 'learning_rate': 2.887415911509354e-06, 'epoch': 0.47} 47%|████▋ | 5750/12313 [4:18:17<5:04:34, 2.78s/it] 47%|████▋ | 5751/12313 [4:18:20<5:00:41, 2.75s/it] {'loss': 0.6479, 'grad_norm': 4.288458640459229, 'learning_rate': 2.8867662201809266e-06, 'epoch': 0.47} 47%|████▋ | 5751/12313 [4:18:20<5:00:41, 2.75s/it] 47%|████▋ | 
5752/12313 [4:18:23<4:59:13, 2.74s/it] {'loss': 0.5476, 'grad_norm': 8.844918615051897, 'learning_rate': 2.8861165020903235e-06, 'epoch': 0.47} 47%|████▋ | 5752/12313 [4:18:23<4:59:13, 2.74s/it] 47%|████▋ | 5753/12313 [4:18:25<4:59:20, 2.74s/it] {'loss': 0.4022, 'grad_norm': 8.708089067131917, 'learning_rate': 2.8854667572825013e-06, 'epoch': 0.47} 47%|████▋ | 5753/12313 [4:18:25<4:59:20, 2.74s/it] 47%|████▋ | 5754/12313 [4:18:28<4:51:06, 2.66s/it] {'loss': 0.4588, 'grad_norm': 3.9421995828081897, 'learning_rate': 2.8848169858024206e-06, 'epoch': 0.47} 47%|████▋ | 5754/12313 [4:18:28<4:51:06, 2.66s/it] 47%|████▋ | 5755/12313 [4:18:31<4:59:52, 2.74s/it] {'loss': 0.4142, 'grad_norm': 3.8977256522289374, 'learning_rate': 2.8841671876950404e-06, 'epoch': 0.47} 47%|████▋ | 5755/12313 [4:18:31<4:59:52, 2.74s/it] 47%|████▋ | 5756/12313 [4:18:33<4:58:42, 2.73s/it] {'loss': 0.4468, 'grad_norm': 8.084593014494807, 'learning_rate': 2.8835173630053244e-06, 'epoch': 0.47} 47%|████▋ | 5756/12313 [4:18:33<4:58:42, 2.73s/it] 47%|████▋ | 5757/12313 [4:18:36<4:57:29, 2.72s/it] {'loss': 0.5784, 'grad_norm': 3.533484576984449, 'learning_rate': 2.882867511778237e-06, 'epoch': 0.47} 47%|████▋ | 5757/12313 [4:18:36<4:57:29, 2.72s/it] 47%|████▋ | 5758/12313 [4:18:39<4:57:04, 2.72s/it] {'loss': 0.6078, 'grad_norm': 4.263459871167963, 'learning_rate': 2.8822176340587434e-06, 'epoch': 0.47} 47%|████▋ | 5758/12313 [4:18:39<4:57:04, 2.72s/it] 47%|████▋ | 5759/12313 [4:18:41<4:51:06, 2.67s/it] {'loss': 0.4972, 'grad_norm': 3.9845148250837865, 'learning_rate': 2.881567729891812e-06, 'epoch': 0.47} 47%|████▋ | 5759/12313 [4:18:41<4:51:06, 2.67s/it] 47%|████▋ | 5760/12313 [4:18:44<4:50:54, 2.66s/it] {'loss': 0.5564, 'grad_norm': 7.062828717835998, 'learning_rate': 2.8809177993224143e-06, 'epoch': 0.47} 47%|████▋ | 5760/12313 [4:18:44<4:50:54, 2.66s/it] 47%|████▋ | 5761/12313 [4:18:47<4:57:07, 2.72s/it] {'loss': 0.573, 'grad_norm': 4.435936639131757, 'learning_rate': 2.88026784239552e-06, 'epoch': 
0.47} 47%|████▋ | 5761/12313 [4:18:47<4:57:07, 2.72s/it] 47%|████▋ | 5762/12313 [4:18:49<4:48:31, 2.64s/it] {'loss': 0.4828, 'grad_norm': 4.512414174366236, 'learning_rate': 2.8796178591561035e-06, 'epoch': 0.47} 47%|████▋ | 5762/12313 [4:18:49<4:48:31, 2.64s/it] 47%|████▋ | 5763/12313 [4:18:52<4:48:06, 2.64s/it] {'loss': 0.5475, 'grad_norm': 5.094918568491417, 'learning_rate': 2.8789678496491407e-06, 'epoch': 0.47} 47%|████▋ | 5763/12313 [4:18:52<4:48:06, 2.64s/it] 47%|████▋ | 5764/12313 [4:18:55<4:46:56, 2.63s/it] {'loss': 0.5026, 'grad_norm': 2.6728500983692514, 'learning_rate': 2.878317813919608e-06, 'epoch': 0.47} 47%|████▋ | 5764/12313 [4:18:55<4:46:56, 2.63s/it] 47%|████▋ | 5765/12313 [4:18:57<4:45:48, 2.62s/it] {'loss': 0.7266, 'grad_norm': 4.7597544395019495, 'learning_rate': 2.877667752012485e-06, 'epoch': 0.47} 47%|████▋ | 5765/12313 [4:18:57<4:45:48, 2.62s/it] 47%|████▋ | 5766/12313 [4:19:00<4:50:32, 2.66s/it] {'loss': 0.4941, 'grad_norm': 6.8641010734258, 'learning_rate': 2.877017663972752e-06, 'epoch': 0.47} 47%|████▋ | 5766/12313 [4:19:00<4:50:32, 2.66s/it] 47%|████▋ | 5767/12313 [4:19:02<4:44:25, 2.61s/it] {'loss': 0.5093, 'grad_norm': 5.927018173381962, 'learning_rate': 2.876367549845393e-06, 'epoch': 0.47} 47%|████▋ | 5767/12313 [4:19:02<4:44:25, 2.61s/it] 47%|████▋ | 5768/12313 [4:19:05<4:47:32, 2.64s/it] {'loss': 0.5454, 'grad_norm': 3.4289610085754383, 'learning_rate': 2.875717409675391e-06, 'epoch': 0.47} 47%|████▋ | 5768/12313 [4:19:05<4:47:32, 2.64s/it] 47%|████▋ | 5769/12313 [4:19:08<4:45:43, 2.62s/it] {'loss': 0.4647, 'grad_norm': 7.9437461607693, 'learning_rate': 2.875067243507732e-06, 'epoch': 0.47} 47%|████▋ | 5769/12313 [4:19:08<4:45:43, 2.62s/it] 47%|████▋ | 5770/12313 [4:19:10<4:46:04, 2.62s/it] {'loss': 0.4881, 'grad_norm': 3.7116341834744078, 'learning_rate': 2.8744170513874054e-06, 'epoch': 0.47} 47%|████▋ | 5770/12313 [4:19:10<4:46:04, 2.62s/it] 47%|████▋ | 5771/12313 [4:19:13<4:49:06, 2.65s/it] {'loss': 0.4672, 'grad_norm': 
4.2367362028156, 'learning_rate': 2.8737668333594005e-06, 'epoch': 0.47} 47%|████▋ | 5771/12313 [4:19:13<4:49:06, 2.65s/it] 47%|████▋ | 5772/12313 [4:19:16<4:49:10, 2.65s/it] {'loss': 0.4611, 'grad_norm': 4.273237686231666, 'learning_rate': 2.873116589468708e-06, 'epoch': 0.47} 47%|████▋ | 5772/12313 [4:19:16<4:49:10, 2.65s/it] 47%|████▋ | 5773/12313 [4:19:18<4:49:32, 2.66s/it] {'loss': 0.5954, 'grad_norm': 5.096615588091853, 'learning_rate': 2.872466319760323e-06, 'epoch': 0.47} 47%|████▋ | 5773/12313 [4:19:18<4:49:32, 2.66s/it] 47%|████▋ | 5774/12313 [4:19:21<4:51:50, 2.68s/it] {'loss': 0.5718, 'grad_norm': 4.825210645748332, 'learning_rate': 2.87181602427924e-06, 'epoch': 0.47} 47%|████▋ | 5774/12313 [4:19:21<4:51:50, 2.68s/it] 47%|████▋ | 5775/12313 [4:19:24<4:56:50, 2.72s/it] {'loss': 0.4037, 'grad_norm': 5.991461149755983, 'learning_rate': 2.8711657030704553e-06, 'epoch': 0.47} 47%|████▋ | 5775/12313 [4:19:24<4:56:50, 2.72s/it] 47%|████▋ | 5776/12313 [4:19:27<5:04:10, 2.79s/it] {'loss': 0.513, 'grad_norm': 3.4800097173343776, 'learning_rate': 2.870515356178969e-06, 'epoch': 0.47} 47%|████▋ | 5776/12313 [4:19:27<5:04:10, 2.79s/it] 47%|████▋ | 5777/12313 [4:19:30<5:06:15, 2.81s/it] {'loss': 0.6655, 'grad_norm': 6.256528064000865, 'learning_rate': 2.8698649836497805e-06, 'epoch': 0.47} 47%|████▋ | 5777/12313 [4:19:30<5:06:15, 2.81s/it] 47%|████▋ | 5778/12313 [4:19:33<5:12:39, 2.87s/it] {'loss': 0.4901, 'grad_norm': 5.04158800857517, 'learning_rate': 2.869214585527893e-06, 'epoch': 0.47} 47%|████▋ | 5778/12313 [4:19:33<5:12:39, 2.87s/it] 47%|████▋ | 5779/12313 [4:19:35<5:09:59, 2.85s/it] {'loss': 0.4887, 'grad_norm': 4.617317169571152, 'learning_rate': 2.8685641618583098e-06, 'epoch': 0.47} 47%|████▋ | 5779/12313 [4:19:35<5:09:59, 2.85s/it] 47%|████▋ | 5780/12313 [4:19:38<5:02:02, 2.77s/it] {'loss': 0.647, 'grad_norm': 3.9855464372411147, 'learning_rate': 2.8679137126860373e-06, 'epoch': 0.47} 47%|████▋ | 5780/12313 [4:19:38<5:02:02, 2.77s/it] 47%|████▋ | 
5781/12313 [4:19:41<4:57:37, 2.73s/it] {'loss': 0.5514, 'grad_norm': 4.5232125393874965, 'learning_rate': 2.867263238056084e-06, 'epoch': 0.47} 47%|████▋ | 5781/12313 [4:19:41<4:57:37, 2.73s/it] 47%|████▋ | 5782/12313 [4:19:43<4:57:32, 2.73s/it] {'loss': 0.6096, 'grad_norm': 6.069972056894211, 'learning_rate': 2.866612738013457e-06, 'epoch': 0.47} 47%|████▋ | 5782/12313 [4:19:43<4:57:32, 2.73s/it] 47%|████▋ | 5783/12313 [4:19:46<4:54:10, 2.70s/it] {'loss': 0.5519, 'grad_norm': 4.326041253791683, 'learning_rate': 2.8659622126031687e-06, 'epoch': 0.47} 47%|████▋ | 5783/12313 [4:19:46<4:54:10, 2.70s/it] 47%|████▋ | 5784/12313 [4:19:49<4:44:59, 2.62s/it] {'loss': 0.5394, 'grad_norm': 6.02872157760923, 'learning_rate': 2.8653116618702338e-06, 'epoch': 0.47} 47%|████▋ | 5784/12313 [4:19:49<4:44:59, 2.62s/it] 47%|████▋ | 5785/12313 [4:19:51<4:52:16, 2.69s/it] {'loss': 0.5991, 'grad_norm': 12.764955573129805, 'learning_rate': 2.8646610858596635e-06, 'epoch': 0.47} 47%|████▋ | 5785/12313 [4:19:51<4:52:16, 2.69s/it] 47%|████▋ | 5786/12313 [4:19:54<4:53:15, 2.70s/it] {'loss': 0.54, 'grad_norm': 5.345811501835503, 'learning_rate': 2.864010484616477e-06, 'epoch': 0.47} 47%|████▋ | 5786/12313 [4:19:54<4:53:15, 2.70s/it] 47%|████▋ | 5787/12313 [4:19:56<4:42:51, 2.60s/it] {'loss': 0.515, 'grad_norm': 4.1825362641291415, 'learning_rate': 2.8633598581856915e-06, 'epoch': 0.47} 47%|████▋ | 5787/12313 [4:19:56<4:42:51, 2.60s/it] 47%|████▋ | 5788/12313 [4:19:59<4:49:18, 2.66s/it] {'loss': 0.4081, 'grad_norm': 4.283001560216305, 'learning_rate': 2.8627092066123263e-06, 'epoch': 0.47} 47%|████▋ | 5788/12313 [4:19:59<4:49:18, 2.66s/it] 47%|████▋ | 5789/12313 [4:20:02<4:47:50, 2.65s/it] {'loss': 0.5021, 'grad_norm': 4.308589294597862, 'learning_rate': 2.8620585299414038e-06, 'epoch': 0.47} 47%|████▋ | 5789/12313 [4:20:02<4:47:50, 2.65s/it] 47%|████▋ | 5790/12313 [4:20:04<4:45:40, 2.63s/it] {'loss': 0.4754, 'grad_norm': 8.01361742978466, 'learning_rate': 2.861407828217947e-06, 'epoch': 
0.47} 47%|████▋ | 5790/12313 [4:20:04<4:45:40, 2.63s/it] 47%|████▋ | 5791/12313 [4:20:07<4:42:50, 2.60s/it] {'loss': 0.5081, 'grad_norm': 5.074919316163974, 'learning_rate': 2.8607571014869816e-06, 'epoch': 0.47} 47%|████▋ | 5791/12313 [4:20:07<4:42:50, 2.60s/it] 47%|████▋ | 5792/12313 [4:20:10<4:48:20, 2.65s/it] {'loss': 0.6083, 'grad_norm': 4.9471885543147325, 'learning_rate': 2.860106349793534e-06, 'epoch': 0.47} 47%|████▋ | 5792/12313 [4:20:10<4:48:20, 2.65s/it] 47%|████▋ | 5793/12313 [4:20:12<4:43:08, 2.61s/it] {'loss': 0.5181, 'grad_norm': 5.520036465193203, 'learning_rate': 2.859455573182632e-06, 'epoch': 0.47} 47%|████▋ | 5793/12313 [4:20:12<4:43:08, 2.61s/it] 47%|████▋ | 5794/12313 [4:20:15<4:46:16, 2.63s/it] {'loss': 0.402, 'grad_norm': 8.857086777181047, 'learning_rate': 2.8588047716993084e-06, 'epoch': 0.47} 47%|████▋ | 5794/12313 [4:20:15<4:46:16, 2.63s/it] 47%|████▋ | 5795/12313 [4:20:17<4:40:21, 2.58s/it] {'loss': 0.4228, 'grad_norm': 5.525749548650858, 'learning_rate': 2.858153945388592e-06, 'epoch': 0.47} 47%|████▋ | 5795/12313 [4:20:17<4:40:21, 2.58s/it] 47%|████▋ | 5796/12313 [4:20:20<4:40:50, 2.59s/it] {'loss': 0.3335, 'grad_norm': 4.365954557012976, 'learning_rate': 2.8575030942955185e-06, 'epoch': 0.47} 47%|████▋ | 5796/12313 [4:20:20<4:40:50, 2.59s/it] 47%|████▋ | 5797/12313 [4:20:23<4:41:50, 2.60s/it] {'loss': 0.5517, 'grad_norm': 5.357132843700224, 'learning_rate': 2.856852218465124e-06, 'epoch': 0.47} 47%|████▋ | 5797/12313 [4:20:23<4:41:50, 2.60s/it] 47%|████▋ | 5798/12313 [4:20:25<4:44:14, 2.62s/it] {'loss': 0.4705, 'grad_norm': 5.2190595114312694, 'learning_rate': 2.856201317942443e-06, 'epoch': 0.47} 47%|████▋ | 5798/12313 [4:20:25<4:44:14, 2.62s/it] 47%|████▋ | 5799/12313 [4:20:28<4:41:02, 2.59s/it] {'loss': 0.4894, 'grad_norm': 3.424227721385115, 'learning_rate': 2.8555503927725164e-06, 'epoch': 0.47} 47%|████▋ | 5799/12313 [4:20:28<4:41:02, 2.59s/it] 47%|████▋ | 5800/12313 [4:20:31<4:51:20, 2.68s/it] {'loss': 0.4604, 'grad_norm': 
5.0569787088343325, 'learning_rate': 2.854899443000385e-06, 'epoch': 0.47} 47%|████▋ | 5800/12313 [4:20:31<4:51:20, 2.68s/it] 47%|████▋ | 5801/12313 [4:20:34<4:55:00, 2.72s/it] {'loss': 0.5102, 'grad_norm': 8.187150772848494, 'learning_rate': 2.8542484686710896e-06, 'epoch': 0.47} 47%|████▋ | 5801/12313 [4:20:34<4:55:00, 2.72s/it] 47%|████▋ | 5802/12313 [4:20:36<4:50:50, 2.68s/it] {'loss': 0.3528, 'grad_norm': 6.207010087185806, 'learning_rate': 2.8535974698296765e-06, 'epoch': 0.47} 47%|████▋ | 5802/12313 [4:20:36<4:50:50, 2.68s/it] 47%|████▋ | 5803/12313 [4:20:39<4:41:43, 2.60s/it] {'loss': 0.6394, 'grad_norm': 4.227888017544308, 'learning_rate': 2.8529464465211886e-06, 'epoch': 0.47} 47%|████▋ | 5803/12313 [4:20:39<4:41:43, 2.60s/it] 47%|████▋ | 5804/12313 [4:20:41<4:43:23, 2.61s/it] {'loss': 0.6159, 'grad_norm': 4.665116991045132, 'learning_rate': 2.852295398790675e-06, 'epoch': 0.47} 47%|████▋ | 5804/12313 [4:20:41<4:43:23, 2.61s/it] 47%|████▋ | 5805/12313 [4:20:44<4:55:27, 2.72s/it] {'loss': 0.673, 'grad_norm': 3.8703424664055857, 'learning_rate': 2.8516443266831837e-06, 'epoch': 0.47} 47%|████▋ | 5805/12313 [4:20:44<4:55:27, 2.72s/it] 47%|████▋ | 5806/12313 [4:20:47<4:49:10, 2.67s/it] {'loss': 0.5117, 'grad_norm': 3.9676189612736388, 'learning_rate': 2.8509932302437665e-06, 'epoch': 0.47} 47%|████▋ | 5806/12313 [4:20:47<4:49:10, 2.67s/it] 47%|████▋ | 5807/12313 [4:20:49<4:50:39, 2.68s/it] {'loss': 0.3692, 'grad_norm': 4.329273160183401, 'learning_rate': 2.850342109517475e-06, 'epoch': 0.47} 47%|████▋ | 5807/12313 [4:20:49<4:50:39, 2.68s/it] 47%|████▋ | 5808/12313 [4:20:52<4:48:15, 2.66s/it] {'loss': 0.4246, 'grad_norm': 11.340360099111553, 'learning_rate': 2.8496909645493642e-06, 'epoch': 0.47} 47%|████▋ | 5808/12313 [4:20:52<4:48:15, 2.66s/it] 47%|████▋ | 5809/12313 [4:20:54<4:39:26, 2.58s/it] {'loss': 0.5581, 'grad_norm': 3.8942256837186626, 'learning_rate': 2.849039795384489e-06, 'epoch': 0.47} 47%|████▋ | 5809/12313 [4:20:54<4:39:26, 2.58s/it] 47%|████▋ 
| 5810/12313 [4:20:57<4:49:58, 2.68s/it] {'loss': 0.4519, 'grad_norm': 9.155900656113834, 'learning_rate': 2.8483886020679075e-06, 'epoch': 0.47} 47%|████▋ | 5810/12313 [4:20:57<4:49:58, 2.68s/it] 47%|████▋ | 5811/12313 [4:21:00<4:47:12, 2.65s/it] {'loss': 0.5612, 'grad_norm': 6.0033962365746465, 'learning_rate': 2.847737384644678e-06, 'epoch': 0.47} 47%|████▋ | 5811/12313 [4:21:00<4:47:12, 2.65s/it] 47%|████▋ | 5812/12313 [4:21:03<4:54:59, 2.72s/it] {'loss': 0.6559, 'grad_norm': 5.889887605386142, 'learning_rate': 2.8470861431598623e-06, 'epoch': 0.47} 47%|████▋ | 5812/12313 [4:21:03<4:54:59, 2.72s/it] 47%|████▋ | 5813/12313 [4:21:05<4:51:31, 2.69s/it] {'loss': 0.635, 'grad_norm': 4.599377492190549, 'learning_rate': 2.8464348776585234e-06, 'epoch': 0.47} 47%|████▋ | 5813/12313 [4:21:05<4:51:31, 2.69s/it] 47%|████▋ | 5814/12313 [4:21:08<4:52:36, 2.70s/it] {'loss': 0.4829, 'grad_norm': 10.76890029919621, 'learning_rate': 2.8457835881857227e-06, 'epoch': 0.47} 47%|████▋ | 5814/12313 [4:21:08<4:52:36, 2.70s/it] 47%|████▋ | 5815/12313 [4:21:11<4:54:04, 2.72s/it] {'loss': 0.5789, 'grad_norm': 7.220464203607489, 'learning_rate': 2.8451322747865286e-06, 'epoch': 0.47} 47%|████▋ | 5815/12313 [4:21:11<4:54:04, 2.72s/it] 47%|████▋ | 5816/12313 [4:21:14<4:53:11, 2.71s/it] {'loss': 0.609, 'grad_norm': 4.1108394338985725, 'learning_rate': 2.844480937506008e-06, 'epoch': 0.47} 47%|████▋ | 5816/12313 [4:21:14<4:53:11, 2.71s/it] 47%|████▋ | 5817/12313 [4:21:16<4:51:57, 2.70s/it] {'loss': 0.4926, 'grad_norm': 9.04222984053817, 'learning_rate': 2.843829576389229e-06, 'epoch': 0.47} 47%|████▋ | 5817/12313 [4:21:16<4:51:57, 2.70s/it] 47%|████▋ | 5818/12313 [4:21:19<5:00:24, 2.78s/it] {'loss': 0.6624, 'grad_norm': 4.528427577254921, 'learning_rate': 2.843178191481263e-06, 'epoch': 0.47} 47%|████▋ | 5818/12313 [4:21:19<5:00:24, 2.78s/it] 47%|████▋ | 5819/12313 [4:21:22<5:02:58, 2.80s/it] {'loss': 0.5929, 'grad_norm': 3.3028903657064825, 'learning_rate': 2.842526782827183e-06, 'epoch': 
0.47} 47%|████▋ | 5819/12313 [4:21:22<5:02:58, 2.80s/it] 47%|████▋ | 5820/12313 [4:21:25<5:14:42, 2.91s/it] {'loss': 0.5268, 'grad_norm': 5.026903257750425, 'learning_rate': 2.841875350472062e-06, 'epoch': 0.47} 47%|████▋ | 5820/12313 [4:21:25<5:14:42, 2.91s/it] 47%|████▋ | 5821/12313 [4:21:28<5:05:14, 2.82s/it] {'loss': 0.6567, 'grad_norm': 6.5104574690823975, 'learning_rate': 2.841223894460976e-06, 'epoch': 0.47} 47%|████▋ | 5821/12313 [4:21:28<5:05:14, 2.82s/it] 47%|████▋ | 5822/12313 [4:21:31<5:01:03, 2.78s/it] {'loss': 0.449, 'grad_norm': 4.314065979269395, 'learning_rate': 2.8405724148390023e-06, 'epoch': 0.47} 47%|████▋ | 5822/12313 [4:21:31<5:01:03, 2.78s/it] 47%|████▋ | 5823/12313 [4:21:33<4:48:38, 2.67s/it] {'loss': 0.5964, 'grad_norm': 3.312866780207277, 'learning_rate': 2.8399209116512204e-06, 'epoch': 0.47} 47%|████▋ | 5823/12313 [4:21:33<4:48:38, 2.67s/it] 47%|████▋ | 5824/12313 [4:21:36<4:57:27, 2.75s/it] {'loss': 0.5702, 'grad_norm': 5.37935898304279, 'learning_rate': 2.83926938494271e-06, 'epoch': 0.47} 47%|████▋ | 5824/12313 [4:21:36<4:57:27, 2.75s/it] 47%|████▋ | 5825/12313 [4:21:38<4:53:17, 2.71s/it] {'loss': 0.4697, 'grad_norm': 4.881465697522925, 'learning_rate': 2.838617834758554e-06, 'epoch': 0.47} 47%|████▋ | 5825/12313 [4:21:39<4:53:17, 2.71s/it] 47%|████▋ | 5826/12313 [4:21:41<4:43:34, 2.62s/it] {'loss': 0.4512, 'grad_norm': 5.744126603133613, 'learning_rate': 2.8379662611438356e-06, 'epoch': 0.47} 47%|████▋ | 5826/12313 [4:21:41<4:43:34, 2.62s/it] 47%|████▋ | 5827/12313 [4:21:43<4:39:26, 2.59s/it] {'loss': 0.5276, 'grad_norm': 5.291309755595148, 'learning_rate': 2.8373146641436413e-06, 'epoch': 0.47} 47%|████▋ | 5827/12313 [4:21:43<4:39:26, 2.59s/it] 47%|████▋ | 5828/12313 [4:21:46<4:42:40, 2.62s/it] {'loss': 0.5143, 'grad_norm': 3.9537639542161407, 'learning_rate': 2.836663043803057e-06, 'epoch': 0.47} 47%|████▋ | 5828/12313 [4:21:46<4:42:40, 2.62s/it] 47%|████▋ | 5829/12313 [4:21:49<4:36:23, 2.56s/it] {'loss': 0.4687, 'grad_norm': 
7.745428674706201, 'learning_rate': 2.8360114001671724e-06, 'epoch': 0.47} 47%|████▋ | 5829/12313 [4:21:49<4:36:23, 2.56s/it] 47%|████▋ | 5830/12313 [4:21:51<4:40:04, 2.59s/it] {'loss': 0.4782, 'grad_norm': 4.510709091006437, 'learning_rate': 2.835359733281077e-06, 'epoch': 0.47} 47%|████▋ | 5830/12313 [4:21:51<4:40:04, 2.59s/it] 47%|████▋ | 5831/12313 [4:21:54<4:44:26, 2.63s/it] {'loss': 0.4816, 'grad_norm': 7.03651123334896, 'learning_rate': 2.834708043189862e-06, 'epoch': 0.47} 47%|████▋ | 5831/12313 [4:21:54<4:44:26, 2.63s/it] 47%|████▋ | 5832/12313 [4:21:57<4:50:37, 2.69s/it] {'loss': 0.3454, 'grad_norm': 4.581264549358578, 'learning_rate': 2.8340563299386226e-06, 'epoch': 0.47} 47%|████▋ | 5832/12313 [4:21:57<4:50:37, 2.69s/it] 47%|████▋ | 5833/12313 [4:22:00<5:00:59, 2.79s/it] {'loss': 0.3696, 'grad_norm': 3.39063784376629, 'learning_rate': 2.833404593572453e-06, 'epoch': 0.47} 47%|████▋ | 5833/12313 [4:22:00<5:00:59, 2.79s/it] 47%|████▋ | 5834/12313 [4:22:02<4:54:24, 2.73s/it] {'loss': 0.4435, 'grad_norm': 5.734013115764664, 'learning_rate': 2.832752834136449e-06, 'epoch': 0.47} 47%|████▋ | 5834/12313 [4:22:02<4:54:24, 2.73s/it] 47%|████▋ | 5835/12313 [4:22:05<4:44:35, 2.64s/it] {'loss': 0.6581, 'grad_norm': 4.343417166849715, 'learning_rate': 2.832101051675712e-06, 'epoch': 0.47} 47%|████▋ | 5835/12313 [4:22:05<4:44:35, 2.64s/it] 47%|████▋ | 5836/12313 [4:22:07<4:44:26, 2.63s/it] {'loss': 0.5384, 'grad_norm': 3.661535617789465, 'learning_rate': 2.8314492462353386e-06, 'epoch': 0.47} 47%|████▋ | 5836/12313 [4:22:07<4:44:26, 2.63s/it] 47%|████▋ | 5837/12313 [4:22:10<4:46:17, 2.65s/it] {'loss': 0.5199, 'grad_norm': 4.785958660433438, 'learning_rate': 2.8307974178604312e-06, 'epoch': 0.47} 47%|████▋ | 5837/12313 [4:22:10<4:46:17, 2.65s/it] 47%|████▋ | 5838/12313 [4:22:13<4:48:42, 2.68s/it] {'loss': 0.5311, 'grad_norm': 7.777693529299737, 'learning_rate': 2.830145566596094e-06, 'epoch': 0.47} 47%|████▋ | 5838/12313 [4:22:13<4:48:42, 2.68s/it] 47%|████▋ | 
5839/12313 [4:22:16<4:49:31, 2.68s/it] {'loss': 0.6261, 'grad_norm': 6.275994532015221, 'learning_rate': 2.8294936924874304e-06, 'epoch': 0.47} 47%|████▋ | 5839/12313 [4:22:16<4:49:31, 2.68s/it] 47%|████▋ | 5840/12313 [4:22:18<4:41:14, 2.61s/it] {'loss': 0.5628, 'grad_norm': 5.198840400244198, 'learning_rate': 2.8288417955795476e-06, 'epoch': 0.47} 47%|████▋ | 5840/12313 [4:22:18<4:41:14, 2.61s/it] 47%|████▋ | 5841/12313 [4:22:21<4:44:25, 2.64s/it] {'loss': 0.6039, 'grad_norm': 3.7789894977053615, 'learning_rate': 2.828189875917553e-06, 'epoch': 0.47} 47%|████▋ | 5841/12313 [4:22:21<4:44:25, 2.64s/it] 47%|████▋ | 5842/12313 [4:22:23<4:50:26, 2.69s/it] {'loss': 0.6307, 'grad_norm': 4.356995519309679, 'learning_rate': 2.827537933546555e-06, 'epoch': 0.47} 47%|████▋ | 5842/12313 [4:22:23<4:50:26, 2.69s/it] 47%|████▋ | 5843/12313 [4:22:26<4:45:36, 2.65s/it] {'loss': 0.5372, 'grad_norm': 4.052004440671419, 'learning_rate': 2.8268859685116663e-06, 'epoch': 0.47} 47%|████▋ | 5843/12313 [4:22:26<4:45:36, 2.65s/it] 47%|████▋ | 5844/12313 [4:22:29<5:03:06, 2.81s/it] {'loss': 0.534, 'grad_norm': 3.622092711482535, 'learning_rate': 2.826233980857998e-06, 'epoch': 0.47} 47%|████▋ | 5844/12313 [4:22:29<5:03:06, 2.81s/it] 47%|████▋ | 5845/12313 [4:22:32<4:51:16, 2.70s/it] {'loss': 0.4377, 'grad_norm': 6.265338779708121, 'learning_rate': 2.8255819706306653e-06, 'epoch': 0.47} 47%|████▋ | 5845/12313 [4:22:32<4:51:16, 2.70s/it] 47%|████▋ | 5846/12313 [4:22:34<4:49:14, 2.68s/it] {'loss': 0.3812, 'grad_norm': 5.902804627865366, 'learning_rate': 2.8249299378747833e-06, 'epoch': 0.47} 47%|████▋ | 5846/12313 [4:22:34<4:49:14, 2.68s/it] 47%|████▋ | 5847/12313 [4:22:37<4:45:09, 2.65s/it] {'loss': 0.5673, 'grad_norm': 7.988753927468825, 'learning_rate': 2.824277882635469e-06, 'epoch': 0.47} 47%|████▋ | 5847/12313 [4:22:37<4:45:09, 2.65s/it] 47%|████▋ | 5848/12313 [4:22:40<4:52:39, 2.72s/it] {'loss': 0.5389, 'grad_norm': 10.87808274886899, 'learning_rate': 2.8236258049578418e-06, 'epoch': 
0.47} 47%|████▋ | 5848/12313 [4:22:40<4:52:39, 2.72s/it] 48%|████▊ | 5849/12313 [4:22:42<4:41:08, 2.61s/it] {'loss': 0.4301, 'grad_norm': 5.350053974435865, 'learning_rate': 2.8229737048870216e-06, 'epoch': 0.48} 48%|████▊ | 5849/12313 [4:22:42<4:41:08, 2.61s/it] 48%|████▊ | 5850/12313 [4:22:45<4:41:12, 2.61s/it] {'loss': 0.5695, 'grad_norm': 5.918757588156862, 'learning_rate': 2.8223215824681295e-06, 'epoch': 0.48} 48%|████▊ | 5850/12313 [4:22:45<4:41:12, 2.61s/it] 48%|████▊ | 5851/12313 [4:22:47<4:35:12, 2.56s/it] {'loss': 0.6008, 'grad_norm': 12.49565829789524, 'learning_rate': 2.821669437746291e-06, 'epoch': 0.48} 48%|████▊ | 5851/12313 [4:22:47<4:35:12, 2.56s/it] 48%|████▊ | 5852/12313 [4:22:50<4:49:55, 2.69s/it] {'loss': 0.5504, 'grad_norm': 7.165842116453641, 'learning_rate': 2.8210172707666296e-06, 'epoch': 0.48} 48%|████▊ | 5852/12313 [4:22:50<4:49:55, 2.69s/it] 48%|████▊ | 5853/12313 [4:22:53<4:48:55, 2.68s/it] {'loss': 0.5192, 'grad_norm': 5.786887937045634, 'learning_rate': 2.820365081574271e-06, 'epoch': 0.48} 48%|████▊ | 5853/12313 [4:22:53<4:48:55, 2.68s/it] 48%|████▊ | 5854/12313 [4:22:55<4:43:51, 2.64s/it] {'loss': 0.5769, 'grad_norm': 6.157943347201467, 'learning_rate': 2.819712870214345e-06, 'epoch': 0.48} 48%|████▊ | 5854/12313 [4:22:55<4:43:51, 2.64s/it] 48%|████▊ | 5855/12313 [4:22:58<4:42:12, 2.62s/it] {'loss': 0.4668, 'grad_norm': 15.746368000393133, 'learning_rate': 2.8190606367319806e-06, 'epoch': 0.48} 48%|████▊ | 5855/12313 [4:22:58<4:42:12, 2.62s/it] 48%|████▊ | 5856/12313 [4:23:01<4:44:29, 2.64s/it] {'loss': 0.4084, 'grad_norm': 7.868624147224612, 'learning_rate': 2.8184083811723083e-06, 'epoch': 0.48} 48%|████▊ | 5856/12313 [4:23:01<4:44:29, 2.64s/it] 48%|████▊ | 5857/12313 [4:23:03<4:44:55, 2.65s/it] {'loss': 0.588, 'grad_norm': 16.215616004236644, 'learning_rate': 2.817756103580461e-06, 'epoch': 0.48} 48%|████▊ | 5857/12313 [4:23:03<4:44:55, 2.65s/it] 48%|████▊ | 5858/12313 [4:23:06<4:50:25, 2.70s/it] {'loss': 0.5907, 'grad_norm': 
4.747406090572309, 'learning_rate': 2.8171038040015737e-06, 'epoch': 0.48} 48%|████▊ | 5858/12313 [4:23:06<4:50:25, 2.70s/it] 48%|████▊ | 5859/12313 [4:23:09<4:47:21, 2.67s/it] {'loss': 0.5015, 'grad_norm': 4.9449474953603305, 'learning_rate': 2.8164514824807814e-06, 'epoch': 0.48} 48%|████▊ | 5859/12313 [4:23:09<4:47:21, 2.67s/it] 48%|████▊ | 5860/12313 [4:23:11<4:43:48, 2.64s/it] {'loss': 0.6872, 'grad_norm': 13.074384603626209, 'learning_rate': 2.8157991390632206e-06, 'epoch': 0.48} 48%|████▊ | 5860/12313 [4:23:11<4:43:48, 2.64s/it] 48%|████▊ | 5861/12313 [4:23:14<4:49:27, 2.69s/it] {'loss': 0.4646, 'grad_norm': 6.893964042250645, 'learning_rate': 2.8151467737940312e-06, 'epoch': 0.48} 48%|████▊ | 5861/12313 [4:23:14<4:49:27, 2.69s/it] 48%|████▊ | 5862/12313 [4:23:17<4:49:59, 2.70s/it] {'loss': 0.6181, 'grad_norm': 11.048526461851749, 'learning_rate': 2.8144943867183535e-06, 'epoch': 0.48} 48%|████▊ | 5862/12313 [4:23:17<4:49:59, 2.70s/it] 48%|████▊ | 5863/12313 [4:23:20<4:53:09, 2.73s/it] {'loss': 0.5295, 'grad_norm': 5.148676616822644, 'learning_rate': 2.8138419778813274e-06, 'epoch': 0.48} 48%|████▊ | 5863/12313 [4:23:20<4:53:09, 2.73s/it] 48%|████▊ | 5864/12313 [4:23:22<4:44:47, 2.65s/it] {'loss': 0.6112, 'grad_norm': 7.31376261027769, 'learning_rate': 2.8131895473280985e-06, 'epoch': 0.48} 48%|████▊ | 5864/12313 [4:23:22<4:44:47, 2.65s/it] 48%|████▊ | 5865/12313 [4:23:25<4:50:45, 2.71s/it] {'loss': 0.5427, 'grad_norm': 8.847836919115894, 'learning_rate': 2.81253709510381e-06, 'epoch': 0.48} 48%|████▊ | 5865/12313 [4:23:25<4:50:45, 2.71s/it] 48%|████▊ | 5866/12313 [4:23:28<4:54:11, 2.74s/it] {'loss': 0.5135, 'grad_norm': 3.549876129014136, 'learning_rate': 2.811884621253608e-06, 'epoch': 0.48} 48%|████▊ | 5866/12313 [4:23:28<4:54:11, 2.74s/it] 48%|████▊ | 5867/12313 [4:23:30<4:52:18, 2.72s/it] {'loss': 0.6499, 'grad_norm': 4.676010284588651, 'learning_rate': 2.811232125822642e-06, 'epoch': 0.48} 48%|████▊ | 5867/12313 [4:23:30<4:52:18, 2.72s/it] 48%|████▊ | 
5868/12313 [4:23:33<4:48:43, 2.69s/it] {'loss': 0.4845, 'grad_norm': 4.475545565711852, 'learning_rate': 2.81057960885606e-06, 'epoch': 0.48} 48%|████▊ | 5868/12313 [4:23:33<4:48:43, 2.69s/it] 48%|████▊ | 5869/12313 [4:23:36<4:49:40, 2.70s/it] {'loss': 0.4768, 'grad_norm': 5.150829015821627, 'learning_rate': 2.8099270703990124e-06, 'epoch': 0.48} 48%|████▊ | 5869/12313 [4:23:36<4:49:40, 2.70s/it] 48%|████▊ | 5870/12313 [4:23:38<4:41:50, 2.62s/it] {'loss': 0.5269, 'grad_norm': 6.981268549767182, 'learning_rate': 2.8092745104966514e-06, 'epoch': 0.48} 48%|████▊ | 5870/12313 [4:23:38<4:41:50, 2.62s/it] 48%|████▊ | 5871/12313 [4:23:41<4:44:14, 2.65s/it] {'loss': 0.5675, 'grad_norm': 4.317381690472616, 'learning_rate': 2.8086219291941314e-06, 'epoch': 0.48} 48%|████▊ | 5871/12313 [4:23:41<4:44:14, 2.65s/it] 48%|████▊ | 5872/12313 [4:23:44<4:48:49, 2.69s/it] {'loss': 0.4985, 'grad_norm': 6.190644029515262, 'learning_rate': 2.807969326536607e-06, 'epoch': 0.48} 48%|████▊ | 5872/12313 [4:23:44<4:48:49, 2.69s/it] 48%|████▊ | 5873/12313 [4:23:47<4:55:29, 2.75s/it] {'loss': 0.5183, 'grad_norm': 3.4066085952901073, 'learning_rate': 2.8073167025692354e-06, 'epoch': 0.48} 48%|████▊ | 5873/12313 [4:23:47<4:55:29, 2.75s/it] 48%|████▊ | 5874/12313 [4:23:50<5:09:06, 2.88s/it] {'loss': 0.5322, 'grad_norm': 5.692886429945919, 'learning_rate': 2.8066640573371747e-06, 'epoch': 0.48} 48%|████▊ | 5874/12313 [4:23:50<5:09:06, 2.88s/it] 48%|████▊ | 5875/12313 [4:23:52<5:00:03, 2.80s/it] {'loss': 0.6323, 'grad_norm': 7.642876548549243, 'learning_rate': 2.8060113908855847e-06, 'epoch': 0.48} 48%|████▊ | 5875/12313 [4:23:52<5:00:03, 2.80s/it] 48%|████▊ | 5876/12313 [4:23:55<4:49:01, 2.69s/it] {'loss': 0.5077, 'grad_norm': 17.72297787277452, 'learning_rate': 2.805358703259624e-06, 'epoch': 0.48} 48%|████▊ | 5876/12313 [4:23:55<4:49:01, 2.69s/it] 48%|████▊ | 5877/12313 [4:23:57<4:45:21, 2.66s/it] {'loss': 0.4381, 'grad_norm': 5.581364142761338, 'learning_rate': 2.8047059945044585e-06, 'epoch': 
0.48} 48%|████▊ | 5877/12313 [4:23:57<4:45:21, 2.66s/it] 48%|████▊ | 5878/12313 [4:24:00<4:46:47, 2.67s/it] {'loss': 0.4763, 'grad_norm': 9.67091630066501, 'learning_rate': 2.8040532646652515e-06, 'epoch': 0.48} 48%|████▊ | 5878/12313 [4:24:00<4:46:47, 2.67s/it] 48%|████▊ | 5879/12313 [4:24:03<4:46:17, 2.67s/it] {'loss': 0.5316, 'grad_norm': 4.1891971217435575, 'learning_rate': 2.803400513787166e-06, 'epoch': 0.48} 48%|████▊ | 5879/12313 [4:24:03<4:46:17, 2.67s/it] 48%|████▊ | 5880/12313 [4:24:05<4:47:26, 2.68s/it] {'loss': 0.5111, 'grad_norm': 6.17817844727735, 'learning_rate': 2.802747741915372e-06, 'epoch': 0.48} 48%|████▊ | 5880/12313 [4:24:05<4:47:26, 2.68s/it] 48%|████▊ | 5881/12313 [4:24:08<4:54:48, 2.75s/it] {'loss': 0.5109, 'grad_norm': 4.9847568946301015, 'learning_rate': 2.8020949490950367e-06, 'epoch': 0.48} 48%|████▊ | 5881/12313 [4:24:08<4:54:48, 2.75s/it] 48%|████▊ | 5882/12313 [4:24:11<4:53:54, 2.74s/it] {'loss': 0.6185, 'grad_norm': 4.473462329646816, 'learning_rate': 2.801442135371329e-06, 'epoch': 0.48} 48%|████▊ | 5882/12313 [4:24:11<4:53:54, 2.74s/it] 48%|████▊ | 5883/12313 [4:24:13<4:42:58, 2.64s/it] {'loss': 0.4709, 'grad_norm': 6.199201466381722, 'learning_rate': 2.800789300789421e-06, 'epoch': 0.48} 48%|████▊ | 5883/12313 [4:24:13<4:42:58, 2.64s/it] 48%|████▊ | 5884/12313 [4:24:16<4:54:24, 2.75s/it] {'loss': 0.658, 'grad_norm': 3.4927971670876157, 'learning_rate': 2.8001364453944853e-06, 'epoch': 0.48} 48%|████▊ | 5884/12313 [4:24:16<4:54:24, 2.75s/it] 48%|████▊ | 5885/12313 [4:24:19<4:46:37, 2.68s/it] {'loss': 0.3258, 'grad_norm': 9.252290801408948, 'learning_rate': 2.799483569231696e-06, 'epoch': 0.48} 48%|████▊ | 5885/12313 [4:24:19<4:46:37, 2.68s/it] 48%|████▊ | 5886/12313 [4:24:22<4:44:00, 2.65s/it] {'loss': 0.7327, 'grad_norm': 5.159111959600434, 'learning_rate': 2.798830672346229e-06, 'epoch': 0.48} 48%|████▊ | 5886/12313 [4:24:22<4:44:00, 2.65s/it] 48%|████▊ | 5887/12313 [4:24:24<4:44:24, 2.66s/it] {'loss': 0.5373, 'grad_norm': 
4.997715592914455, 'learning_rate': 2.7981777547832604e-06, 'epoch': 0.48} 48%|████▊ | 5887/12313 [4:24:24<4:44:24, 2.66s/it] 48%|████▊ | 5888/12313 [4:24:27<4:37:34, 2.59s/it] {'loss': 0.4249, 'grad_norm': 7.611885925164203, 'learning_rate': 2.7975248165879697e-06, 'epoch': 0.48} 48%|████▊ | 5888/12313 [4:24:27<4:37:34, 2.59s/it] 48%|████▊ | 5889/12313 [4:24:29<4:40:30, 2.62s/it] {'loss': 0.5686, 'grad_norm': 8.402898696861591, 'learning_rate': 2.7968718578055365e-06, 'epoch': 0.48} 48%|████▊ | 5889/12313 [4:24:29<4:40:30, 2.62s/it] 48%|████▊ | 5890/12313 [4:24:32<4:46:09, 2.67s/it] {'loss': 0.5125, 'grad_norm': 4.514667421391678, 'learning_rate': 2.796218878481142e-06, 'epoch': 0.48} 48%|████▊ | 5890/12313 [4:24:32<4:46:09, 2.67s/it] 48%|████▊ | 5891/12313 [4:24:35<4:42:43, 2.64s/it] {'loss': 0.4591, 'grad_norm': 6.200592103587213, 'learning_rate': 2.7955658786599688e-06, 'epoch': 0.48} 48%|████▊ | 5891/12313 [4:24:35<4:42:43, 2.64s/it] 48%|████▊ | 5892/12313 [4:24:37<4:36:37, 2.58s/it] {'loss': 0.6035, 'grad_norm': 3.9141827656240262, 'learning_rate': 2.7949128583872e-06, 'epoch': 0.48} 48%|████▊ | 5892/12313 [4:24:37<4:36:37, 2.58s/it] 48%|████▊ | 5893/12313 [4:24:40<4:44:35, 2.66s/it] {'loss': 0.5105, 'grad_norm': 2.881200756727726, 'learning_rate': 2.7942598177080233e-06, 'epoch': 0.48} 48%|████▊ | 5893/12313 [4:24:40<4:44:35, 2.66s/it] 48%|████▊ | 5894/12313 [4:24:43<4:43:09, 2.65s/it] {'loss': 0.4788, 'grad_norm': 4.876865531343913, 'learning_rate': 2.7936067566676244e-06, 'epoch': 0.48} 48%|████▊ | 5894/12313 [4:24:43<4:43:09, 2.65s/it] 48%|████▊ | 5895/12313 [4:24:45<4:36:14, 2.58s/it] {'loss': 0.4933, 'grad_norm': 4.129321830180821, 'learning_rate': 2.792953675311192e-06, 'epoch': 0.48} 48%|████▊ | 5895/12313 [4:24:45<4:36:14, 2.58s/it] 48%|████▊ | 5896/12313 [4:24:48<4:36:04, 2.58s/it] {'loss': 0.6777, 'grad_norm': 4.975168884090526, 'learning_rate': 2.792300573683915e-06, 'epoch': 0.48} 48%|████▊ | 5896/12313 [4:24:48<4:36:04, 2.58s/it] 48%|████▊ | 
5897/12313 [4:24:50<4:41:13, 2.63s/it] {'loss': 0.6101, 'grad_norm': 4.373706953851931, 'learning_rate': 2.7916474518309854e-06, 'epoch': 0.48} 48%|████▊ | 5897/12313 [4:24:50<4:41:13, 2.63s/it] 48%|████▊ | 5898/12313 [4:24:53<4:45:38, 2.67s/it] {'loss': 0.5524, 'grad_norm': 6.787308432476194, 'learning_rate': 2.790994309797596e-06, 'epoch': 0.48} 48%|████▊ | 5898/12313 [4:24:53<4:45:38, 2.67s/it] 48%|████▊ | 5899/12313 [4:24:56<4:37:53, 2.60s/it] {'loss': 0.5455, 'grad_norm': 3.130380915868563, 'learning_rate': 2.79034114762894e-06, 'epoch': 0.48} 48%|████▊ | 5899/12313 [4:24:56<4:37:53, 2.60s/it] 48%|████▊ | 5900/12313 [4:24:58<4:30:32, 2.53s/it] {'loss': 0.405, 'grad_norm': 5.155056558227334, 'learning_rate': 2.789687965370214e-06, 'epoch': 0.48} 48%|████▊ | 5900/12313 [4:24:58<4:30:32, 2.53s/it] 48%|████▊ | 5901/12313 [4:25:01<4:33:14, 2.56s/it] {'loss': 0.498, 'grad_norm': 5.772687941455542, 'learning_rate': 2.7890347630666135e-06, 'epoch': 0.48} 48%|████▊ | 5901/12313 [4:25:01<4:33:14, 2.56s/it] 48%|████▊ | 5902/12313 [4:25:03<4:36:20, 2.59s/it] {'loss': 0.4895, 'grad_norm': 4.464910286196713, 'learning_rate': 2.788381540763337e-06, 'epoch': 0.48} 48%|████▊ | 5902/12313 [4:25:03<4:36:20, 2.59s/it] 48%|████▊ | 5903/12313 [4:25:06<4:36:45, 2.59s/it] {'loss': 0.456, 'grad_norm': 6.222904764201984, 'learning_rate': 2.787728298505584e-06, 'epoch': 0.48} 48%|████▊ | 5903/12313 [4:25:06<4:36:45, 2.59s/it] 48%|████▊ | 5904/12313 [4:25:09<4:46:01, 2.68s/it] {'loss': 0.4607, 'grad_norm': 30.493510617091204, 'learning_rate': 2.787075036338556e-06, 'epoch': 0.48} 48%|████▊ | 5904/12313 [4:25:09<4:46:01, 2.68s/it] 48%|████▊ | 5905/12313 [4:25:11<4:42:11, 2.64s/it] {'loss': 0.545, 'grad_norm': 12.321251377438697, 'learning_rate': 2.7864217543074544e-06, 'epoch': 0.48} 48%|████▊ | 5905/12313 [4:25:11<4:42:11, 2.64s/it] 48%|████▊ | 5906/12313 [4:25:14<4:43:33, 2.66s/it] {'loss': 0.6029, 'grad_norm': 5.80807114653115, 'learning_rate': 2.7857684524574833e-06, 'epoch': 0.48} 
48%|████▊ | 5906/12313 [4:25:14<4:43:33, 2.66s/it] 48%|████▊ | 5907/12313 [4:25:17<4:40:49, 2.63s/it] {'loss': 0.5074, 'grad_norm': 7.259661847105047, 'learning_rate': 2.7851151308338483e-06, 'epoch': 0.48} 48%|████▊ | 5907/12313 [4:25:17<4:40:49, 2.63s/it] 48%|████▊ | 5908/12313 [4:25:19<4:50:41, 2.72s/it] {'loss': 0.4931, 'grad_norm': 5.490831475001269, 'learning_rate': 2.784461789481754e-06, 'epoch': 0.48} 48%|████▊ | 5908/12313 [4:25:19<4:50:41, 2.72s/it] 48%|████▊ | 5909/12313 [4:25:22<4:45:56, 2.68s/it] {'loss': 0.3872, 'grad_norm': 10.451899945538207, 'learning_rate': 2.7838084284464105e-06, 'epoch': 0.48} 48%|████▊ | 5909/12313 [4:25:22<4:45:56, 2.68s/it] 48%|████▊ | 5910/12313 [4:25:24<4:37:35, 2.60s/it] {'loss': 0.6436, 'grad_norm': 7.293146517834257, 'learning_rate': 2.7831550477730255e-06, 'epoch': 0.48} 48%|████▊ | 5910/12313 [4:25:24<4:37:35, 2.60s/it] 48%|████▊ | 5911/12313 [4:25:27<4:38:20, 2.61s/it] {'loss': 0.4786, 'grad_norm': 6.817978018689368, 'learning_rate': 2.78250164750681e-06, 'epoch': 0.48} 48%|████▊ | 5911/12313 [4:25:27<4:38:20, 2.61s/it] 48%|████▊ | 5912/12313 [4:25:30<4:44:05, 2.66s/it] {'loss': 0.316, 'grad_norm': 5.17758792887419, 'learning_rate': 2.781848227692974e-06, 'epoch': 0.48} 48%|████▊ | 5912/12313 [4:25:30<4:44:05, 2.66s/it] 48%|████▊ | 5913/12313 [4:25:33<4:42:16, 2.65s/it] {'loss': 0.4606, 'grad_norm': 9.128124210915745, 'learning_rate': 2.7811947883767343e-06, 'epoch': 0.48} 48%|████▊ | 5913/12313 [4:25:33<4:42:16, 2.65s/it] 48%|████▊ | 5914/12313 [4:25:35<4:41:36, 2.64s/it] {'loss': 0.4745, 'grad_norm': 6.296126142033211, 'learning_rate': 2.780541329603303e-06, 'epoch': 0.48} 48%|████▊ | 5914/12313 [4:25:35<4:41:36, 2.64s/it] 48%|████▊ | 5915/12313 [4:25:38<4:39:57, 2.63s/it] {'loss': 0.5816, 'grad_norm': 6.296563355774902, 'learning_rate': 2.7798878514178955e-06, 'epoch': 0.48} 48%|████▊ | 5915/12313 [4:25:38<4:39:57, 2.63s/it] 48%|████▊ | 5916/12313 [4:25:40<4:39:51, 2.62s/it] {'loss': 0.4533, 'grad_norm': 
4.08831948588252, 'learning_rate': 2.779234353865731e-06, 'epoch': 0.48} 48%|████▊ | 5916/12313 [4:25:40<4:39:51, 2.62s/it] 48%|████▊ | 5917/12313 [4:25:43<4:40:25, 2.63s/it] {'loss': 0.52, 'grad_norm': 6.198220168009877, 'learning_rate': 2.7785808369920263e-06, 'epoch': 0.48} 48%|████▊ | 5917/12313 [4:25:43<4:40:25, 2.63s/it] 48%|████▊ | 5918/12313 [4:25:46<4:42:47, 2.65s/it] {'loss': 0.4988, 'grad_norm': 7.348946572016875, 'learning_rate': 2.777927300842003e-06, 'epoch': 0.48} 48%|████▊ | 5918/12313 [4:25:46<4:42:47, 2.65s/it] 48%|████▊ | 5919/12313 [4:25:48<4:35:34, 2.59s/it] {'loss': 0.5427, 'grad_norm': 7.4243657712578, 'learning_rate': 2.7772737454608804e-06, 'epoch': 0.48} 48%|████▊ | 5919/12313 [4:25:48<4:35:34, 2.59s/it] 48%|████▊ | 5920/12313 [4:25:51<4:38:05, 2.61s/it] {'loss': 0.6156, 'grad_norm': 6.240520113282074, 'learning_rate': 2.7766201708938823e-06, 'epoch': 0.48} 48%|████▊ | 5920/12313 [4:25:51<4:38:05, 2.61s/it] 48%|████▊ | 5921/12313 [4:25:53<4:33:51, 2.57s/it] {'loss': 0.4755, 'grad_norm': 4.092737052613775, 'learning_rate': 2.7759665771862324e-06, 'epoch': 0.48} 48%|████▊ | 5921/12313 [4:25:53<4:33:51, 2.57s/it] 48%|████▊ | 5922/12313 [4:25:56<4:38:24, 2.61s/it] {'loss': 0.4982, 'grad_norm': 4.543429446499594, 'learning_rate': 2.775312964383156e-06, 'epoch': 0.48} 48%|████▊ | 5922/12313 [4:25:56<4:38:24, 2.61s/it] 48%|████▊ | 5923/12313 [4:25:58<4:33:22, 2.57s/it] {'loss': 0.6698, 'grad_norm': 4.139943817706591, 'learning_rate': 2.77465933252988e-06, 'epoch': 0.48} 48%|████▊ | 5923/12313 [4:25:58<4:33:22, 2.57s/it] 48%|████▊ | 5924/12313 [4:26:01<4:30:46, 2.54s/it] {'loss': 0.4612, 'grad_norm': 4.420260702078683, 'learning_rate': 2.7740056816716317e-06, 'epoch': 0.48} 48%|████▊ | 5924/12313 [4:26:01<4:30:46, 2.54s/it] 48%|████▊ | 5925/12313 [4:26:04<4:54:58, 2.77s/it] {'loss': 0.4372, 'grad_norm': 8.399576878392056, 'learning_rate': 2.7733520118536395e-06, 'epoch': 0.48} 48%|████▊ | 5925/12313 [4:26:04<4:54:58, 2.77s/it] 48%|████▊ | 
5926/12313 [4:26:07<4:43:48, 2.67s/it] {'loss': 0.4788, 'grad_norm': 8.529789641771623, 'learning_rate': 2.772698323121135e-06, 'epoch': 0.48} 48%|████▊ | 5926/12313 [4:26:07<4:43:48, 2.67s/it] 48%|████▊ | 5927/12313 [4:26:09<4:44:25, 2.67s/it] {'loss': 0.8274, 'grad_norm': 6.033494247535142, 'learning_rate': 2.7720446155193503e-06, 'epoch': 0.48} 48%|████▊ | 5927/12313 [4:26:09<4:44:25, 2.67s/it] 48%|████▊ | 5928/12313 [4:26:12<4:43:07, 2.66s/it] {'loss': 0.5214, 'grad_norm': 3.680362638542244, 'learning_rate': 2.7713908890935177e-06, 'epoch': 0.48} 48%|████▊ | 5928/12313 [4:26:12<4:43:07, 2.66s/it] 48%|████▊ | 5929/12313 [4:26:15<4:45:14, 2.68s/it] {'loss': 0.7151, 'grad_norm': 7.379140973686685, 'learning_rate': 2.770737143888872e-06, 'epoch': 0.48} 48%|████▊ | 5929/12313 [4:26:15<4:45:14, 2.68s/it] 48%|████▊ | 5930/12313 [4:26:18<4:57:10, 2.79s/it] {'loss': 0.553, 'grad_norm': 3.6816364647280584, 'learning_rate': 2.7700833799506487e-06, 'epoch': 0.48} 48%|████▊ | 5930/12313 [4:26:18<4:57:10, 2.79s/it] 48%|████▊ | 5931/12313 [4:26:20<4:52:36, 2.75s/it] {'loss': 0.4937, 'grad_norm': 5.987951257678209, 'learning_rate': 2.7694295973240848e-06, 'epoch': 0.48} 48%|████▊ | 5931/12313 [4:26:20<4:52:36, 2.75s/it] 48%|████▊ | 5932/12313 [4:26:23<4:52:07, 2.75s/it] {'loss': 0.4982, 'grad_norm': 4.956448565368726, 'learning_rate': 2.7687757960544193e-06, 'epoch': 0.48} 48%|████▊ | 5932/12313 [4:26:23<4:52:07, 2.75s/it] 48%|████▊ | 5933/12313 [4:26:26<4:54:27, 2.77s/it] {'loss': 0.6454, 'grad_norm': 4.49940857003615, 'learning_rate': 2.7681219761868905e-06, 'epoch': 0.48} 48%|████▊ | 5933/12313 [4:26:26<4:54:27, 2.77s/it] 48%|████▊ | 5934/12313 [4:26:28<4:43:14, 2.66s/it] {'loss': 0.5949, 'grad_norm': 4.009481443453415, 'learning_rate': 2.7674681377667403e-06, 'epoch': 0.48} 48%|████▊ | 5934/12313 [4:26:28<4:43:14, 2.66s/it] 48%|████▊ | 5935/12313 [4:26:31<4:43:30, 2.67s/it] {'loss': 0.6751, 'grad_norm': 6.276103431385119, 'learning_rate': 2.7668142808392102e-06, 'epoch': 
0.48} 48%|████▊ | 5935/12313 [4:26:31<4:43:30, 2.67s/it] 48%|████▊ | 5936/12313 [4:26:34<4:45:00, 2.68s/it] {'loss': 0.5605, 'grad_norm': 5.264109990138893, 'learning_rate': 2.7661604054495447e-06, 'epoch': 0.48} 48%|████▊ | 5936/12313 [4:26:34<4:45:00, 2.68s/it] 48%|████▊ | 5937/12313 [4:26:36<4:44:20, 2.68s/it] {'loss': 0.5536, 'grad_norm': 6.623257155307936, 'learning_rate': 2.765506511642987e-06, 'epoch': 0.48} 48%|████▊ | 5937/12313 [4:26:36<4:44:20, 2.68s/it] 48%|████▊ | 5938/12313 [4:26:39<4:39:47, 2.63s/it] {'loss': 0.5155, 'grad_norm': 8.180552462619701, 'learning_rate': 2.764852599464784e-06, 'epoch': 0.48} 48%|████▊ | 5938/12313 [4:26:39<4:39:47, 2.63s/it] 48%|████▊ | 5939/12313 [4:26:42<4:44:02, 2.67s/it] {'loss': 0.4975, 'grad_norm': 3.6982416953033996, 'learning_rate': 2.764198668960183e-06, 'epoch': 0.48} 48%|████▊ | 5939/12313 [4:26:42<4:44:02, 2.67s/it] 48%|████▊ | 5940/12313 [4:26:44<4:41:45, 2.65s/it] {'loss': 0.6719, 'grad_norm': 4.58146111767092, 'learning_rate': 2.7635447201744324e-06, 'epoch': 0.48} 48%|████▊ | 5940/12313 [4:26:44<4:41:45, 2.65s/it] 48%|████▊ | 5941/12313 [4:26:47<4:34:39, 2.59s/it] {'loss': 0.5106, 'grad_norm': 4.792428868707111, 'learning_rate': 2.7628907531527815e-06, 'epoch': 0.48} 48%|████▊ | 5941/12313 [4:26:47<4:34:39, 2.59s/it] 48%|████▊ | 5942/12313 [4:26:49<4:33:11, 2.57s/it] {'loss': 0.3998, 'grad_norm': 5.140027653654479, 'learning_rate': 2.762236767940482e-06, 'epoch': 0.48} 48%|████▊ | 5942/12313 [4:26:49<4:33:11, 2.57s/it] 48%|████▊ | 5943/12313 [4:26:52<4:45:38, 2.69s/it] {'loss': 0.4726, 'grad_norm': 4.42487569925798, 'learning_rate': 2.761582764582787e-06, 'epoch': 0.48} 48%|████▊ | 5943/12313 [4:26:52<4:45:38, 2.69s/it] 48%|████▊ | 5944/12313 [4:26:55<4:47:56, 2.71s/it] {'loss': 0.5601, 'grad_norm': 4.997518890904461, 'learning_rate': 2.760928743124948e-06, 'epoch': 0.48} 48%|████▊ | 5944/12313 [4:26:55<4:47:56, 2.71s/it] 48%|████▊ | 5945/12313 [4:26:58<4:51:00, 2.74s/it] {'loss': 0.6182, 'grad_norm': 
3.799079838934848, 'learning_rate': 2.7602747036122213e-06, 'epoch': 0.48} 48%|████▊ | 5945/12313 [4:26:58<4:51:00, 2.74s/it] 48%|████▊ | 5946/12313 [4:27:01<4:50:58, 2.74s/it] {'loss': 0.476, 'grad_norm': 10.353271040341017, 'learning_rate': 2.759620646089863e-06, 'epoch': 0.48} 48%|████▊ | 5946/12313 [4:27:01<4:50:58, 2.74s/it] 48%|████▊ | 5947/12313 [4:27:04<4:58:22, 2.81s/it] {'loss': 0.5052, 'grad_norm': 5.666756315530278, 'learning_rate': 2.758966570603129e-06, 'epoch': 0.48} 48%|████▊ | 5947/12313 [4:27:04<4:58:22, 2.81s/it] 48%|████▊ | 5948/12313 [4:27:06<4:55:04, 2.78s/it] {'loss': 0.4994, 'grad_norm': 6.165341654780337, 'learning_rate': 2.7583124771972797e-06, 'epoch': 0.48} 48%|████▊ | 5948/12313 [4:27:06<4:55:04, 2.78s/it] 48%|████▊ | 5949/12313 [4:27:09<4:48:08, 2.72s/it] {'loss': 0.439, 'grad_norm': 4.569577065572749, 'learning_rate': 2.7576583659175738e-06, 'epoch': 0.48} 48%|████▊ | 5949/12313 [4:27:09<4:48:08, 2.72s/it] 48%|████▊ | 5950/12313 [4:27:12<4:46:30, 2.70s/it] {'loss': 0.5527, 'grad_norm': 3.098941821564416, 'learning_rate': 2.7570042368092724e-06, 'epoch': 0.48} 48%|████▊ | 5950/12313 [4:27:12<4:46:30, 2.70s/it] 48%|████▊ | 5951/12313 [4:27:14<4:49:41, 2.73s/it] {'loss': 0.4795, 'grad_norm': 5.497608597420854, 'learning_rate': 2.7563500899176383e-06, 'epoch': 0.48} 48%|████▊ | 5951/12313 [4:27:14<4:49:41, 2.73s/it] 48%|████▊ | 5952/12313 [4:27:17<4:49:57, 2.73s/it] {'loss': 0.4909, 'grad_norm': 5.7165228874758505, 'learning_rate': 2.7556959252879345e-06, 'epoch': 0.48} 48%|████▊ | 5952/12313 [4:27:17<4:49:57, 2.73s/it] 48%|████▊ | 5953/12313 [4:27:20<4:51:21, 2.75s/it] {'loss': 0.4677, 'grad_norm': 5.874962862560617, 'learning_rate': 2.755041742965426e-06, 'epoch': 0.48} 48%|████▊ | 5953/12313 [4:27:20<4:51:21, 2.75s/it] 48%|████▊ | 5954/12313 [4:27:22<4:46:15, 2.70s/it] {'loss': 0.6461, 'grad_norm': 4.095451834207825, 'learning_rate': 2.7543875429953787e-06, 'epoch': 0.48} 48%|████▊ | 5954/12313 [4:27:22<4:46:15, 2.70s/it] 48%|████▊ | 
5955/12313 [4:27:25<4:46:55, 2.71s/it] {'loss': 0.4963, 'grad_norm': 4.473317225148218, 'learning_rate': 2.7537333254230596e-06, 'epoch': 0.48} 48%|████▊ | 5955/12313 [4:27:25<4:46:55, 2.71s/it] 48%|████▊ | 5956/12313 [4:27:28<4:43:20, 2.67s/it] {'loss': 0.4534, 'grad_norm': 5.773342053411397, 'learning_rate': 2.7530790902937376e-06, 'epoch': 0.48} 48%|████▊ | 5956/12313 [4:27:28<4:43:20, 2.67s/it] 48%|████▊ | 5957/12313 [4:27:30<4:35:49, 2.60s/it] {'loss': 0.6875, 'grad_norm': 3.510346924849262, 'learning_rate': 2.752424837652681e-06, 'epoch': 0.48} 48%|████▊ | 5957/12313 [4:27:30<4:35:49, 2.60s/it] 48%|████▊ | 5958/12313 [4:27:33<4:43:00, 2.67s/it] {'loss': 0.5412, 'grad_norm': 7.132250653049774, 'learning_rate': 2.751770567545163e-06, 'epoch': 0.48} 48%|████▊ | 5958/12313 [4:27:33<4:43:00, 2.67s/it] 48%|████▊ | 5959/12313 [4:27:36<4:47:19, 2.71s/it] {'loss': 0.7837, 'grad_norm': 5.085327115862414, 'learning_rate': 2.7511162800164536e-06, 'epoch': 0.48} 48%|████▊ | 5959/12313 [4:27:36<4:47:19, 2.71s/it] 48%|████▊ | 5960/12313 [4:27:40<5:19:31, 3.02s/it] {'loss': 0.5815, 'grad_norm': 7.899170324549878, 'learning_rate': 2.7504619751118266e-06, 'epoch': 0.48} 48%|████▊ | 5960/12313 [4:27:40<5:19:31, 3.02s/it] 48%|████▊ | 5961/12313 [4:27:43<5:17:47, 3.00s/it] {'loss': 0.5867, 'grad_norm': 12.534688873820935, 'learning_rate': 2.749807652876556e-06, 'epoch': 0.48} 48%|████▊ | 5961/12313 [4:27:43<5:17:47, 3.00s/it] 48%|████▊ | 5962/12313 [4:27:45<5:09:13, 2.92s/it] {'loss': 0.4844, 'grad_norm': 5.888585867450327, 'learning_rate': 2.749153313355919e-06, 'epoch': 0.48} 48%|████▊ | 5962/12313 [4:27:45<5:09:13, 2.92s/it] 48%|████▊ | 5963/12313 [4:27:48<5:00:49, 2.84s/it] {'loss': 0.4221, 'grad_norm': 4.658146495411607, 'learning_rate': 2.74849895659519e-06, 'epoch': 0.48} 48%|████▊ | 5963/12313 [4:27:48<5:00:49, 2.84s/it] 48%|████▊ | 5964/12313 [4:27:51<4:57:40, 2.81s/it] {'loss': 0.3995, 'grad_norm': 8.00901759996569, 'learning_rate': 2.7478445826396495e-06, 'epoch': 
0.48} 48%|████▊ | 5964/12313 [4:27:51<4:57:40, 2.81s/it] 48%|████▊ | 5965/12313 [4:27:54<5:07:37, 2.91s/it] {'loss': 0.5922, 'grad_norm': 4.722557156452973, 'learning_rate': 2.747190191534575e-06, 'epoch': 0.48} 48%|████▊ | 5965/12313 [4:27:54<5:07:37, 2.91s/it] 48%|████▊ | 5966/12313 [4:27:56<4:58:23, 2.82s/it] {'loss': 0.483, 'grad_norm': 4.907315907025934, 'learning_rate': 2.7465357833252477e-06, 'epoch': 0.48} 48%|████▊ | 5966/12313 [4:27:56<4:58:23, 2.82s/it] 48%|████▊ | 5967/12313 [4:27:59<4:59:57, 2.84s/it] {'loss': 0.5136, 'grad_norm': 3.780776893956639, 'learning_rate': 2.7458813580569487e-06, 'epoch': 0.48} 48%|████▊ | 5967/12313 [4:27:59<4:59:57, 2.84s/it] 48%|████▊ | 5968/12313 [4:28:02<4:58:29, 2.82s/it] {'loss': 0.6048, 'grad_norm': 4.759267167903013, 'learning_rate': 2.7452269157749614e-06, 'epoch': 0.48} 48%|████▊ | 5968/12313 [4:28:02<4:58:29, 2.82s/it] 48%|████▊ | 5969/12313 [4:28:05<4:47:43, 2.72s/it] {'loss': 0.5071, 'grad_norm': 5.977936036351052, 'learning_rate': 2.744572456524569e-06, 'epoch': 0.48} 48%|████▊ | 5969/12313 [4:28:05<4:47:43, 2.72s/it] 48%|████▊ | 5970/12313 [4:28:07<4:41:59, 2.67s/it] {'loss': 0.3259, 'grad_norm': 5.801052897330955, 'learning_rate': 2.7439179803510567e-06, 'epoch': 0.48} 48%|████▊ | 5970/12313 [4:28:07<4:41:59, 2.67s/it] 48%|████▊ | 5971/12313 [4:28:10<4:38:25, 2.63s/it] {'loss': 0.454, 'grad_norm': 5.504537267230917, 'learning_rate': 2.7432634872997123e-06, 'epoch': 0.48} 48%|████▊ | 5971/12313 [4:28:10<4:38:25, 2.63s/it] 49%|████▊ | 5972/12313 [4:28:12<4:42:09, 2.67s/it] {'loss': 0.5151, 'grad_norm': 5.980973589761268, 'learning_rate': 2.7426089774158217e-06, 'epoch': 0.49} 49%|████▊ | 5972/12313 [4:28:12<4:42:09, 2.67s/it] 49%|████▊ | 5973/12313 [4:28:15<4:41:17, 2.66s/it] {'loss': 0.5434, 'grad_norm': 6.090787296944255, 'learning_rate': 2.7419544507446727e-06, 'epoch': 0.49} 49%|████▊ | 5973/12313 [4:28:15<4:41:17, 2.66s/it] 49%|████▊ | 5974/12313 [4:28:18<4:41:38, 2.67s/it] {'loss': 0.6548, 'grad_norm': 
5.184785779288021, 'learning_rate': 2.7412999073315567e-06, 'epoch': 0.49} 49%|████▊ | 5974/12313 [4:28:18<4:41:38, 2.67s/it] 49%|████▊ | 5975/12313 [4:28:21<4:44:57, 2.70s/it] {'loss': 0.5838, 'grad_norm': 3.168315464569746, 'learning_rate': 2.7406453472217654e-06, 'epoch': 0.49} 49%|████▊ | 5975/12313 [4:28:21<4:44:57, 2.70s/it] 49%|████▊ | 5976/12313 [4:28:23<4:41:55, 2.67s/it] {'loss': 0.4268, 'grad_norm': 5.7401424667695204, 'learning_rate': 2.7399907704605884e-06, 'epoch': 0.49} 49%|████▊ | 5976/12313 [4:28:23<4:41:55, 2.67s/it] 49%|████▊ | 5977/12313 [4:28:26<4:42:28, 2.67s/it] {'loss': 0.4986, 'grad_norm': 8.886061000991853, 'learning_rate': 2.7393361770933198e-06, 'epoch': 0.49} 49%|████▊ | 5977/12313 [4:28:26<4:42:28, 2.67s/it] 49%|████▊ | 5978/12313 [4:28:28<4:41:28, 2.67s/it] {'loss': 0.6466, 'grad_norm': 4.804673355461681, 'learning_rate': 2.7386815671652556e-06, 'epoch': 0.49} 49%|████▊ | 5978/12313 [4:28:28<4:41:28, 2.67s/it] 49%|████▊ | 5979/12313 [4:28:32<5:08:50, 2.93s/it] {'loss': 0.4684, 'grad_norm': 5.321396681856843, 'learning_rate': 2.7380269407216896e-06, 'epoch': 0.49} 49%|████▊ | 5979/12313 [4:28:32<5:08:50, 2.93s/it] 49%|████▊ | 5980/12313 [4:28:35<5:00:04, 2.84s/it] {'loss': 0.6947, 'grad_norm': 5.686891915170021, 'learning_rate': 2.737372297807919e-06, 'epoch': 0.49} 49%|████▊ | 5980/12313 [4:28:35<5:00:04, 2.84s/it] 49%|████▊ | 5981/12313 [4:28:37<4:49:21, 2.74s/it] {'loss': 0.4924, 'grad_norm': 6.348245493300976, 'learning_rate': 2.7367176384692425e-06, 'epoch': 0.49} 49%|████▊ | 5981/12313 [4:28:37<4:49:21, 2.74s/it] 49%|████▊ | 5982/12313 [4:28:40<4:49:40, 2.75s/it] {'loss': 0.5165, 'grad_norm': 6.0903339297311545, 'learning_rate': 2.736062962750957e-06, 'epoch': 0.49} 49%|████▊ | 5982/12313 [4:28:40<4:49:40, 2.75s/it] 49%|████▊ | 5983/12313 [4:28:43<4:48:48, 2.74s/it] {'loss': 0.4462, 'grad_norm': 11.757378358765155, 'learning_rate': 2.735408270698364e-06, 'epoch': 0.49} 49%|████▊ | 5983/12313 [4:28:43<4:48:48, 2.74s/it] 49%|████▊ 
| 5984/12313 [4:28:45<4:51:52, 2.77s/it] {'loss': 0.6067, 'grad_norm': 16.683533350645916, 'learning_rate': 2.7347535623567656e-06, 'epoch': 0.49} 49%|████▊ | 5984/12313 [4:28:45<4:51:52, 2.77s/it] 49%|████▊ | 5985/12313 [4:28:48<4:45:50, 2.71s/it] {'loss': 0.5032, 'grad_norm': 7.944296108437706, 'learning_rate': 2.734098837771462e-06, 'epoch': 0.49} 49%|████▊ | 5985/12313 [4:28:48<4:45:50, 2.71s/it] 49%|████▊ | 5986/12313 [4:28:51<4:43:58, 2.69s/it] {'loss': 0.5087, 'grad_norm': 5.943425981392983, 'learning_rate': 2.7334440969877584e-06, 'epoch': 0.49} 49%|████▊ | 5986/12313 [4:28:51<4:43:58, 2.69s/it] 49%|████▊ | 5987/12313 [4:28:53<4:42:46, 2.68s/it] {'loss': 0.5818, 'grad_norm': 3.3987635620526686, 'learning_rate': 2.7327893400509586e-06, 'epoch': 0.49} 49%|████▊ | 5987/12313 [4:28:53<4:42:46, 2.68s/it] 49%|████▊ | 5988/12313 [4:28:56<4:41:31, 2.67s/it] {'loss': 0.4364, 'grad_norm': 6.620696249187532, 'learning_rate': 2.732134567006368e-06, 'epoch': 0.49} 49%|████▊ | 5988/12313 [4:28:56<4:41:31, 2.67s/it] 49%|████▊ | 5989/12313 [4:28:59<4:37:54, 2.64s/it] {'loss': 0.4405, 'grad_norm': 5.572056490122121, 'learning_rate': 2.731479777899295e-06, 'epoch': 0.49} 49%|████▊ | 5989/12313 [4:28:59<4:37:54, 2.64s/it] 49%|████▊ | 5990/12313 [4:29:01<4:44:17, 2.70s/it] {'loss': 0.6067, 'grad_norm': 6.354284008710806, 'learning_rate': 2.730824972775045e-06, 'epoch': 0.49} 49%|████▊ | 5990/12313 [4:29:01<4:44:17, 2.70s/it] 49%|████▊ | 5991/12313 [4:29:04<4:56:45, 2.82s/it] {'loss': 0.3966, 'grad_norm': 5.802616735842974, 'learning_rate': 2.7301701516789303e-06, 'epoch': 0.49} 49%|████▊ | 5991/12313 [4:29:04<4:56:45, 2.82s/it] 49%|████▊ | 5992/12313 [4:29:08<5:16:07, 3.00s/it] {'loss': 0.5996, 'grad_norm': 4.459624839521087, 'learning_rate': 2.729515314656258e-06, 'epoch': 0.49} 49%|████▊ | 5992/12313 [4:29:08<5:16:07, 3.00s/it] 49%|████▊ | 5993/12313 [4:29:11<5:16:20, 3.00s/it] {'loss': 0.5517, 'grad_norm': 7.157939161361248, 'learning_rate': 2.7288604617523405e-06, 'epoch': 
0.49} 49%|████▊ | 5993/12313 [4:29:11<5:16:20, 3.00s/it] 49%|████▊ | 5994/12313 [4:29:14<5:03:39, 2.88s/it] {'loss': 0.4596, 'grad_norm': 5.292457922278461, 'learning_rate': 2.728205593012491e-06, 'epoch': 0.49} 49%|████▊ | 5994/12313 [4:29:14<5:03:39, 2.88s/it] 49%|████▊ | 5995/12313 [4:29:16<4:57:58, 2.83s/it] {'loss': 0.6614, 'grad_norm': 5.048505829129327, 'learning_rate': 2.7275507084820226e-06, 'epoch': 0.49} 49%|████▊ | 5995/12313 [4:29:16<4:57:58, 2.83s/it] 49%|████▊ | 5996/12313 [4:29:19<4:55:23, 2.81s/it] {'loss': 0.4645, 'grad_norm': 5.999484690389678, 'learning_rate': 2.726895808206248e-06, 'epoch': 0.49} 49%|████▊ | 5996/12313 [4:29:19<4:55:23, 2.81s/it] 49%|████▊ | 5997/12313 [4:29:22<4:51:12, 2.77s/it] {'loss': 0.5658, 'grad_norm': 4.707719844104531, 'learning_rate': 2.7262408922304857e-06, 'epoch': 0.49} 49%|████▊ | 5997/12313 [4:29:22<4:51:12, 2.77s/it] 49%|████▊ | 5998/12313 [4:29:24<4:46:52, 2.73s/it] {'loss': 0.4844, 'grad_norm': 9.302118644783453, 'learning_rate': 2.72558596060005e-06, 'epoch': 0.49} 49%|████▊ | 5998/12313 [4:29:24<4:46:52, 2.73s/it] 49%|████▊ | 5999/12313 [4:29:27<4:50:23, 2.76s/it] {'loss': 0.4755, 'grad_norm': 6.41310409364449, 'learning_rate': 2.72493101336026e-06, 'epoch': 0.49} 49%|████▊ | 5999/12313 [4:29:27<4:50:23, 2.76s/it] 49%|████▊ | 6000/12313 [4:29:30<4:48:14, 2.74s/it] {'loss': 0.4443, 'grad_norm': 7.372592645487124, 'learning_rate': 2.7242760505564346e-06, 'epoch': 0.49} 49%|████▊ | 6000/12313 [4:29:30<4:48:14, 2.74s/it] 49%|████▊ | 6001/12313 [4:29:32<4:36:11, 2.63s/it] {'loss': 0.6266, 'grad_norm': 5.478164117803782, 'learning_rate': 2.7236210722338936e-06, 'epoch': 0.49} 49%|████▊ | 6001/12313 [4:29:32<4:36:11, 2.63s/it] 49%|████▊ | 6002/12313 [4:29:35<4:34:03, 2.61s/it] {'loss': 0.6028, 'grad_norm': 4.451621100299445, 'learning_rate': 2.7229660784379575e-06, 'epoch': 0.49} 49%|████▊ | 6002/12313 [4:29:35<4:34:03, 2.61s/it] 49%|████▉ | 6003/12313 [4:29:37<4:30:22, 2.57s/it] {'loss': 0.3843, 'grad_norm': 
4.829301062695959, 'learning_rate': 2.7223110692139487e-06, 'epoch': 0.49} 49%|████▉ | 6003/12313 [4:29:37<4:30:22, 2.57s/it] 49%|████▉ | 6004/12313 [4:29:40<4:35:19, 2.62s/it] {'loss': 0.5373, 'grad_norm': 8.161182349000478, 'learning_rate': 2.7216560446071904e-06, 'epoch': 0.49} 49%|████▉ | 6004/12313 [4:29:40<4:35:19, 2.62s/it] 49%|████▉ | 6005/12313 [4:29:43<4:35:40, 2.62s/it] {'loss': 0.5209, 'grad_norm': 5.075301630204499, 'learning_rate': 2.721001004663008e-06, 'epoch': 0.49} 49%|████▉ | 6005/12313 [4:29:43<4:35:40, 2.62s/it] 49%|████▉ | 6006/12313 [4:29:45<4:43:26, 2.70s/it] {'loss': 0.4714, 'grad_norm': 3.5392282378757094, 'learning_rate': 2.7203459494267243e-06, 'epoch': 0.49} 49%|████▉ | 6006/12313 [4:29:45<4:43:26, 2.70s/it] 49%|████▉ | 6007/12313 [4:29:48<4:50:35, 2.76s/it] {'loss': 0.6381, 'grad_norm': 3.3493451655565147, 'learning_rate': 2.719690878943668e-06, 'epoch': 0.49} 49%|████▉ | 6007/12313 [4:29:48<4:50:35, 2.76s/it] 49%|████▉ | 6008/12313 [4:29:51<4:42:08, 2.68s/it] {'loss': 0.5869, 'grad_norm': 11.279877740153177, 'learning_rate': 2.7190357932591653e-06, 'epoch': 0.49} 49%|████▉ | 6008/12313 [4:29:51<4:42:08, 2.68s/it] 49%|████▉ | 6009/12313 [4:29:54<4:41:56, 2.68s/it] {'loss': 0.4589, 'grad_norm': 15.655656745532893, 'learning_rate': 2.7183806924185447e-06, 'epoch': 0.49} 49%|████▉ | 6009/12313 [4:29:54<4:41:56, 2.68s/it] 49%|████▉ | 6010/12313 [4:29:56<4:48:00, 2.74s/it] {'loss': 0.5564, 'grad_norm': 7.534260268163045, 'learning_rate': 2.717725576467136e-06, 'epoch': 0.49} 49%|████▉ | 6010/12313 [4:29:56<4:48:00, 2.74s/it] 49%|████▉ | 6011/12313 [4:29:59<4:47:36, 2.74s/it] {'loss': 0.4806, 'grad_norm': 8.16602240519616, 'learning_rate': 2.71707044545027e-06, 'epoch': 0.49} 49%|████▉ | 6011/12313 [4:29:59<4:47:36, 2.74s/it] 49%|████▉ | 6012/12313 [4:30:02<4:50:55, 2.77s/it] {'loss': 0.4948, 'grad_norm': 5.065942666535822, 'learning_rate': 2.716415299413278e-06, 'epoch': 0.49} 49%|████▉ | 6012/12313 [4:30:02<4:50:55, 2.77s/it] 49%|████▉ | 
6013/12313 [4:30:05<4:50:03, 2.76s/it] {'loss': 0.5663, 'grad_norm': 6.1717583953297455, 'learning_rate': 2.7157601384014927e-06, 'epoch': 0.49} 49%|████▉ | 6013/12313 [4:30:05<4:50:03, 2.76s/it] 49%|████▉ | 6014/12313 [4:30:07<4:47:16, 2.74s/it] {'loss': 0.6468, 'grad_norm': 5.9450854471553125, 'learning_rate': 2.7151049624602473e-06, 'epoch': 0.49} 49%|████▉ | 6014/12313 [4:30:07<4:47:16, 2.74s/it] 49%|████▉ | 6015/12313 [4:30:11<4:59:23, 2.85s/it] {'loss': 0.4685, 'grad_norm': 6.376587879616875, 'learning_rate': 2.714449771634877e-06, 'epoch': 0.49} 49%|████▉ | 6015/12313 [4:30:11<4:59:23, 2.85s/it] 49%|████▉ | 6016/12313 [4:30:13<5:01:56, 2.88s/it] {'loss': 0.438, 'grad_norm': 4.341090257406891, 'learning_rate': 2.713794565970718e-06, 'epoch': 0.49} 49%|████▉ | 6016/12313 [4:30:13<5:01:56, 2.88s/it] 49%|████▉ | 6017/12313 [4:30:16<4:56:34, 2.83s/it] {'loss': 0.6089, 'grad_norm': 4.016157690853844, 'learning_rate': 2.7131393455131057e-06, 'epoch': 0.49} 49%|████▉ | 6017/12313 [4:30:16<4:56:34, 2.83s/it] 49%|████▉ | 6018/12313 [4:30:19<4:45:02, 2.72s/it] {'loss': 0.3652, 'grad_norm': 2.638895965006909, 'learning_rate': 2.7124841103073794e-06, 'epoch': 0.49} 49%|████▉ | 6018/12313 [4:30:19<4:45:02, 2.72s/it] 49%|████▉ | 6019/12313 [4:30:22<4:50:37, 2.77s/it] {'loss': 0.5685, 'grad_norm': 9.501098441788494, 'learning_rate': 2.711828860398877e-06, 'epoch': 0.49} 49%|████▉ | 6019/12313 [4:30:22<4:50:37, 2.77s/it] 49%|████▉ | 6020/12313 [4:30:24<4:44:39, 2.71s/it] {'loss': 0.3793, 'grad_norm': 6.054200225549085, 'learning_rate': 2.7111735958329383e-06, 'epoch': 0.49} 49%|████▉ | 6020/12313 [4:30:24<4:44:39, 2.71s/it] 49%|████▉ | 6021/12313 [4:30:27<4:40:30, 2.67s/it] {'loss': 0.547, 'grad_norm': 5.616508580837288, 'learning_rate': 2.7105183166549048e-06, 'epoch': 0.49} 49%|████▉ | 6021/12313 [4:30:27<4:40:30, 2.67s/it] 49%|████▉ | 6022/12313 [4:30:29<4:36:51, 2.64s/it] {'loss': 0.5794, 'grad_norm': 6.361440708041633, 'learning_rate': 2.7098630229101174e-06, 'epoch': 
0.49} 49%|████▉ | 6022/12313 [4:30:29<4:36:51, 2.64s/it] 49%|████▉ | 6023/12313 [4:30:32<4:31:38, 2.59s/it] {'loss': 0.6857, 'grad_norm': 9.543869316221988, 'learning_rate': 2.70920771464392e-06, 'epoch': 0.49} 49%|████▉ | 6023/12313 [4:30:32<4:31:38, 2.59s/it] 49%|████▉ | 6024/12313 [4:30:34<4:23:54, 2.52s/it] {'loss': 0.6798, 'grad_norm': 4.931811725830596, 'learning_rate': 2.708552391901656e-06, 'epoch': 0.49} 49%|████▉ | 6024/12313 [4:30:34<4:23:54, 2.52s/it] 49%|████▉ | 6025/12313 [4:30:37<4:28:00, 2.56s/it] {'loss': 0.6456, 'grad_norm': 7.691562606616547, 'learning_rate': 2.70789705472867e-06, 'epoch': 0.49} 49%|████▉ | 6025/12313 [4:30:37<4:28:00, 2.56s/it] 49%|████▉ | 6026/12313 [4:30:39<4:31:02, 2.59s/it] {'loss': 0.4363, 'grad_norm': 3.9609368324412286, 'learning_rate': 2.707241703170308e-06, 'epoch': 0.49} 49%|████▉ | 6026/12313 [4:30:39<4:31:02, 2.59s/it] 49%|████▉ | 6027/12313 [4:30:42<4:34:18, 2.62s/it] {'loss': 0.4752, 'grad_norm': 5.514465718417427, 'learning_rate': 2.706586337271917e-06, 'epoch': 0.49} 49%|████▉ | 6027/12313 [4:30:42<4:34:18, 2.62s/it] 49%|████▉ | 6028/12313 [4:30:45<4:41:33, 2.69s/it] {'loss': 0.5089, 'grad_norm': 4.986568690370059, 'learning_rate': 2.705930957078845e-06, 'epoch': 0.49} 49%|████▉ | 6028/12313 [4:30:45<4:41:33, 2.69s/it] 49%|████▉ | 6029/12313 [4:30:48<4:42:48, 2.70s/it] {'loss': 0.5348, 'grad_norm': 4.59026441450655, 'learning_rate': 2.705275562636441e-06, 'epoch': 0.49} 49%|████▉ | 6029/12313 [4:30:48<4:42:48, 2.70s/it] 49%|████▉ | 6030/12313 [4:30:50<4:33:33, 2.61s/it] {'loss': 0.5789, 'grad_norm': 5.882838457335099, 'learning_rate': 2.7046201539900537e-06, 'epoch': 0.49} 49%|████▉ | 6030/12313 [4:30:50<4:33:33, 2.61s/it] 49%|████▉ | 6031/12313 [4:30:53<4:35:54, 2.64s/it] {'loss': 0.5857, 'grad_norm': 6.273818536620701, 'learning_rate': 2.7039647311850347e-06, 'epoch': 0.49} 49%|████▉ | 6031/12313 [4:30:53<4:35:54, 2.64s/it] 49%|████▉ | 6032/12313 [4:30:56<4:47:37, 2.75s/it] {'loss': 0.4136, 'grad_norm': 
4.353727277852687, 'learning_rate': 2.7033092942667362e-06, 'epoch': 0.49} 49%|████▉ | 6032/12313 [4:30:56<4:47:37, 2.75s/it] 49%|████▉ | 6033/12313 [4:30:58<4:37:48, 2.65s/it] {'loss': 0.4196, 'grad_norm': 11.49874908610111, 'learning_rate': 2.70265384328051e-06, 'epoch': 0.49} 49%|████▉ | 6033/12313 [4:30:58<4:37:48, 2.65s/it] 49%|████▉ | 6034/12313 [4:31:01<4:39:53, 2.67s/it] {'loss': 0.4894, 'grad_norm': 8.579866589240709, 'learning_rate': 2.701998378271711e-06, 'epoch': 0.49} 49%|████▉ | 6034/12313 [4:31:01<4:39:53, 2.67s/it] 49%|████▉ | 6035/12313 [4:31:04<4:37:12, 2.65s/it] {'loss': 0.5139, 'grad_norm': 25.037657431765965, 'learning_rate': 2.7013428992856925e-06, 'epoch': 0.49} 49%|████▉ | 6035/12313 [4:31:04<4:37:12, 2.65s/it] 49%|████▉ | 6036/12313 [4:31:06<4:36:51, 2.65s/it] {'loss': 0.6387, 'grad_norm': 11.228244377773777, 'learning_rate': 2.700687406367812e-06, 'epoch': 0.49} 49%|████▉ | 6036/12313 [4:31:06<4:36:51, 2.65s/it] 49%|████▉ | 6037/12313 [4:31:09<4:37:37, 2.65s/it] {'loss': 0.5617, 'grad_norm': 4.944749953225672, 'learning_rate': 2.700031899563425e-06, 'epoch': 0.49} 49%|████▉ | 6037/12313 [4:31:09<4:37:37, 2.65s/it] 49%|████▉ | 6038/12313 [4:31:11<4:30:15, 2.58s/it] {'loss': 0.6037, 'grad_norm': 11.318432666073637, 'learning_rate': 2.6993763789178885e-06, 'epoch': 0.49} 49%|████▉ | 6038/12313 [4:31:11<4:30:15, 2.58s/it] 49%|████▉ | 6039/12313 [4:31:14<4:25:04, 2.53s/it] {'loss': 0.6919, 'grad_norm': 4.831316548854263, 'learning_rate': 2.698720844476562e-06, 'epoch': 0.49} 49%|████▉ | 6039/12313 [4:31:14<4:25:04, 2.53s/it] 49%|████▉ | 6040/12313 [4:31:16<4:31:57, 2.60s/it] {'loss': 0.5024, 'grad_norm': 6.464718545938012, 'learning_rate': 2.6980652962848055e-06, 'epoch': 0.49} 49%|████▉ | 6040/12313 [4:31:16<4:31:57, 2.60s/it] 49%|████▉ | 6041/12313 [4:31:19<4:36:10, 2.64s/it] {'loss': 0.4774, 'grad_norm': 8.179746403330494, 'learning_rate': 2.697409734387978e-06, 'epoch': 0.49} 49%|████▉ | 6041/12313 [4:31:19<4:36:10, 2.64s/it] 49%|████▉ | 
6042/12313 [4:31:22<4:34:51, 2.63s/it] {'loss': 0.4216, 'grad_norm': 6.753372733627089, 'learning_rate': 2.6967541588314413e-06, 'epoch': 0.49} 49%|████▉ | 6042/12313 [4:31:22<4:34:51, 2.63s/it] 49%|████▉ | 6043/12313 [4:31:25<4:39:27, 2.67s/it] {'loss': 0.5735, 'grad_norm': 3.5059703839980707, 'learning_rate': 2.6960985696605583e-06, 'epoch': 0.49} 49%|████▉ | 6043/12313 [4:31:25<4:39:27, 2.67s/it] 49%|████▉ | 6044/12313 [4:31:27<4:32:21, 2.61s/it] {'loss': 0.4922, 'grad_norm': 6.5528726281723335, 'learning_rate': 2.695442966920693e-06, 'epoch': 0.49} 49%|████▉ | 6044/12313 [4:31:27<4:32:21, 2.61s/it] 49%|████▉ | 6045/12313 [4:31:30<4:35:42, 2.64s/it] {'loss': 0.5194, 'grad_norm': 6.2801153498685744, 'learning_rate': 2.6947873506572083e-06, 'epoch': 0.49} 49%|████▉ | 6045/12313 [4:31:30<4:35:42, 2.64s/it] 49%|████▉ | 6046/12313 [4:31:33<4:42:21, 2.70s/it] {'loss': 0.7296, 'grad_norm': 2.729788705777957, 'learning_rate': 2.6941317209154694e-06, 'epoch': 0.49} 49%|████▉ | 6046/12313 [4:31:33<4:42:21, 2.70s/it] 49%|████▉ | 6047/12313 [4:31:35<4:37:08, 2.65s/it] {'loss': 0.4946, 'grad_norm': 5.110271057688403, 'learning_rate': 2.693476077740843e-06, 'epoch': 0.49} 49%|████▉ | 6047/12313 [4:31:35<4:37:08, 2.65s/it] 49%|████▉ | 6048/12313 [4:31:38<4:54:11, 2.82s/it] {'loss': 0.5102, 'grad_norm': 5.133921081919526, 'learning_rate': 2.6928204211786957e-06, 'epoch': 0.49} 49%|████▉ | 6048/12313 [4:31:38<4:54:11, 2.82s/it] 49%|████▉ | 6049/12313 [4:31:42<5:07:38, 2.95s/it] {'loss': 0.4642, 'grad_norm': 4.1327143511126, 'learning_rate': 2.6921647512743963e-06, 'epoch': 0.49} 49%|████▉ | 6049/12313 [4:31:42<5:07:38, 2.95s/it] 49%|████▉ | 6050/12313 [4:31:44<4:50:33, 2.78s/it] {'loss': 0.356, 'grad_norm': 13.617171548443578, 'learning_rate': 2.691509068073313e-06, 'epoch': 0.49} 49%|████▉ | 6050/12313 [4:31:44<4:50:33, 2.78s/it] 49%|████▉ | 6051/12313 [4:31:47<4:43:52, 2.72s/it] {'loss': 0.5327, 'grad_norm': 5.719645297157029, 'learning_rate': 2.6908533716208157e-06, 'epoch': 
0.49} 49%|████▉ | 6051/12313 [4:31:47<4:43:52, 2.72s/it] 49%|████▉ | 6052/12313 [4:31:49<4:43:04, 2.71s/it] {'loss': 0.6186, 'grad_norm': 3.9479135140894224, 'learning_rate': 2.690197661962275e-06, 'epoch': 0.49} 49%|████▉ | 6052/12313 [4:31:49<4:43:04, 2.71s/it] 49%|████▉ | 6053/12313 [4:31:52<4:40:01, 2.68s/it] {'loss': 0.3574, 'grad_norm': 7.503761117502973, 'learning_rate': 2.6895419391430635e-06, 'epoch': 0.49} 49%|████▉ | 6053/12313 [4:31:52<4:40:01, 2.68s/it] 49%|████▉ | 6054/12313 [4:31:55<4:47:10, 2.75s/it] {'loss': 0.4214, 'grad_norm': 4.597046243404988, 'learning_rate': 2.688886203208552e-06, 'epoch': 0.49} 49%|████▉ | 6054/12313 [4:31:55<4:47:10, 2.75s/it] 49%|████▉ | 6055/12313 [4:31:57<4:45:28, 2.74s/it] {'loss': 0.4696, 'grad_norm': 19.152422694361487, 'learning_rate': 2.6882304542041147e-06, 'epoch': 0.49} 49%|████▉ | 6055/12313 [4:31:57<4:45:28, 2.74s/it] 49%|████▉ | 6056/12313 [4:32:00<4:35:41, 2.64s/it] {'loss': 0.4959, 'grad_norm': 6.515260679020386, 'learning_rate': 2.687574692175127e-06, 'epoch': 0.49} 49%|████▉ | 6056/12313 [4:32:00<4:35:41, 2.64s/it] 49%|████▉ | 6057/12313 [4:32:02<4:34:33, 2.63s/it] {'loss': 0.3626, 'grad_norm': 4.2821690669001224, 'learning_rate': 2.6869189171669637e-06, 'epoch': 0.49} 49%|████▉ | 6057/12313 [4:32:02<4:34:33, 2.63s/it] 49%|████▉ | 6058/12313 [4:32:05<4:35:41, 2.64s/it] {'loss': 0.5963, 'grad_norm': 7.192130700433451, 'learning_rate': 2.686263129224999e-06, 'epoch': 0.49} 49%|████▉ | 6058/12313 [4:32:05<4:35:41, 2.64s/it] 49%|████▉ | 6059/12313 [4:32:08<4:44:59, 2.73s/it] {'loss': 0.4168, 'grad_norm': 7.6817361440233585, 'learning_rate': 2.685607328394613e-06, 'epoch': 0.49} 49%|████▉ | 6059/12313 [4:32:08<4:44:59, 2.73s/it] 49%|████▉ | 6060/12313 [4:32:11<4:41:19, 2.70s/it] {'loss': 0.4758, 'grad_norm': 11.249409008964918, 'learning_rate': 2.6849515147211814e-06, 'epoch': 0.49} 49%|████▉ | 6060/12313 [4:32:11<4:41:19, 2.70s/it] 49%|████▉ | 6061/12313 [4:32:13<4:31:07, 2.60s/it] {'loss': 0.445, 'grad_norm': 
3.9392524421599346, 'learning_rate': 2.6842956882500843e-06, 'epoch': 0.49} 49%|████▉ | 6061/12313 [4:32:13<4:31:07, 2.60s/it] 49%|████▉ | 6062/12313 [4:32:16<4:31:09, 2.60s/it] {'loss': 0.5492, 'grad_norm': 9.559526227459175, 'learning_rate': 2.6836398490267006e-06, 'epoch': 0.49} 49%|████▉ | 6062/12313 [4:32:16<4:31:09, 2.60s/it] 49%|████▉ | 6063/12313 [4:32:18<4:30:55, 2.60s/it] {'loss': 0.4452, 'grad_norm': 3.917087406483406, 'learning_rate': 2.6829839970964112e-06, 'epoch': 0.49} 49%|████▉ | 6063/12313 [4:32:18<4:30:55, 2.60s/it] 49%|████▉ | 6064/12313 [4:32:21<4:29:23, 2.59s/it] {'loss': 0.4458, 'grad_norm': 3.7273772258833566, 'learning_rate': 2.682328132504598e-06, 'epoch': 0.49} 49%|████▉ | 6064/12313 [4:32:21<4:29:23, 2.59s/it] 49%|████▉ | 6065/12313 [4:32:24<5:00:50, 2.89s/it] {'loss': 0.5387, 'grad_norm': 4.9038911903534474, 'learning_rate': 2.6816722552966423e-06, 'epoch': 0.49} 49%|████▉ | 6065/12313 [4:32:24<5:00:50, 2.89s/it] 49%|████▉ | 6066/12313 [4:32:27<4:55:22, 2.84s/it] {'loss': 0.4933, 'grad_norm': 5.697167255856347, 'learning_rate': 2.6810163655179287e-06, 'epoch': 0.49} 49%|████▉ | 6066/12313 [4:32:27<4:55:22, 2.84s/it] 49%|████▉ | 6067/12313 [4:32:30<4:47:12, 2.76s/it] {'loss': 0.381, 'grad_norm': 4.553361726145492, 'learning_rate': 2.6803604632138403e-06, 'epoch': 0.49} 49%|████▉ | 6067/12313 [4:32:30<4:47:12, 2.76s/it] 49%|████▉ | 6068/12313 [4:32:33<4:51:23, 2.80s/it] {'loss': 0.5172, 'grad_norm': 4.7949903203120305, 'learning_rate': 2.6797045484297624e-06, 'epoch': 0.49} 49%|████▉ | 6068/12313 [4:32:33<4:51:23, 2.80s/it] 49%|████▉ | 6069/12313 [4:32:35<4:51:05, 2.80s/it] {'loss': 0.4133, 'grad_norm': 20.378872084713798, 'learning_rate': 2.6790486212110812e-06, 'epoch': 0.49} 49%|████▉ | 6069/12313 [4:32:35<4:51:05, 2.80s/it] 49%|████▉ | 6070/12313 [4:32:38<4:44:43, 2.74s/it] {'loss': 0.4685, 'grad_norm': 6.870346383155372, 'learning_rate': 2.678392681603183e-06, 'epoch': 0.49} 49%|████▉ | 6070/12313 [4:32:38<4:44:43, 2.74s/it] 
49%|████▉ | 6071/12313 [4:32:41<4:39:36, 2.69s/it] {'loss': 0.4881, 'grad_norm': 4.53706289191984, 'learning_rate': 2.6777367296514547e-06, 'epoch': 0.49} 49%|████▉ | 6071/12313 [4:32:41<4:39:36, 2.69s/it] 49%|████▉ | 6072/12313 [4:32:43<4:40:11, 2.69s/it] {'loss': 0.5477, 'grad_norm': 5.005808865354581, 'learning_rate': 2.677080765401286e-06, 'epoch': 0.49} 49%|████▉ | 6072/12313 [4:32:43<4:40:11, 2.69s/it] 49%|████▉ | 6073/12313 [4:32:46<4:40:33, 2.70s/it] {'loss': 0.4074, 'grad_norm': 8.68217397091714, 'learning_rate': 2.6764247888980654e-06, 'epoch': 0.49} 49%|████▉ | 6073/12313 [4:32:46<4:40:33, 2.70s/it] 49%|████▉ | 6074/12313 [4:32:49<4:35:33, 2.65s/it] {'loss': 0.5763, 'grad_norm': 6.058970447210371, 'learning_rate': 2.675768800187182e-06, 'epoch': 0.49} 49%|████▉ | 6074/12313 [4:32:49<4:35:33, 2.65s/it] 49%|████▉ | 6075/12313 [4:32:51<4:41:03, 2.70s/it] {'loss': 0.6046, 'grad_norm': 2.9831711571626522, 'learning_rate': 2.67511279931403e-06, 'epoch': 0.49} 49%|████▉ | 6075/12313 [4:32:51<4:41:03, 2.70s/it] 49%|████▉ | 6076/12313 [4:32:54<4:39:02, 2.68s/it] {'loss': 0.505, 'grad_norm': 3.841362882001304, 'learning_rate': 2.674456786323998e-06, 'epoch': 0.49} 49%|████▉ | 6076/12313 [4:32:54<4:39:02, 2.68s/it] 49%|████▉ | 6077/12313 [4:32:57<4:42:31, 2.72s/it] {'loss': 0.5466, 'grad_norm': 5.296826365325529, 'learning_rate': 2.6738007612624792e-06, 'epoch': 0.49} 49%|████▉ | 6077/12313 [4:32:57<4:42:31, 2.72s/it] 49%|████▉ | 6078/12313 [4:32:59<4:40:18, 2.70s/it] {'loss': 0.5204, 'grad_norm': 4.678775568920299, 'learning_rate': 2.673144724174868e-06, 'epoch': 0.49} 49%|████▉ | 6078/12313 [4:32:59<4:40:18, 2.70s/it] 49%|████▉ | 6079/12313 [4:33:02<4:37:23, 2.67s/it] {'loss': 0.5329, 'grad_norm': 4.857667826957571, 'learning_rate': 2.6724886751065584e-06, 'epoch': 0.49} 49%|████▉ | 6079/12313 [4:33:02<4:37:23, 2.67s/it] 49%|████▉ | 6080/12313 [4:33:05<4:35:47, 2.65s/it] {'loss': 0.529, 'grad_norm': 5.592492702134483, 'learning_rate': 2.671832614102945e-06, 
'epoch': 0.49} 49%|████▉ | 6080/12313 [4:33:05<4:35:47, 2.65s/it] 49%|████▉ | 6081/12313 [4:33:07<4:38:56, 2.69s/it] {'loss': 0.5004, 'grad_norm': 61.655374457513666, 'learning_rate': 2.671176541209424e-06, 'epoch': 0.49} 49%|████▉ | 6081/12313 [4:33:07<4:38:56, 2.69s/it] 49%|████▉ | 6082/12313 [4:33:10<4:40:28, 2.70s/it] {'loss': 0.4508, 'grad_norm': 6.490521162736462, 'learning_rate': 2.6705204564713927e-06, 'epoch': 0.49} 49%|████▉ | 6082/12313 [4:33:10<4:40:28, 2.70s/it] 49%|████▉ | 6083/12313 [4:33:13<4:39:06, 2.69s/it] {'loss': 0.4754, 'grad_norm': 6.042977369262763, 'learning_rate': 2.669864359934249e-06, 'epoch': 0.49} 49%|████▉ | 6083/12313 [4:33:13<4:39:06, 2.69s/it] 49%|████▉ | 6084/12313 [4:33:16<4:44:50, 2.74s/it] {'loss': 0.4356, 'grad_norm': 7.550162317594159, 'learning_rate': 2.6692082516433886e-06, 'epoch': 0.49} 49%|████▉ | 6084/12313 [4:33:16<4:44:50, 2.74s/it] 49%|████▉ | 6085/12313 [4:33:18<4:41:59, 2.72s/it] {'loss': 0.6147, 'grad_norm': 3.954161538122901, 'learning_rate': 2.668552131644214e-06, 'epoch': 0.49} 49%|████▉ | 6085/12313 [4:33:18<4:41:59, 2.72s/it] 49%|████▉ | 6086/12313 [4:33:22<4:56:20, 2.86s/it] {'loss': 0.4406, 'grad_norm': 8.837611857542369, 'learning_rate': 2.667895999982124e-06, 'epoch': 0.49} 49%|████▉ | 6086/12313 [4:33:22<4:56:20, 2.86s/it] 49%|████▉ | 6087/12313 [4:33:24<4:52:02, 2.81s/it] {'loss': 0.4472, 'grad_norm': 5.060653912888538, 'learning_rate': 2.6672398567025188e-06, 'epoch': 0.49} 49%|████▉ | 6087/12313 [4:33:24<4:52:02, 2.81s/it] 49%|████▉ | 6088/12313 [4:33:27<4:48:35, 2.78s/it] {'loss': 0.507, 'grad_norm': 9.229957236777977, 'learning_rate': 2.666583701850802e-06, 'epoch': 0.49} 49%|████▉ | 6088/12313 [4:33:27<4:48:35, 2.78s/it] 49%|████▉ | 6089/12313 [4:33:30<4:45:14, 2.75s/it] {'loss': 0.4867, 'grad_norm': 4.035963295981821, 'learning_rate': 2.6659275354723735e-06, 'epoch': 0.49} 49%|████▉ | 6089/12313 [4:33:30<4:45:14, 2.75s/it] 49%|████▉ | 6090/12313 [4:33:32<4:40:38, 2.71s/it] {'loss': 0.6755, 
'grad_norm': 8.351371921914652, 'learning_rate': 2.6652713576126376e-06, 'epoch': 0.49} 49%|████▉ | 6090/12313 [4:33:32<4:40:38, 2.71s/it] 49%|████▉ | 6091/12313 [4:33:35<4:41:41, 2.72s/it] {'loss': 0.6439, 'grad_norm': 5.78678933022576, 'learning_rate': 2.6646151683169985e-06, 'epoch': 0.49} 49%|████▉ | 6091/12313 [4:33:35<4:41:41, 2.72s/it] 49%|████▉ | 6092/12313 [4:33:38<4:41:49, 2.72s/it] {'loss': 0.7211, 'grad_norm': 4.566506684904742, 'learning_rate': 2.6639589676308614e-06, 'epoch': 0.49} 49%|████▉ | 6092/12313 [4:33:38<4:41:49, 2.72s/it] 49%|████▉ | 6093/12313 [4:33:40<4:38:02, 2.68s/it] {'loss': 0.6008, 'grad_norm': 7.609049634242752, 'learning_rate': 2.663302755599631e-06, 'epoch': 0.49} 49%|████▉ | 6093/12313 [4:33:40<4:38:02, 2.68s/it] 49%|████▉ | 6094/12313 [4:33:43<4:38:54, 2.69s/it] {'loss': 0.5058, 'grad_norm': 3.3025767017322893, 'learning_rate': 2.6626465322687144e-06, 'epoch': 0.49} 49%|████▉ | 6094/12313 [4:33:43<4:38:54, 2.69s/it] 50%|████▉ | 6095/12313 [4:33:46<4:34:10, 2.65s/it] {'loss': 0.3995, 'grad_norm': 5.250107319379549, 'learning_rate': 2.6619902976835187e-06, 'epoch': 0.5} 50%|████▉ | 6095/12313 [4:33:46<4:34:10, 2.65s/it] 50%|████▉ | 6096/12313 [4:33:48<4:34:38, 2.65s/it] {'loss': 0.4428, 'grad_norm': 4.264694509387339, 'learning_rate': 2.6613340518894513e-06, 'epoch': 0.5} 50%|████▉ | 6096/12313 [4:33:48<4:34:38, 2.65s/it] 50%|████▉ | 6097/12313 [4:33:51<4:35:04, 2.66s/it] {'loss': 0.4568, 'grad_norm': 9.523848099068603, 'learning_rate': 2.6606777949319217e-06, 'epoch': 0.5} 50%|████▉ | 6097/12313 [4:33:51<4:35:04, 2.66s/it] 50%|████▉ | 6098/12313 [4:33:54<4:36:49, 2.67s/it] {'loss': 0.7066, 'grad_norm': 4.645197872989053, 'learning_rate': 2.6600215268563396e-06, 'epoch': 0.5} 50%|████▉ | 6098/12313 [4:33:54<4:36:49, 2.67s/it] 50%|████▉ | 6099/12313 [4:33:56<4:43:07, 2.73s/it] {'loss': 0.5934, 'grad_norm': 6.723227949020278, 'learning_rate': 2.6593652477081146e-06, 'epoch': 0.5} 50%|████▉ | 6099/12313 [4:33:56<4:43:07, 2.73s/it] 
50%|████▉ | 6100/12313 [4:33:59<4:39:20, 2.70s/it] {'loss': 0.4197, 'grad_norm': 6.836596280031364, 'learning_rate': 2.658708957532657e-06, 'epoch': 0.5} 50%|████▉ | 6100/12313 [4:33:59<4:39:20, 2.70s/it] 50%|████▉ | 6101/12313 [4:34:02<4:34:01, 2.65s/it] {'loss': 0.5521, 'grad_norm': 5.530454100104951, 'learning_rate': 2.6580526563753794e-06, 'epoch': 0.5} 50%|████▉ | 6101/12313 [4:34:02<4:34:01, 2.65s/it] 50%|████▉ | 6102/12313 [4:34:04<4:34:41, 2.65s/it] {'loss': 0.4716, 'grad_norm': 8.014803649260598, 'learning_rate': 2.6573963442816957e-06, 'epoch': 0.5} 50%|████▉ | 6102/12313 [4:34:04<4:34:41, 2.65s/it] 50%|████▉ | 6103/12313 [4:34:07<4:34:36, 2.65s/it] {'loss': 0.6337, 'grad_norm': 5.651578109021286, 'learning_rate': 2.656740021297017e-06, 'epoch': 0.5} 50%|████▉ | 6103/12313 [4:34:07<4:34:36, 2.65s/it] 50%|████▉ | 6104/12313 [4:34:09<4:30:17, 2.61s/it] {'loss': 0.4835, 'grad_norm': 4.497707754410902, 'learning_rate': 2.6560836874667584e-06, 'epoch': 0.5} 50%|████▉ | 6104/12313 [4:34:09<4:30:17, 2.61s/it] 50%|████▉ | 6105/12313 [4:34:13<4:48:52, 2.79s/it] {'loss': 0.4341, 'grad_norm': 6.748930438264841, 'learning_rate': 2.6554273428363352e-06, 'epoch': 0.5} 50%|████▉ | 6105/12313 [4:34:13<4:48:52, 2.79s/it] 50%|████▉ | 6106/12313 [4:34:15<4:39:50, 2.71s/it] {'loss': 0.4681, 'grad_norm': 4.813991480179138, 'learning_rate': 2.6547709874511622e-06, 'epoch': 0.5} 50%|████▉ | 6106/12313 [4:34:15<4:39:50, 2.71s/it] 50%|████▉ | 6107/12313 [4:34:18<4:42:11, 2.73s/it] {'loss': 0.4306, 'grad_norm': 4.661897396362085, 'learning_rate': 2.654114621356656e-06, 'epoch': 0.5} 50%|████▉ | 6107/12313 [4:34:18<4:42:11, 2.73s/it] 50%|████▉ | 6108/12313 [4:34:20<4:35:56, 2.67s/it] {'loss': 0.4603, 'grad_norm': 6.397578426337775, 'learning_rate': 2.6534582445982338e-06, 'epoch': 0.5} 50%|████▉ | 6108/12313 [4:34:20<4:35:56, 2.67s/it] 50%|████▉ | 6109/12313 [4:34:23<4:33:43, 2.65s/it] {'loss': 0.6069, 'grad_norm': 4.315053588094933, 'learning_rate': 2.6528018572213133e-06, 
'epoch': 0.5} 50%|████▉ | 6109/12313 [4:34:23<4:33:43, 2.65s/it] 50%|████▉ | 6110/12313 [4:34:26<4:55:36, 2.86s/it] {'loss': 0.4975, 'grad_norm': 3.5982681139239876, 'learning_rate': 2.6521454592713125e-06, 'epoch': 0.5} 50%|████▉ | 6110/12313 [4:34:26<4:55:36, 2.86s/it] 50%|████▉ | 6111/12313 [4:34:29<4:52:31, 2.83s/it] {'loss': 0.4753, 'grad_norm': 4.778399998810193, 'learning_rate': 2.6514890507936515e-06, 'epoch': 0.5} 50%|████▉ | 6111/12313 [4:34:29<4:52:31, 2.83s/it] 50%|████▉ | 6112/12313 [4:34:32<4:45:02, 2.76s/it] {'loss': 0.6111, 'grad_norm': 5.875526293696564, 'learning_rate': 2.6508326318337498e-06, 'epoch': 0.5} 50%|████▉ | 6112/12313 [4:34:32<4:45:02, 2.76s/it] 50%|████▉ | 6113/12313 [4:34:34<4:42:35, 2.73s/it] {'loss': 0.3784, 'grad_norm': 4.292823551355768, 'learning_rate': 2.6501762024370283e-06, 'epoch': 0.5} 50%|████▉ | 6113/12313 [4:34:34<4:42:35, 2.73s/it] 50%|████▉ | 6114/12313 [4:34:37<4:42:02, 2.73s/it] {'loss': 0.582, 'grad_norm': 8.401173587655329, 'learning_rate': 2.6495197626489082e-06, 'epoch': 0.5} 50%|████▉ | 6114/12313 [4:34:37<4:42:02, 2.73s/it] 50%|████▉ | 6115/12313 [4:34:40<4:40:13, 2.71s/it] {'loss': 0.4502, 'grad_norm': 4.321012772875746, 'learning_rate': 2.6488633125148127e-06, 'epoch': 0.5} 50%|████▉ | 6115/12313 [4:34:40<4:40:13, 2.71s/it] 50%|████▉ | 6116/12313 [4:34:42<4:36:12, 2.67s/it] {'loss': 0.5668, 'grad_norm': 5.379767516829725, 'learning_rate': 2.6482068520801625e-06, 'epoch': 0.5} 50%|████▉ | 6116/12313 [4:34:42<4:36:12, 2.67s/it] 50%|████▉ | 6117/12313 [4:34:45<4:38:59, 2.70s/it] {'loss': 0.4885, 'grad_norm': 5.218798786070938, 'learning_rate': 2.647550381390383e-06, 'epoch': 0.5} 50%|████▉ | 6117/12313 [4:34:45<4:38:59, 2.70s/it] 50%|████▉ | 6118/12313 [4:34:48<4:45:19, 2.76s/it] {'loss': 0.5571, 'grad_norm': 6.280161135864912, 'learning_rate': 2.6468939004908987e-06, 'epoch': 0.5} 50%|████▉ | 6118/12313 [4:34:48<4:45:19, 2.76s/it] 50%|████▉ | 6119/12313 [4:34:51<4:40:06, 2.71s/it] {'loss': 0.6213, 'grad_norm': 
6.147745243094484, 'learning_rate': 2.646237409427133e-06, 'epoch': 0.5} 50%|████▉ | 6119/12313 [4:34:51<4:40:06, 2.71s/it] 50%|████▉ | 6120/12313 [4:34:53<4:41:14, 2.72s/it] {'loss': 0.5256, 'grad_norm': 4.237997643901119, 'learning_rate': 2.645580908244513e-06, 'epoch': 0.5} 50%|████▉ | 6120/12313 [4:34:53<4:41:14, 2.72s/it] 50%|████▉ | 6121/12313 [4:34:56<4:44:15, 2.75s/it] {'loss': 0.4717, 'grad_norm': 5.794983422357201, 'learning_rate': 2.644924396988465e-06, 'epoch': 0.5} 50%|████▉ | 6121/12313 [4:34:56<4:44:15, 2.75s/it] 50%|████▉ | 6122/12313 [4:34:59<4:39:36, 2.71s/it] {'loss': 0.605, 'grad_norm': 3.028956483672569, 'learning_rate': 2.644267875704415e-06, 'epoch': 0.5} 50%|████▉ | 6122/12313 [4:34:59<4:39:36, 2.71s/it] 50%|████▉ | 6123/12313 [4:35:01<4:30:10, 2.62s/it] {'loss': 0.3828, 'grad_norm': 4.783447111346257, 'learning_rate': 2.6436113444377916e-06, 'epoch': 0.5} 50%|████▉ | 6123/12313 [4:35:01<4:30:10, 2.62s/it] 50%|████▉ | 6124/12313 [4:35:04<4:28:13, 2.60s/it] {'loss': 0.4624, 'grad_norm': 4.634896048686429, 'learning_rate': 2.6429548032340233e-06, 'epoch': 0.5} 50%|████▉ | 6124/12313 [4:35:04<4:28:13, 2.60s/it] 50%|████▉ | 6125/12313 [4:35:07<4:32:57, 2.65s/it] {'loss': 0.471, 'grad_norm': 4.4180065168228575, 'learning_rate': 2.642298252138539e-06, 'epoch': 0.5} 50%|████▉ | 6125/12313 [4:35:07<4:32:57, 2.65s/it] 50%|████▉ | 6126/12313 [4:35:09<4:32:05, 2.64s/it] {'loss': 0.3827, 'grad_norm': 7.029597857740861, 'learning_rate': 2.641641691196769e-06, 'epoch': 0.5} 50%|████▉ | 6126/12313 [4:35:09<4:32:05, 2.64s/it] 50%|████▉ | 6127/12313 [4:35:12<4:32:51, 2.65s/it] {'loss': 0.634, 'grad_norm': 3.5588276162139922, 'learning_rate': 2.6409851204541435e-06, 'epoch': 0.5} 50%|████▉ | 6127/12313 [4:35:12<4:32:51, 2.65s/it] 50%|████▉ | 6128/12313 [4:35:15<4:37:39, 2.69s/it] {'loss': 0.4562, 'grad_norm': 5.229212123221044, 'learning_rate': 2.640328539956094e-06, 'epoch': 0.5} 50%|████▉ | 6128/12313 [4:35:15<4:37:39, 2.69s/it] 50%|████▉ | 6129/12313 
[4:35:18<4:46:39, 2.78s/it] {'loss': 0.5074, 'grad_norm': 4.266180122295215, 'learning_rate': 2.639671949748052e-06, 'epoch': 0.5} 50%|████▉ | 6129/12313 [4:35:18<4:46:39, 2.78s/it] 50%|████▉ | 6130/12313 [4:35:20<4:41:46, 2.73s/it] {'loss': 0.5987, 'grad_norm': 5.993522414034609, 'learning_rate': 2.6390153498754506e-06, 'epoch': 0.5} 50%|████▉ | 6130/12313 [4:35:20<4:41:46, 2.73s/it] 50%|████▉ | 6131/12313 [4:35:23<4:41:03, 2.73s/it] {'loss': 0.5943, 'grad_norm': 5.432626385109642, 'learning_rate': 2.638358740383723e-06, 'epoch': 0.5} 50%|████▉ | 6131/12313 [4:35:23<4:41:03, 2.73s/it] 50%|████▉ | 6132/12313 [4:35:26<4:37:36, 2.69s/it] {'loss': 0.4533, 'grad_norm': 7.744059010319982, 'learning_rate': 2.637702121318302e-06, 'epoch': 0.5} 50%|████▉ | 6132/12313 [4:35:26<4:37:36, 2.69s/it] 50%|████▉ | 6133/12313 [4:35:28<4:35:38, 2.68s/it] {'loss': 0.4274, 'grad_norm': 6.471183582152214, 'learning_rate': 2.6370454927246237e-06, 'epoch': 0.5} 50%|████▉ | 6133/12313 [4:35:28<4:35:38, 2.68s/it] 50%|████▉ | 6134/12313 [4:35:31<4:41:23, 2.73s/it] {'loss': 0.4676, 'grad_norm': 3.457874851254734, 'learning_rate': 2.6363888546481224e-06, 'epoch': 0.5} 50%|████▉ | 6134/12313 [4:35:31<4:41:23, 2.73s/it] 50%|████▉ | 6135/12313 [4:35:34<4:36:39, 2.69s/it] {'loss': 0.4627, 'grad_norm': 8.481196069942994, 'learning_rate': 2.635732207134234e-06, 'epoch': 0.5} 50%|████▉ | 6135/12313 [4:35:34<4:36:39, 2.69s/it] 50%|████▉ | 6136/12313 [4:35:36<4:37:36, 2.70s/it] {'loss': 0.4784, 'grad_norm': 5.660075404955752, 'learning_rate': 2.635075550228395e-06, 'epoch': 0.5} 50%|████▉ | 6136/12313 [4:35:36<4:37:36, 2.70s/it] 50%|████▉ | 6137/12313 [4:35:39<4:46:23, 2.78s/it] {'loss': 0.4569, 'grad_norm': 4.313279838626725, 'learning_rate': 2.634418883976043e-06, 'epoch': 0.5} 50%|████▉ | 6137/12313 [4:35:39<4:46:23, 2.78s/it] 50%|████▉ | 6138/12313 [4:35:42<4:49:08, 2.81s/it] {'loss': 0.5727, 'grad_norm': 3.362477204863664, 'learning_rate': 2.6337622084226163e-06, 'epoch': 0.5} 50%|████▉ | 
6138/12313 [4:35:42<4:49:08, 2.81s/it] 50%|████▉ | 6139/12313 [4:35:45<4:46:43, 2.79s/it] {'loss': 0.7045, 'grad_norm': 4.485740579944916, 'learning_rate': 2.633105523613551e-06, 'epoch': 0.5} 50%|████▉ | 6139/12313 [4:35:45<4:46:43, 2.79s/it] 50%|████▉ | 6140/12313 [4:35:48<4:48:41, 2.81s/it] {'loss': 0.693, 'grad_norm': 3.989314569645018, 'learning_rate': 2.6324488295942897e-06, 'epoch': 0.5} 50%|████▉ | 6140/12313 [4:35:48<4:48:41, 2.81s/it] 50%|████▉ | 6141/12313 [4:35:50<4:36:02, 2.68s/it] {'loss': 0.4904, 'grad_norm': 6.808212058806507, 'learning_rate': 2.6317921264102697e-06, 'epoch': 0.5} 50%|████▉ | 6141/12313 [4:35:50<4:36:02, 2.68s/it] 50%|████▉ | 6142/12313 [4:35:53<4:38:18, 2.71s/it] {'loss': 0.6021, 'grad_norm': 4.01346195857619, 'learning_rate': 2.6311354141069324e-06, 'epoch': 0.5} 50%|████▉ | 6142/12313 [4:35:53<4:38:18, 2.71s/it] 50%|████▉ | 6143/12313 [4:35:55<4:31:21, 2.64s/it] {'loss': 0.5009, 'grad_norm': 5.471950249286309, 'learning_rate': 2.630478692729718e-06, 'epoch': 0.5} 50%|████▉ | 6143/12313 [4:35:56<4:31:21, 2.64s/it] 50%|████▉ | 6144/12313 [4:35:58<4:29:08, 2.62s/it] {'loss': 0.3804, 'grad_norm': 4.820722399207098, 'learning_rate': 2.6298219623240685e-06, 'epoch': 0.5} 50%|████▉ | 6144/12313 [4:35:58<4:29:08, 2.62s/it] 50%|████▉ | 6145/12313 [4:36:01<4:40:35, 2.73s/it] {'loss': 0.4665, 'grad_norm': 3.164263910898727, 'learning_rate': 2.6291652229354264e-06, 'epoch': 0.5} 50%|████▉ | 6145/12313 [4:36:01<4:40:35, 2.73s/it] 50%|████▉ | 6146/12313 [4:36:03<4:30:54, 2.64s/it] {'loss': 0.5254, 'grad_norm': 5.584269076148355, 'learning_rate': 2.6285084746092347e-06, 'epoch': 0.5} 50%|████▉ | 6146/12313 [4:36:03<4:30:54, 2.64s/it] 50%|████▉ | 6147/12313 [4:36:06<4:24:36, 2.57s/it] {'loss': 0.4869, 'grad_norm': 10.9678056140828, 'learning_rate': 2.627851717390936e-06, 'epoch': 0.5} 50%|████▉ | 6147/12313 [4:36:06<4:24:36, 2.57s/it] 50%|████▉ | 6148/12313 [4:36:09<4:31:19, 2.64s/it] {'loss': 0.5502, 'grad_norm': 6.984083124060342, 
'learning_rate': 2.6271949513259764e-06, 'epoch': 0.5} 50%|████▉ | 6148/12313 [4:36:09<4:31:19, 2.64s/it] 50%|████▉ | 6149/12313 [4:36:11<4:33:38, 2.66s/it] {'loss': 0.5626, 'grad_norm': 4.28910430937024, 'learning_rate': 2.626538176459798e-06, 'epoch': 0.5} 50%|████▉ | 6149/12313 [4:36:11<4:33:38, 2.66s/it] 50%|████▉ | 6150/12313 [4:36:14<4:39:59, 2.73s/it] {'loss': 0.5614, 'grad_norm': 5.809158832217803, 'learning_rate': 2.625881392837849e-06, 'epoch': 0.5} 50%|████▉ | 6150/12313 [4:36:14<4:39:59, 2.73s/it] 50%|████▉ | 6151/12313 [4:36:17<4:41:02, 2.74s/it] {'loss': 0.5047, 'grad_norm': 4.895799235690263, 'learning_rate': 2.6252246005055725e-06, 'epoch': 0.5} 50%|████▉ | 6151/12313 [4:36:17<4:41:02, 2.74s/it] 50%|████▉ | 6152/12313 [4:36:20<4:47:11, 2.80s/it] {'loss': 0.555, 'grad_norm': 3.584830115023326, 'learning_rate': 2.6245677995084163e-06, 'epoch': 0.5} 50%|████▉ | 6152/12313 [4:36:20<4:47:11, 2.80s/it] 50%|████▉ | 6153/12313 [4:36:23<4:43:35, 2.76s/it] {'loss': 0.4141, 'grad_norm': 10.564765809267739, 'learning_rate': 2.6239109898918286e-06, 'epoch': 0.5} 50%|████▉ | 6153/12313 [4:36:23<4:43:35, 2.76s/it] 50%|████▉ | 6154/12313 [4:36:25<4:34:03, 2.67s/it] {'loss': 0.5562, 'grad_norm': 3.816580959787303, 'learning_rate': 2.6232541717012563e-06, 'epoch': 0.5} 50%|████▉ | 6154/12313 [4:36:25<4:34:03, 2.67s/it] 50%|████▉ | 6155/12313 [4:36:28<4:36:24, 2.69s/it] {'loss': 0.4434, 'grad_norm': 7.68222211219942, 'learning_rate': 2.6225973449821468e-06, 'epoch': 0.5} 50%|████▉ | 6155/12313 [4:36:28<4:36:24, 2.69s/it] 50%|████▉ | 6156/12313 [4:36:31<4:35:31, 2.68s/it] {'loss': 0.67, 'grad_norm': 4.677405643402403, 'learning_rate': 2.6219405097799498e-06, 'epoch': 0.5} 50%|████▉ | 6156/12313 [4:36:31<4:35:31, 2.68s/it] 50%|█████ | 6157/12313 [4:36:33<4:37:51, 2.71s/it] {'loss': 0.5044, 'grad_norm': 6.735859677395967, 'learning_rate': 2.6212836661401154e-06, 'epoch': 0.5} 50%|█████ | 6157/12313 [4:36:33<4:37:51, 2.71s/it] 50%|█████ | 6158/12313 [4:36:36<4:33:10, 
2.66s/it] {'loss': 0.4942, 'grad_norm': 5.559869392207337, 'learning_rate': 2.6206268141080924e-06, 'epoch': 0.5} 50%|█████ | 6158/12313 [4:36:36<4:33:10, 2.66s/it] 50%|█████ | 6159/12313 [4:36:39<4:32:42, 2.66s/it] {'loss': 0.5946, 'grad_norm': 5.91589064626228, 'learning_rate': 2.619969953729333e-06, 'epoch': 0.5} 50%|█████ | 6159/12313 [4:36:39<4:32:42, 2.66s/it] 50%|█████ | 6160/12313 [4:36:41<4:34:46, 2.68s/it] {'loss': 0.6033, 'grad_norm': 9.884104436935436, 'learning_rate': 2.6193130850492876e-06, 'epoch': 0.5} 50%|█████ | 6160/12313 [4:36:41<4:34:46, 2.68s/it] 50%|█████ | 6161/12313 [4:36:44<4:43:59, 2.77s/it] {'loss': 0.4999, 'grad_norm': 5.191171578581443, 'learning_rate': 2.618656208113408e-06, 'epoch': 0.5} 50%|█████ | 6161/12313 [4:36:44<4:43:59, 2.77s/it] 50%|█████ | 6162/12313 [4:36:47<4:38:08, 2.71s/it] {'loss': 0.5341, 'grad_norm': 4.869819153010546, 'learning_rate': 2.6179993229671473e-06, 'epoch': 0.5} 50%|█████ | 6162/12313 [4:36:47<4:38:08, 2.71s/it] 50%|█████ | 6163/12313 [4:36:49<4:36:52, 2.70s/it] {'loss': 0.5408, 'grad_norm': 4.257762076221053, 'learning_rate': 2.6173424296559575e-06, 'epoch': 0.5} 50%|█████ | 6163/12313 [4:36:49<4:36:52, 2.70s/it] 50%|█████ | 6164/12313 [4:36:52<4:36:36, 2.70s/it] {'loss': 0.5863, 'grad_norm': 4.96043740363745, 'learning_rate': 2.6166855282252933e-06, 'epoch': 0.5} 50%|█████ | 6164/12313 [4:36:52<4:36:36, 2.70s/it] 50%|█████ | 6165/12313 [4:36:55<4:46:12, 2.79s/it] {'loss': 0.4936, 'grad_norm': 4.664873730144854, 'learning_rate': 2.616028618720607e-06, 'epoch': 0.5} 50%|█████ | 6165/12313 [4:36:55<4:46:12, 2.79s/it] 50%|█████ | 6166/12313 [4:36:58<4:34:47, 2.68s/it] {'loss': 0.5604, 'grad_norm': 5.757723223582411, 'learning_rate': 2.615371701187355e-06, 'epoch': 0.5} 50%|█████ | 6166/12313 [4:36:58<4:34:47, 2.68s/it] 50%|█████ | 6167/12313 [4:37:00<4:34:41, 2.68s/it] {'loss': 0.5215, 'grad_norm': 5.55920229153415, 'learning_rate': 2.6147147756709925e-06, 'epoch': 0.5} 50%|█████ | 6167/12313 
[4:37:00<4:34:41, 2.68s/it] 50%|█████ | 6168/12313 [4:37:03<4:32:13, 2.66s/it] {'loss': 0.6314, 'grad_norm': 3.916022531924741, 'learning_rate': 2.614057842216973e-06, 'epoch': 0.5} 50%|█████ | 6168/12313 [4:37:03<4:32:13, 2.66s/it] 50%|█████ | 6169/12313 [4:37:05<4:30:00, 2.64s/it] {'loss': 0.5058, 'grad_norm': 3.748268469101728, 'learning_rate': 2.6134009008707555e-06, 'epoch': 0.5} 50%|█████ | 6169/12313 [4:37:05<4:30:00, 2.64s/it] 50%|█████ | 6170/12313 [4:37:08<4:29:24, 2.63s/it] {'loss': 0.6175, 'grad_norm': 3.866469109499023, 'learning_rate': 2.6127439516777956e-06, 'epoch': 0.5} 50%|█████ | 6170/12313 [4:37:08<4:29:24, 2.63s/it] 50%|█████ | 6171/12313 [4:37:11<4:26:46, 2.61s/it] {'loss': 0.4369, 'grad_norm': 9.784917821882507, 'learning_rate': 2.6120869946835513e-06, 'epoch': 0.5} 50%|█████ | 6171/12313 [4:37:11<4:26:46, 2.61s/it] 50%|█████ | 6172/12313 [4:37:13<4:30:52, 2.65s/it] {'loss': 0.5966, 'grad_norm': 3.918198159806199, 'learning_rate': 2.61143002993348e-06, 'epoch': 0.5} 50%|█████ | 6172/12313 [4:37:13<4:30:52, 2.65s/it] 50%|█████ | 6173/12313 [4:37:16<4:29:52, 2.64s/it] {'loss': 0.5353, 'grad_norm': 9.939958648361882, 'learning_rate': 2.61077305747304e-06, 'epoch': 0.5} 50%|█████ | 6173/12313 [4:37:16<4:29:52, 2.64s/it] 50%|█████ | 6174/12313 [4:37:19<4:39:53, 2.74s/it] {'loss': 0.4513, 'grad_norm': 6.322410360361211, 'learning_rate': 2.610116077347691e-06, 'epoch': 0.5} 50%|█████ | 6174/12313 [4:37:19<4:39:53, 2.74s/it] 50%|█████ | 6175/12313 [4:37:22<4:37:04, 2.71s/it] {'loss': 0.3957, 'grad_norm': 6.622717663567328, 'learning_rate': 2.609459089602892e-06, 'epoch': 0.5} 50%|█████ | 6175/12313 [4:37:22<4:37:04, 2.71s/it] 50%|█████ | 6176/12313 [4:37:24<4:32:35, 2.67s/it] {'loss': 0.3698, 'grad_norm': 8.064816332776429, 'learning_rate': 2.6088020942841034e-06, 'epoch': 0.5} 50%|█████ | 6176/12313 [4:37:24<4:32:35, 2.67s/it] 50%|█████ | 6177/12313 [4:37:27<4:38:28, 2.72s/it] {'loss': 0.5031, 'grad_norm': 7.086132865620177, 'learning_rate': 
2.6081450914367864e-06, 'epoch': 0.5} 50%|█████ | 6177/12313 [4:37:27<4:38:28, 2.72s/it] 50%|█████ | 6178/12313 [4:37:30<4:34:33, 2.69s/it] {'loss': 0.5348, 'grad_norm': 3.7954526083234077, 'learning_rate': 2.6074880811064003e-06, 'epoch': 0.5} 50%|█████ | 6178/12313 [4:37:30<4:34:33, 2.69s/it] 50%|█████ | 6179/12313 [4:37:32<4:27:20, 2.61s/it] {'loss': 0.4605, 'grad_norm': 5.020997847466201, 'learning_rate': 2.606831063338408e-06, 'epoch': 0.5} 50%|█████ | 6179/12313 [4:37:32<4:27:20, 2.61s/it] 50%|█████ | 6180/12313 [4:37:35<4:32:37, 2.67s/it] {'loss': 0.5455, 'grad_norm': 4.560830832766969, 'learning_rate': 2.6061740381782723e-06, 'epoch': 0.5} 50%|█████ | 6180/12313 [4:37:35<4:32:37, 2.67s/it] 50%|█████ | 6181/12313 [4:37:38<4:37:30, 2.72s/it] {'loss': 0.5157, 'grad_norm': 4.229120124206621, 'learning_rate': 2.605517005671454e-06, 'epoch': 0.5} 50%|█████ | 6181/12313 [4:37:38<4:37:30, 2.72s/it] 50%|█████ | 6182/12313 [4:37:40<4:32:23, 2.67s/it] {'loss': 0.4667, 'grad_norm': 4.581886089334441, 'learning_rate': 2.604859965863418e-06, 'epoch': 0.5} 50%|█████ | 6182/12313 [4:37:40<4:32:23, 2.67s/it] 50%|█████ | 6183/12313 [4:37:43<4:41:48, 2.76s/it] {'loss': 0.5935, 'grad_norm': 4.4186999593426055, 'learning_rate': 2.6042029187996277e-06, 'epoch': 0.5} 50%|█████ | 6183/12313 [4:37:43<4:41:48, 2.76s/it] 50%|█████ | 6184/12313 [4:37:46<4:40:52, 2.75s/it] {'loss': 0.5252, 'grad_norm': 4.736522021215294, 'learning_rate': 2.6035458645255467e-06, 'epoch': 0.5} 50%|█████ | 6184/12313 [4:37:46<4:40:52, 2.75s/it] 50%|█████ | 6185/12313 [4:37:49<4:35:09, 2.69s/it] {'loss': 0.5201, 'grad_norm': 4.668585962104124, 'learning_rate': 2.602888803086639e-06, 'epoch': 0.5} 50%|█████ | 6185/12313 [4:37:49<4:35:09, 2.69s/it] 50%|█████ | 6186/12313 [4:37:51<4:29:31, 2.64s/it] {'loss': 0.4524, 'grad_norm': 9.66768445142635, 'learning_rate': 2.602231734528372e-06, 'epoch': 0.5} 50%|█████ | 6186/12313 [4:37:51<4:29:31, 2.64s/it] 50%|█████ | 6187/12313 [4:37:53<4:23:28, 2.58s/it] {'loss': 
0.6911, 'grad_norm': 4.15432156915695, 'learning_rate': 2.601574658896209e-06, 'epoch': 0.5} 50%|█████ | 6187/12313 [4:37:53<4:23:28, 2.58s/it] 50%|█████ | 6188/12313 [4:37:57<4:46:26, 2.81s/it] {'loss': 0.587, 'grad_norm': 5.036750129329255, 'learning_rate': 2.6009175762356176e-06, 'epoch': 0.5} 50%|█████ | 6188/12313 [4:37:57<4:46:26, 2.81s/it] 50%|█████ | 6189/12313 [4:37:59<4:42:52, 2.77s/it] {'loss': 0.5369, 'grad_norm': 9.244386485882309, 'learning_rate': 2.6002604865920645e-06, 'epoch': 0.5} 50%|█████ | 6189/12313 [4:37:59<4:42:52, 2.77s/it] 50%|█████ | 6190/12313 [4:38:02<4:45:18, 2.80s/it] {'loss': 0.6125, 'grad_norm': 4.712476893869534, 'learning_rate': 2.5996033900110155e-06, 'epoch': 0.5} 50%|█████ | 6190/12313 [4:38:02<4:45:18, 2.80s/it] 50%|█████ | 6191/12313 [4:38:05<4:43:40, 2.78s/it] {'loss': 0.6711, 'grad_norm': 7.034161631409917, 'learning_rate': 2.5989462865379394e-06, 'epoch': 0.5} 50%|█████ | 6191/12313 [4:38:05<4:43:40, 2.78s/it] 50%|█████ | 6192/12313 [4:38:08<4:50:48, 2.85s/it] {'loss': 0.5937, 'grad_norm': 2.537027536386944, 'learning_rate': 2.598289176218304e-06, 'epoch': 0.5} 50%|█████ | 6192/12313 [4:38:08<4:50:48, 2.85s/it] 50%|█████ | 6193/12313 [4:38:11<4:49:37, 2.84s/it] {'loss': 0.6653, 'grad_norm': 3.394185226747455, 'learning_rate': 2.597632059097577e-06, 'epoch': 0.5} 50%|█████ | 6193/12313 [4:38:11<4:49:37, 2.84s/it] 50%|█████ | 6194/12313 [4:38:14<4:42:44, 2.77s/it] {'loss': 0.6337, 'grad_norm': 4.376129626575339, 'learning_rate': 2.5969749352212294e-06, 'epoch': 0.5} 50%|█████ | 6194/12313 [4:38:14<4:42:44, 2.77s/it] 50%|█████ | 6195/12313 [4:38:16<4:37:08, 2.72s/it] {'loss': 0.4952, 'grad_norm': 4.748486742998477, 'learning_rate': 2.5963178046347286e-06, 'epoch': 0.5} 50%|█████ | 6195/12313 [4:38:16<4:37:08, 2.72s/it] 50%|█████ | 6196/12313 [4:38:19<4:34:55, 2.70s/it] {'loss': 0.3893, 'grad_norm': 7.369302442783826, 'learning_rate': 2.595660667383547e-06, 'epoch': 0.5} 50%|█████ | 6196/12313 [4:38:19<4:34:55, 2.70s/it] 
50%|█████ | 6197/12313 [4:38:21<4:29:30, 2.64s/it] {'loss': 0.5277, 'grad_norm': 5.509513918719035, 'learning_rate': 2.5950035235131515e-06, 'epoch': 0.5} 50%|█████ | 6197/12313 [4:38:21<4:29:30, 2.64s/it] 50%|█████ | 6198/12313 [4:38:24<4:28:05, 2.63s/it] {'loss': 0.4237, 'grad_norm': 5.829645133957113, 'learning_rate': 2.594346373069016e-06, 'epoch': 0.5} 50%|█████ | 6198/12313 [4:38:24<4:28:05, 2.63s/it] 50%|█████ | 6199/12313 [4:38:26<4:24:14, 2.59s/it] {'loss': 0.4906, 'grad_norm': 5.626826411794362, 'learning_rate': 2.593689216096611e-06, 'epoch': 0.5} 50%|█████ | 6199/12313 [4:38:26<4:24:14, 2.59s/it] 50%|█████ | 6200/12313 [4:38:29<4:20:55, 2.56s/it] {'loss': 0.548, 'grad_norm': 6.977550708147046, 'learning_rate': 2.5930320526414083e-06, 'epoch': 0.5} 50%|█████ | 6200/12313 [4:38:29<4:20:55, 2.56s/it] 50%|█████ | 6201/12313 [4:38:31<4:21:20, 2.57s/it] {'loss': 0.4672, 'grad_norm': 7.209561596520351, 'learning_rate': 2.592374882748879e-06, 'epoch': 0.5} 50%|█████ | 6201/12313 [4:38:31<4:21:20, 2.57s/it] 50%|█████ | 6202/12313 [4:38:34<4:22:08, 2.57s/it] {'loss': 0.5535, 'grad_norm': 5.543565192935603, 'learning_rate': 2.5917177064644974e-06, 'epoch': 0.5} 50%|█████ | 6202/12313 [4:38:34<4:22:08, 2.57s/it] 50%|█████ | 6203/12313 [4:38:36<4:16:28, 2.52s/it] {'loss': 0.6972, 'grad_norm': 5.366571236178294, 'learning_rate': 2.5910605238337355e-06, 'epoch': 0.5} 50%|█████ | 6203/12313 [4:38:36<4:16:28, 2.52s/it] 50%|█████ | 6204/12313 [4:38:39<4:21:26, 2.57s/it] {'loss': 0.4503, 'grad_norm': 6.261566607697583, 'learning_rate': 2.5904033349020675e-06, 'epoch': 0.5} 50%|█████ | 6204/12313 [4:38:39<4:21:26, 2.57s/it] 50%|█████ | 6205/12313 [4:38:42<4:22:54, 2.58s/it] {'loss': 0.4825, 'grad_norm': 5.805760768553691, 'learning_rate': 2.589746139714967e-06, 'epoch': 0.5} 50%|█████ | 6205/12313 [4:38:42<4:22:54, 2.58s/it] 50%|█████ | 6206/12313 [4:38:44<4:26:06, 2.61s/it] {'loss': 0.4764, 'grad_norm': 8.328312111239873, 'learning_rate': 2.5890889383179086e-06, 'epoch': 
0.5} 50%|█████ | 6206/12313 [4:38:44<4:26:06, 2.61s/it] 50%|█████ | 6207/12313 [4:38:47<4:36:43, 2.72s/it] {'loss': 0.5084, 'grad_norm': 4.810122399520198, 'learning_rate': 2.588431730756367e-06, 'epoch': 0.5} 50%|█████ | 6207/12313 [4:38:47<4:36:43, 2.72s/it] 50%|█████ | 6208/12313 [4:38:50<4:38:15, 2.73s/it] {'loss': 0.524, 'grad_norm': 3.7065509135197545, 'learning_rate': 2.5877745170758177e-06, 'epoch': 0.5} 50%|█████ | 6208/12313 [4:38:50<4:38:15, 2.73s/it] 50%|█████ | 6209/12313 [4:38:53<4:41:28, 2.77s/it] {'loss': 0.5104, 'grad_norm': 5.121308484330212, 'learning_rate': 2.5871172973217367e-06, 'epoch': 0.5} 50%|█████ | 6209/12313 [4:38:53<4:41:28, 2.77s/it] 50%|█████ | 6210/12313 [4:38:56<4:38:03, 2.73s/it] {'loss': 0.5344, 'grad_norm': 5.928958710988731, 'learning_rate': 2.5864600715396e-06, 'epoch': 0.5} 50%|█████ | 6210/12313 [4:38:56<4:38:03, 2.73s/it] 50%|█████ | 6211/12313 [4:38:58<4:33:18, 2.69s/it] {'loss': 0.4809, 'grad_norm': 5.887834972208363, 'learning_rate': 2.585802839774883e-06, 'epoch': 0.5} 50%|█████ | 6211/12313 [4:38:58<4:33:18, 2.69s/it] 50%|█████ | 6212/12313 [4:39:01<4:34:59, 2.70s/it] {'loss': 0.6071, 'grad_norm': 4.367368393282637, 'learning_rate': 2.5851456020730643e-06, 'epoch': 0.5} 50%|█████ | 6212/12313 [4:39:01<4:34:59, 2.70s/it] 50%|█████ | 6213/12313 [4:39:04<4:48:33, 2.84s/it] {'loss': 0.6266, 'grad_norm': 3.4461457206000374, 'learning_rate': 2.584488358479621e-06, 'epoch': 0.5} 50%|█████ | 6213/12313 [4:39:04<4:48:33, 2.84s/it] 50%|█████ | 6214/12313 [4:39:07<4:42:46, 2.78s/it] {'loss': 0.5782, 'grad_norm': 3.1291003154252004, 'learning_rate': 2.5838311090400293e-06, 'epoch': 0.5} 50%|█████ | 6214/12313 [4:39:07<4:42:46, 2.78s/it] 50%|█████ | 6215/12313 [4:39:09<4:31:17, 2.67s/it] {'loss': 0.395, 'grad_norm': 9.218149230525775, 'learning_rate': 2.58317385379977e-06, 'epoch': 0.5} 50%|█████ | 6215/12313 [4:39:09<4:31:17, 2.67s/it] 50%|█████ | 6216/12313 [4:39:12<4:32:04, 2.68s/it] {'loss': 0.4645, 'grad_norm': 
7.375352110089412, 'learning_rate': 2.582516592804319e-06, 'epoch': 0.5} 50%|█████ | 6216/12313 [4:39:12<4:32:04, 2.68s/it] 50%|█████ | 6217/12313 [4:39:15<4:31:09, 2.67s/it] {'loss': 0.514, 'grad_norm': 6.15235163146789, 'learning_rate': 2.5818593260991565e-06, 'epoch': 0.5} 50%|█████ | 6217/12313 [4:39:15<4:31:09, 2.67s/it] 50%|█████ | 6218/12313 [4:39:17<4:31:08, 2.67s/it] {'loss': 0.5103, 'grad_norm': 16.29462482704463, 'learning_rate': 2.581202053729762e-06, 'epoch': 0.5} 50%|█████ | 6218/12313 [4:39:17<4:31:08, 2.67s/it] 51%|█████ | 6219/12313 [4:39:20<4:27:56, 2.64s/it] {'loss': 0.5023, 'grad_norm': 4.3058949775554884, 'learning_rate': 2.580544775741616e-06, 'epoch': 0.51} 51%|█████ | 6219/12313 [4:39:20<4:27:56, 2.64s/it] 51%|█████ | 6220/12313 [4:39:22<4:23:27, 2.59s/it] {'loss': 0.6187, 'grad_norm': 3.2866867580977606, 'learning_rate': 2.579887492180197e-06, 'epoch': 0.51} 51%|█████ | 6220/12313 [4:39:22<4:23:27, 2.59s/it] 51%|█████ | 6221/12313 [4:39:25<4:22:51, 2.59s/it] {'loss': 0.3933, 'grad_norm': 5.219156744499874, 'learning_rate': 2.579230203090986e-06, 'epoch': 0.51} 51%|█████ | 6221/12313 [4:39:25<4:22:51, 2.59s/it] 51%|█████ | 6222/12313 [4:39:28<4:27:08, 2.63s/it] {'loss': 0.506, 'grad_norm': 4.147926897905404, 'learning_rate': 2.578572908519465e-06, 'epoch': 0.51} 51%|█████ | 6222/12313 [4:39:28<4:27:08, 2.63s/it] 51%|█████ | 6223/12313 [4:39:30<4:27:20, 2.63s/it] {'loss': 0.5677, 'grad_norm': 4.123405600563359, 'learning_rate': 2.577915608511114e-06, 'epoch': 0.51} 51%|█████ | 6223/12313 [4:39:30<4:27:20, 2.63s/it] 51%|█████ | 6224/12313 [4:39:33<4:32:34, 2.69s/it] {'loss': 0.4562, 'grad_norm': 7.522333120828346, 'learning_rate': 2.5772583031114157e-06, 'epoch': 0.51} 51%|█████ | 6224/12313 [4:39:33<4:32:34, 2.69s/it] 51%|█████ | 6225/12313 [4:39:36<4:31:37, 2.68s/it] {'loss': 0.3849, 'grad_norm': 6.046806353687886, 'learning_rate': 2.5766009923658516e-06, 'epoch': 0.51} 51%|█████ | 6225/12313 [4:39:36<4:31:37, 2.68s/it] 51%|█████ | 
6226/12313 [4:39:38<4:34:49, 2.71s/it] {'loss': 0.4868, 'grad_norm': 4.707865925528092, 'learning_rate': 2.5759436763199047e-06, 'epoch': 0.51} 51%|█████ | 6226/12313 [4:39:38<4:34:49, 2.71s/it] 51%|█████ | 6227/12313 [4:39:41<4:34:31, 2.71s/it] {'loss': 0.587, 'grad_norm': 4.715913190140127, 'learning_rate': 2.575286355019056e-06, 'epoch': 0.51} 51%|█████ | 6227/12313 [4:39:41<4:34:31, 2.71s/it] 51%|█████ | 6228/12313 [4:39:44<4:36:26, 2.73s/it] {'loss': 0.7003, 'grad_norm': 3.2497965234461264, 'learning_rate': 2.5746290285087912e-06, 'epoch': 0.51} 51%|█████ | 6228/12313 [4:39:44<4:36:26, 2.73s/it] 51%|█████ | 6229/12313 [4:39:47<4:32:39, 2.69s/it] {'loss': 0.6808, 'grad_norm': 3.280084946712466, 'learning_rate': 2.5739716968345922e-06, 'epoch': 0.51} 51%|█████ | 6229/12313 [4:39:47<4:32:39, 2.69s/it] 51%|█████ | 6230/12313 [4:39:49<4:33:26, 2.70s/it] {'loss': 0.4032, 'grad_norm': 5.724019303106518, 'learning_rate': 2.573314360041943e-06, 'epoch': 0.51} 51%|█████ | 6230/12313 [4:39:49<4:33:26, 2.70s/it] 51%|█████ | 6231/12313 [4:39:52<4:31:44, 2.68s/it] {'loss': 0.4832, 'grad_norm': 4.723214741303482, 'learning_rate': 2.5726570181763286e-06, 'epoch': 0.51} 51%|█████ | 6231/12313 [4:39:52<4:31:44, 2.68s/it] 51%|█████ | 6232/12313 [4:39:55<4:31:01, 2.67s/it] {'loss': 0.4762, 'grad_norm': 3.7300991619481363, 'learning_rate': 2.571999671283233e-06, 'epoch': 0.51} 51%|█████ | 6232/12313 [4:39:55<4:31:01, 2.67s/it] 51%|█████ | 6233/12313 [4:39:57<4:31:25, 2.68s/it] {'loss': 0.5565, 'grad_norm': 4.889315053935509, 'learning_rate': 2.5713423194081404e-06, 'epoch': 0.51} 51%|█████ | 6233/12313 [4:39:57<4:31:25, 2.68s/it] 51%|█████ | 6234/12313 [4:40:00<4:24:31, 2.61s/it] {'loss': 0.5569, 'grad_norm': 7.67062134201034, 'learning_rate': 2.570684962596538e-06, 'epoch': 0.51} 51%|█████ | 6234/12313 [4:40:00<4:24:31, 2.61s/it] 51%|█████ | 6235/12313 [4:40:03<4:31:51, 2.68s/it] {'loss': 0.4171, 'grad_norm': 4.100525163986654, 'learning_rate': 2.5700276008939096e-06, 'epoch': 
0.51} 51%|█████ | 6235/12313 [4:40:03<4:31:51, 2.68s/it] 51%|█████ | 6236/12313 [4:40:05<4:31:20, 2.68s/it] {'loss': 0.4534, 'grad_norm': 4.883509578878357, 'learning_rate': 2.569370234345742e-06, 'epoch': 0.51} 51%|█████ | 6236/12313 [4:40:05<4:31:20, 2.68s/it] 51%|█████ | 6237/12313 [4:40:08<4:42:45, 2.79s/it] {'loss': 0.4791, 'grad_norm': 17.999174929712392, 'learning_rate': 2.568712862997522e-06, 'epoch': 0.51} 51%|█████ | 6237/12313 [4:40:08<4:42:45, 2.79s/it] 51%|█████ | 6238/12313 [4:40:11<4:32:29, 2.69s/it] {'loss': 0.3923, 'grad_norm': 3.501500494032202, 'learning_rate': 2.5680554868947346e-06, 'epoch': 0.51} 51%|█████ | 6238/12313 [4:40:11<4:32:29, 2.69s/it] 51%|█████ | 6239/12313 [4:40:13<4:28:53, 2.66s/it] {'loss': 0.557, 'grad_norm': 5.490863237584197, 'learning_rate': 2.5673981060828672e-06, 'epoch': 0.51} 51%|█████ | 6239/12313 [4:40:13<4:28:53, 2.66s/it] 51%|█████ | 6240/12313 [4:40:16<4:32:38, 2.69s/it] {'loss': 0.4779, 'grad_norm': 4.017635335289608, 'learning_rate': 2.5667407206074084e-06, 'epoch': 0.51} 51%|█████ | 6240/12313 [4:40:16<4:32:38, 2.69s/it] 51%|█████ | 6241/12313 [4:40:19<4:27:32, 2.64s/it] {'loss': 0.5404, 'grad_norm': 5.193790698559395, 'learning_rate': 2.566083330513845e-06, 'epoch': 0.51} 51%|█████ | 6241/12313 [4:40:19<4:27:32, 2.64s/it] 51%|█████ | 6242/12313 [4:40:21<4:24:30, 2.61s/it] {'loss': 0.5054, 'grad_norm': 3.781621395266743, 'learning_rate': 2.565425935847665e-06, 'epoch': 0.51} 51%|█████ | 6242/12313 [4:40:21<4:24:30, 2.61s/it] 51%|█████ | 6243/12313 [4:40:24<4:28:04, 2.65s/it] {'loss': 0.565, 'grad_norm': 6.125227421255686, 'learning_rate': 2.564768536654356e-06, 'epoch': 0.51} 51%|█████ | 6243/12313 [4:40:24<4:28:04, 2.65s/it] 51%|█████ | 6244/12313 [4:40:27<4:28:24, 2.65s/it] {'loss': 0.5639, 'grad_norm': 5.159395575524112, 'learning_rate': 2.564111132979407e-06, 'epoch': 0.51} 51%|█████ | 6244/12313 [4:40:27<4:28:24, 2.65s/it] 51%|█████ | 6245/12313 [4:40:29<4:30:00, 2.67s/it] {'loss': 0.4247, 'grad_norm': 
5.800011134638956, 'learning_rate': 2.563453724868308e-06, 'epoch': 0.51} 51%|█████ | 6245/12313 [4:40:29<4:30:00, 2.67s/it] 51%|█████ | 6246/12313 [4:40:32<4:28:16, 2.65s/it] {'loss': 0.4201, 'grad_norm': 3.2153805898476606, 'learning_rate': 2.5627963123665455e-06, 'epoch': 0.51} 51%|█████ | 6246/12313 [4:40:32<4:28:16, 2.65s/it] 51%|█████ | 6247/12313 [4:40:34<4:23:23, 2.61s/it] {'loss': 0.472, 'grad_norm': 3.5092901295905405, 'learning_rate': 2.5621388955196113e-06, 'epoch': 0.51} 51%|█████ | 6247/12313 [4:40:34<4:23:23, 2.61s/it] 51%|█████ | 6248/12313 [4:40:37<4:24:18, 2.61s/it] {'loss': 0.4915, 'grad_norm': 8.695984681785939, 'learning_rate': 2.561481474372995e-06, 'epoch': 0.51} 51%|█████ | 6248/12313 [4:40:37<4:24:18, 2.61s/it] 51%|█████ | 6249/12313 [4:40:40<4:24:15, 2.61s/it] {'loss': 0.4936, 'grad_norm': 4.710656984668509, 'learning_rate': 2.560824048972185e-06, 'epoch': 0.51} 51%|█████ | 6249/12313 [4:40:40<4:24:15, 2.61s/it] 51%|█████ | 6250/12313 [4:40:42<4:19:35, 2.57s/it] {'loss': 0.6042, 'grad_norm': 7.293645049422129, 'learning_rate': 2.5601666193626735e-06, 'epoch': 0.51} 51%|█████ | 6250/12313 [4:40:42<4:19:35, 2.57s/it] 51%|█████ | 6251/12313 [4:40:45<4:21:46, 2.59s/it] {'loss': 0.443, 'grad_norm': 5.2621850945387285, 'learning_rate': 2.55950918558995e-06, 'epoch': 0.51} 51%|█████ | 6251/12313 [4:40:45<4:21:46, 2.59s/it] 51%|█████ | 6252/12313 [4:40:47<4:25:47, 2.63s/it] {'loss': 0.697, 'grad_norm': 3.4152519821072578, 'learning_rate': 2.558851747699506e-06, 'epoch': 0.51} 51%|█████ | 6252/12313 [4:40:47<4:25:47, 2.63s/it] 51%|█████ | 6253/12313 [4:40:50<4:25:01, 2.62s/it] {'loss': 0.4773, 'grad_norm': 5.202402703658923, 'learning_rate': 2.5581943057368317e-06, 'epoch': 0.51} 51%|█████ | 6253/12313 [4:40:50<4:25:01, 2.62s/it] 51%|█████ | 6254/12313 [4:40:52<4:19:34, 2.57s/it] {'loss': 0.3156, 'grad_norm': 7.606949152034, 'learning_rate': 2.5575368597474202e-06, 'epoch': 0.51} 51%|█████ | 6254/12313 [4:40:52<4:19:34, 2.57s/it] 51%|█████ | 
6255/12313 [4:40:55<4:23:36, 2.61s/it] {'loss': 0.4507, 'grad_norm': 3.929776691209423, 'learning_rate': 2.5568794097767624e-06, 'epoch': 0.51} 51%|█████ | 6255/12313 [4:40:55<4:23:36, 2.61s/it] 51%|█████ | 6256/12313 [4:40:58<4:17:32, 2.55s/it] {'loss': 0.6335, 'grad_norm': 6.335298963383283, 'learning_rate': 2.5562219558703504e-06, 'epoch': 0.51} 51%|█████ | 6256/12313 [4:40:58<4:17:32, 2.55s/it] 51%|█████ | 6257/12313 [4:41:00<4:20:37, 2.58s/it] {'loss': 0.5108, 'grad_norm': 7.8562095782475865, 'learning_rate': 2.555564498073677e-06, 'epoch': 0.51} 51%|█████ | 6257/12313 [4:41:00<4:20:37, 2.58s/it] 51%|█████ | 6258/12313 [4:41:03<4:23:32, 2.61s/it] {'loss': 0.5093, 'grad_norm': 4.8650466475894145, 'learning_rate': 2.554907036432235e-06, 'epoch': 0.51} 51%|█████ | 6258/12313 [4:41:03<4:23:32, 2.61s/it] 51%|█████ | 6259/12313 [4:41:06<4:25:44, 2.63s/it] {'loss': 0.5999, 'grad_norm': 5.229859161415607, 'learning_rate': 2.554249570991515e-06, 'epoch': 0.51} 51%|█████ | 6259/12313 [4:41:06<4:25:44, 2.63s/it] 51%|█████ | 6260/12313 [4:41:08<4:27:05, 2.65s/it] {'loss': 0.435, 'grad_norm': 4.359502656532959, 'learning_rate': 2.5535921017970123e-06, 'epoch': 0.51} 51%|█████ | 6260/12313 [4:41:08<4:27:05, 2.65s/it] 51%|█████ | 6261/12313 [4:41:11<4:26:11, 2.64s/it] {'loss': 0.4546, 'grad_norm': 15.539251038374521, 'learning_rate': 2.5529346288942203e-06, 'epoch': 0.51} 51%|█████ | 6261/12313 [4:41:11<4:26:11, 2.64s/it] 51%|█████ | 6262/12313 [4:41:14<4:37:45, 2.75s/it] {'loss': 0.4273, 'grad_norm': 4.359760496582075, 'learning_rate': 2.5522771523286317e-06, 'epoch': 0.51} 51%|█████ | 6262/12313 [4:41:14<4:37:45, 2.75s/it] 51%|█████ | 6263/12313 [4:41:17<4:34:47, 2.73s/it] {'loss': 0.5175, 'grad_norm': 5.6985722122365114, 'learning_rate': 2.551619672145741e-06, 'epoch': 0.51} 51%|█████ | 6263/12313 [4:41:17<4:34:47, 2.73s/it] 51%|█████ | 6264/12313 [4:41:19<4:30:36, 2.68s/it] {'loss': 0.5247, 'grad_norm': 4.451602003719397, 'learning_rate': 2.5509621883910424e-06, 'epoch': 
0.51} 51%|█████ | 6264/12313 [4:41:19<4:30:36, 2.68s/it] 51%|█████ | 6265/12313 [4:41:22<4:29:39, 2.68s/it] {'loss': 0.4593, 'grad_norm': 6.12534073894256, 'learning_rate': 2.55030470111003e-06, 'epoch': 0.51} 51%|█████ | 6265/12313 [4:41:22<4:29:39, 2.68s/it] 51%|█████ | 6266/12313 [4:41:25<4:37:03, 2.75s/it] {'loss': 0.523, 'grad_norm': 5.010881145167957, 'learning_rate': 2.5496472103481984e-06, 'epoch': 0.51} 51%|█████ | 6266/12313 [4:41:25<4:37:03, 2.75s/it] 51%|█████ | 6267/12313 [4:41:28<4:40:18, 2.78s/it] {'loss': 0.4728, 'grad_norm': 6.63962229868484, 'learning_rate': 2.5489897161510425e-06, 'epoch': 0.51} 51%|█████ | 6267/12313 [4:41:28<4:40:18, 2.78s/it] 51%|█████ | 6268/12313 [4:41:30<4:42:15, 2.80s/it] {'loss': 0.5094, 'grad_norm': 4.6745969311887485, 'learning_rate': 2.5483322185640575e-06, 'epoch': 0.51} 51%|█████ | 6268/12313 [4:41:30<4:42:15, 2.80s/it] 51%|█████ | 6269/12313 [4:41:33<4:37:31, 2.75s/it] {'loss': 0.5129, 'grad_norm': 7.671233857618601, 'learning_rate': 2.547674717632739e-06, 'epoch': 0.51} 51%|█████ | 6269/12313 [4:41:33<4:37:31, 2.75s/it] 51%|█████ | 6270/12313 [4:41:36<4:33:39, 2.72s/it] {'loss': 0.4483, 'grad_norm': 4.60865861862504, 'learning_rate': 2.547017213402582e-06, 'epoch': 0.51} 51%|█████ | 6270/12313 [4:41:36<4:33:39, 2.72s/it] 51%|█████ | 6271/12313 [4:41:39<4:38:33, 2.77s/it] {'loss': 0.4052, 'grad_norm': 6.2621430489481265, 'learning_rate': 2.546359705919083e-06, 'epoch': 0.51} 51%|█████ | 6271/12313 [4:41:39<4:38:33, 2.77s/it] 51%|█████ | 6272/12313 [4:41:41<4:31:53, 2.70s/it] {'loss': 0.504, 'grad_norm': 5.501701524912082, 'learning_rate': 2.545702195227737e-06, 'epoch': 0.51} 51%|█████ | 6272/12313 [4:41:41<4:31:53, 2.70s/it] 51%|█████ | 6273/12313 [4:41:44<4:33:13, 2.71s/it] {'loss': 0.7419, 'grad_norm': 9.43685395208808, 'learning_rate': 2.545044681374042e-06, 'epoch': 0.51} 51%|█████ | 6273/12313 [4:41:44<4:33:13, 2.71s/it] 51%|█████ | 6274/12313 [4:41:46<4:24:11, 2.62s/it] {'loss': 0.6057, 'grad_norm': 
4.655667182094343, 'learning_rate': 2.544387164403493e-06, 'epoch': 0.51} 51%|█████ | 6274/12313 [4:41:46<4:24:11, 2.62s/it] 51%|█████ | 6275/12313 [4:41:49<4:27:24, 2.66s/it] {'loss': 0.3979, 'grad_norm': 6.207757916402813, 'learning_rate': 2.543729644361587e-06, 'epoch': 0.51} 51%|█████ | 6275/12313 [4:41:49<4:27:24, 2.66s/it] 51%|█████ | 6276/12313 [4:41:52<4:28:21, 2.67s/it] {'loss': 0.4088, 'grad_norm': 6.082343128329472, 'learning_rate': 2.5430721212938216e-06, 'epoch': 0.51} 51%|█████ | 6276/12313 [4:41:52<4:28:21, 2.67s/it] 51%|█████ | 6277/12313 [4:41:54<4:26:41, 2.65s/it] {'loss': 0.4561, 'grad_norm': 4.720976085216124, 'learning_rate': 2.542414595245693e-06, 'epoch': 0.51} 51%|█████ | 6277/12313 [4:41:54<4:26:41, 2.65s/it] 51%|█████ | 6278/12313 [4:41:57<4:24:49, 2.63s/it] {'loss': 0.6201, 'grad_norm': 4.73426813490108, 'learning_rate': 2.541757066262699e-06, 'epoch': 0.51} 51%|█████ | 6278/12313 [4:41:57<4:24:49, 2.63s/it] 51%|█████ | 6279/12313 [4:42:00<4:24:48, 2.63s/it] {'loss': 0.4821, 'grad_norm': 3.383613942551711, 'learning_rate': 2.541099534390336e-06, 'epoch': 0.51} 51%|█████ | 6279/12313 [4:42:00<4:24:48, 2.63s/it] 51%|█████ | 6280/12313 [4:42:02<4:21:54, 2.60s/it] {'loss': 0.4494, 'grad_norm': 8.433408631314988, 'learning_rate': 2.5404419996741042e-06, 'epoch': 0.51} 51%|█████ | 6280/12313 [4:42:02<4:21:54, 2.60s/it] 51%|█████ | 6281/12313 [4:42:05<4:34:04, 2.73s/it] {'loss': 0.5385, 'grad_norm': 5.815113828274007, 'learning_rate': 2.5397844621594997e-06, 'epoch': 0.51} 51%|█████ | 6281/12313 [4:42:05<4:34:04, 2.73s/it] 51%|█████ | 6282/12313 [4:42:08<4:35:35, 2.74s/it] {'loss': 0.4344, 'grad_norm': 4.687103952333867, 'learning_rate': 2.5391269218920202e-06, 'epoch': 0.51} 51%|█████ | 6282/12313 [4:42:08<4:35:35, 2.74s/it] 51%|█████ | 6283/12313 [4:42:11<4:43:06, 2.82s/it] {'loss': 0.4559, 'grad_norm': 5.089752144632809, 'learning_rate': 2.5384693789171656e-06, 'epoch': 0.51} 51%|█████ | 6283/12313 [4:42:11<4:43:06, 2.82s/it] 51%|█████ | 
6284/12313 [4:42:14<4:44:14, 2.83s/it] {'loss': 0.5045, 'grad_norm': 5.854876777674836, 'learning_rate': 2.537811833280433e-06, 'epoch': 0.51} 51%|█████ | 6284/12313 [4:42:14<4:44:14, 2.83s/it] 51%|█████ | 6285/12313 [4:42:17<4:42:19, 2.81s/it] {'loss': 0.4898, 'grad_norm': 4.799615858367655, 'learning_rate': 2.5371542850273224e-06, 'epoch': 0.51} 51%|█████ | 6285/12313 [4:42:17<4:42:19, 2.81s/it] 51%|█████ | 6286/12313 [4:42:19<4:37:29, 2.76s/it] {'loss': 0.6302, 'grad_norm': 5.42400764526346, 'learning_rate': 2.5364967342033307e-06, 'epoch': 0.51} 51%|█████ | 6286/12313 [4:42:19<4:37:29, 2.76s/it] 51%|█████ | 6287/12313 [4:42:23<4:57:28, 2.96s/it] {'loss': 0.4977, 'grad_norm': 4.933143417479578, 'learning_rate': 2.5358391808539597e-06, 'epoch': 0.51} 51%|█████ | 6287/12313 [4:42:23<4:57:28, 2.96s/it] 51%|█████ | 6288/12313 [4:42:25<4:47:27, 2.86s/it] {'loss': 0.4809, 'grad_norm': 3.4223345719076113, 'learning_rate': 2.535181625024706e-06, 'epoch': 0.51} 51%|█████ | 6288/12313 [4:42:25<4:47:27, 2.86s/it] 51%|█████ | 6289/12313 [4:42:28<4:41:40, 2.81s/it] {'loss': 0.3358, 'grad_norm': 9.061912369576826, 'learning_rate': 2.53452406676107e-06, 'epoch': 0.51} 51%|█████ | 6289/12313 [4:42:28<4:41:40, 2.81s/it] 51%|█████ | 6290/12313 [4:42:31<4:36:41, 2.76s/it] {'loss': 0.6054, 'grad_norm': 10.525328047917656, 'learning_rate': 2.5338665061085518e-06, 'epoch': 0.51} 51%|█████ | 6290/12313 [4:42:31<4:36:41, 2.76s/it] 51%|█████ | 6291/12313 [4:42:33<4:36:30, 2.76s/it] {'loss': 0.6881, 'grad_norm': 5.383474823413491, 'learning_rate': 2.5332089431126504e-06, 'epoch': 0.51} 51%|█████ | 6291/12313 [4:42:33<4:36:30, 2.76s/it] 51%|█████ | 6292/12313 [4:42:36<4:44:29, 2.83s/it] {'loss': 0.6387, 'grad_norm': 11.602351673058207, 'learning_rate': 2.532551377818866e-06, 'epoch': 0.51} 51%|█████ | 6292/12313 [4:42:36<4:44:29, 2.83s/it] 51%|█████ | 6293/12313 [4:42:39<4:39:50, 2.79s/it] {'loss': 0.6418, 'grad_norm': 4.317046810663442, 'learning_rate': 2.5318938102726985e-06, 'epoch': 
0.51} 51%|█████ | 6293/12313 [4:42:39<4:39:50, 2.79s/it] 51%|█████ | 6294/12313 [4:42:42<4:31:35, 2.71s/it] {'loss': 0.4955, 'grad_norm': 4.588601931278866, 'learning_rate': 2.5312362405196485e-06, 'epoch': 0.51} 51%|█████ | 6294/12313 [4:42:42<4:31:35, 2.71s/it] 51%|█████ | 6295/12313 [4:42:45<4:40:33, 2.80s/it] {'loss': 0.4686, 'grad_norm': 5.9618029800437204, 'learning_rate': 2.530578668605215e-06, 'epoch': 0.51} 51%|█████ | 6295/12313 [4:42:45<4:40:33, 2.80s/it] 51%|█████ | 6296/12313 [4:42:47<4:44:02, 2.83s/it] {'loss': 0.624, 'grad_norm': 4.309047195757413, 'learning_rate': 2.5299210945749005e-06, 'epoch': 0.51} 51%|█████ | 6296/12313 [4:42:47<4:44:02, 2.83s/it] 51%|█████ | 6297/12313 [4:42:50<4:41:16, 2.81s/it] {'loss': 0.5655, 'grad_norm': 6.062649085654284, 'learning_rate': 2.529263518474204e-06, 'epoch': 0.51} 51%|█████ | 6297/12313 [4:42:50<4:41:16, 2.81s/it] 51%|█████ | 6298/12313 [4:42:53<4:46:05, 2.85s/it] {'loss': 0.6048, 'grad_norm': 8.101899866072504, 'learning_rate': 2.5286059403486262e-06, 'epoch': 0.51} 51%|█████ | 6298/12313 [4:42:53<4:46:05, 2.85s/it] 51%|█████ | 6299/12313 [4:42:56<4:37:20, 2.77s/it] {'loss': 0.5202, 'grad_norm': 4.855939734927873, 'learning_rate': 2.52794836024367e-06, 'epoch': 0.51} 51%|█████ | 6299/12313 [4:42:56<4:37:20, 2.77s/it] 51%|█████ | 6300/12313 [4:42:58<4:35:04, 2.74s/it] {'loss': 0.6758, 'grad_norm': 4.1533242601287315, 'learning_rate': 2.5272907782048343e-06, 'epoch': 0.51} 51%|█████ | 6300/12313 [4:42:58<4:35:04, 2.74s/it] 51%|█████ | 6301/12313 [4:43:01<4:27:10, 2.67s/it] {'loss': 0.4042, 'grad_norm': 44.854170715654014, 'learning_rate': 2.526633194277622e-06, 'epoch': 0.51} 51%|█████ | 6301/12313 [4:43:01<4:27:10, 2.67s/it] 51%|█████ | 6302/12313 [4:43:04<4:27:32, 2.67s/it] {'loss': 0.641, 'grad_norm': 4.477675057177277, 'learning_rate': 2.5259756085075333e-06, 'epoch': 0.51} 51%|█████ | 6302/12313 [4:43:04<4:27:32, 2.67s/it] 51%|█████ | 6303/12313 [4:43:06<4:20:54, 2.60s/it] {'loss': 0.4153, 'grad_norm': 
5.00794940429514, 'learning_rate': 2.5253180209400697e-06, 'epoch': 0.51} 51%|█████ | 6303/12313 [4:43:06<4:20:54, 2.60s/it] 51%|█████ | 6304/12313 [4:43:09<4:20:35, 2.60s/it] {'loss': 0.3842, 'grad_norm': 35.355057930038036, 'learning_rate': 2.5246604316207327e-06, 'epoch': 0.51} 51%|█████ | 6304/12313 [4:43:09<4:20:35, 2.60s/it] 51%|█████ | 6305/12313 [4:43:11<4:15:51, 2.56s/it] {'loss': 0.5766, 'grad_norm': 5.069001886917277, 'learning_rate': 2.524002840595025e-06, 'epoch': 0.51} 51%|█████ | 6305/12313 [4:43:11<4:15:51, 2.56s/it] 51%|█████ | 6306/12313 [4:43:14<4:17:30, 2.57s/it] {'loss': 0.5046, 'grad_norm': 4.7801204519502845, 'learning_rate': 2.523345247908448e-06, 'epoch': 0.51} 51%|█████ | 6306/12313 [4:43:14<4:17:30, 2.57s/it] 51%|█████ | 6307/12313 [4:43:16<4:16:09, 2.56s/it] {'loss': 0.4069, 'grad_norm': 13.094992788363395, 'learning_rate': 2.522687653606503e-06, 'epoch': 0.51} 51%|█████ | 6307/12313 [4:43:16<4:16:09, 2.56s/it] 51%|█████ | 6308/12313 [4:43:19<4:12:06, 2.52s/it] {'loss': 0.6193, 'grad_norm': 3.4115277705733273, 'learning_rate': 2.5220300577346925e-06, 'epoch': 0.51} 51%|█████ | 6308/12313 [4:43:19<4:12:06, 2.52s/it] 51%|█████ | 6309/12313 [4:43:22<4:25:23, 2.65s/it] {'loss': 0.4763, 'grad_norm': 4.934143576608908, 'learning_rate': 2.521372460338518e-06, 'epoch': 0.51} 51%|█████ | 6309/12313 [4:43:22<4:25:23, 2.65s/it] 51%|█████ | 6310/12313 [4:43:24<4:30:28, 2.70s/it] {'loss': 0.5961, 'grad_norm': 4.955171281673901, 'learning_rate': 2.5207148614634836e-06, 'epoch': 0.51} 51%|█████ | 6310/12313 [4:43:24<4:30:28, 2.70s/it] 51%|█████▏ | 6311/12313 [4:43:27<4:26:30, 2.66s/it] {'loss': 0.4608, 'grad_norm': 5.804504802010579, 'learning_rate': 2.5200572611550893e-06, 'epoch': 0.51} 51%|█████▏ | 6311/12313 [4:43:27<4:26:30, 2.66s/it] 51%|█████▏ | 6312/12313 [4:43:30<4:35:12, 2.75s/it] {'loss': 0.5245, 'grad_norm': 3.3522351499110346, 'learning_rate': 2.5193996594588395e-06, 'epoch': 0.51} 51%|█████▏ | 6312/12313 [4:43:30<4:35:12, 2.75s/it] 
51%|█████▏ | 6313/12313 [4:43:33<4:33:19, 2.73s/it] {'loss': 0.6784, 'grad_norm': 4.503464521908895, 'learning_rate': 2.5187420564202357e-06, 'epoch': 0.51} 51%|█████▏ | 6313/12313 [4:43:33<4:33:19, 2.73s/it] 51%|█████▏ | 6314/12313 [4:43:35<4:36:32, 2.77s/it] {'loss': 0.5239, 'grad_norm': 4.619599612350149, 'learning_rate': 2.518084452084781e-06, 'epoch': 0.51} 51%|█████▏ | 6314/12313 [4:43:36<4:36:32, 2.77s/it] 51%|█████▏ | 6315/12313 [4:43:38<4:35:40, 2.76s/it] {'loss': 0.4502, 'grad_norm': 6.2490361812009745, 'learning_rate': 2.5174268464979775e-06, 'epoch': 0.51} 51%|█████▏ | 6315/12313 [4:43:38<4:35:40, 2.76s/it] 51%|█████▏ | 6316/12313 [4:43:41<4:30:57, 2.71s/it] {'loss': 0.45, 'grad_norm': 6.493719400660229, 'learning_rate': 2.516769239705328e-06, 'epoch': 0.51} 51%|█████▏ | 6316/12313 [4:43:41<4:30:57, 2.71s/it] 51%|█████▏ | 6317/12313 [4:43:43<4:27:42, 2.68s/it] {'loss': 0.5184, 'grad_norm': 15.674054593332599, 'learning_rate': 2.5161116317523367e-06, 'epoch': 0.51} 51%|█████▏ | 6317/12313 [4:43:43<4:27:42, 2.68s/it] 51%|█████▏ | 6318/12313 [4:43:46<4:25:27, 2.66s/it] {'loss': 0.4148, 'grad_norm': 5.072386439299523, 'learning_rate': 2.5154540226845053e-06, 'epoch': 0.51} 51%|█████▏ | 6318/12313 [4:43:46<4:25:27, 2.66s/it] 51%|█████▏ | 6319/12313 [4:43:49<4:23:00, 2.63s/it] {'loss': 0.6593, 'grad_norm': 6.330463701552802, 'learning_rate': 2.514796412547337e-06, 'epoch': 0.51} 51%|█████▏ | 6319/12313 [4:43:49<4:23:00, 2.63s/it] 51%|█████▏ | 6320/12313 [4:43:51<4:26:19, 2.67s/it] {'loss': 0.6006, 'grad_norm': 6.329569453939536, 'learning_rate': 2.5141388013863366e-06, 'epoch': 0.51} 51%|█████▏ | 6320/12313 [4:43:51<4:26:19, 2.67s/it] 51%|█████▏ | 6321/12313 [4:43:54<4:24:11, 2.65s/it] {'loss': 0.4651, 'grad_norm': 4.995421040496941, 'learning_rate': 2.5134811892470046e-06, 'epoch': 0.51} 51%|█████▏ | 6321/12313 [4:43:54<4:24:11, 2.65s/it] 51%|█████▏ | 6322/12313 [4:43:57<4:22:24, 2.63s/it] {'loss': 0.34, 'grad_norm': 6.312161136312402, 'learning_rate': 
2.512823576174846e-06, 'epoch': 0.51} 51%|█████▏ | 6322/12313 [4:43:57<4:22:24, 2.63s/it] 51%|█████▏ | 6323/12313 [4:43:59<4:17:50, 2.58s/it] {'loss': 0.4297, 'grad_norm': 4.377238702155564, 'learning_rate': 2.5121659622153643e-06, 'epoch': 0.51} 51%|█████▏ | 6323/12313 [4:43:59<4:17:50, 2.58s/it] 51%|█████▏ | 6324/12313 [4:44:01<4:13:26, 2.54s/it] {'loss': 0.5385, 'grad_norm': 4.928516077514282, 'learning_rate': 2.511508347414062e-06, 'epoch': 0.51} 51%|█████▏ | 6324/12313 [4:44:01<4:13:26, 2.54s/it] 51%|█████▏ | 6325/12313 [4:44:04<4:17:45, 2.58s/it] {'loss': 0.3269, 'grad_norm': 7.762372366845343, 'learning_rate': 2.510850731816443e-06, 'epoch': 0.51} 51%|█████▏ | 6325/12313 [4:44:04<4:17:45, 2.58s/it] 51%|█████▏ | 6326/12313 [4:44:07<4:17:58, 2.59s/it] {'loss': 0.3859, 'grad_norm': 5.504193583882282, 'learning_rate': 2.510193115468011e-06, 'epoch': 0.51} 51%|█████▏ | 6326/12313 [4:44:07<4:17:58, 2.59s/it] 51%|█████▏ | 6327/12313 [4:44:09<4:22:52, 2.63s/it] {'loss': 0.4028, 'grad_norm': 6.152137635079484, 'learning_rate': 2.5095354984142682e-06, 'epoch': 0.51} 51%|█████▏ | 6327/12313 [4:44:09<4:22:52, 2.63s/it] 51%|█████▏ | 6328/12313 [4:44:13<4:42:58, 2.84s/it] {'loss': 0.3954, 'grad_norm': 7.085948581559598, 'learning_rate': 2.5088778807007203e-06, 'epoch': 0.51} 51%|█████▏ | 6328/12313 [4:44:13<4:42:58, 2.84s/it] 51%|█████▏ | 6329/12313 [4:44:15<4:37:50, 2.79s/it] {'loss': 0.4362, 'grad_norm': 5.649925226999557, 'learning_rate': 2.5082202623728707e-06, 'epoch': 0.51} 51%|█████▏ | 6329/12313 [4:44:15<4:37:50, 2.79s/it] 51%|█████▏ | 6330/12313 [4:44:18<4:31:52, 2.73s/it] {'loss': 0.7279, 'grad_norm': 7.520999779577865, 'learning_rate': 2.507562643476222e-06, 'epoch': 0.51} 51%|█████▏ | 6330/12313 [4:44:18<4:31:52, 2.73s/it] 51%|█████▏ | 6331/12313 [4:44:21<4:31:50, 2.73s/it] {'loss': 0.745, 'grad_norm': 3.4023713944158143, 'learning_rate': 2.5069050240562782e-06, 'epoch': 0.51} 51%|█████▏ | 6331/12313 [4:44:21<4:31:50, 2.73s/it] 51%|█████▏ | 6332/12313 
[4:44:23<4:30:50, 2.72s/it] {'loss': 0.5726, 'grad_norm': 5.4405959398028125, 'learning_rate': 2.5062474041585432e-06, 'epoch': 0.51} 51%|█████▏ | 6332/12313 [4:44:23<4:30:50, 2.72s/it] 51%|█████▏ | 6333/12313 [4:44:26<4:22:30, 2.63s/it] {'loss': 0.5267, 'grad_norm': 5.517222494769225, 'learning_rate': 2.5055897838285207e-06, 'epoch': 0.51} 51%|█████▏ | 6333/12313 [4:44:26<4:22:30, 2.63s/it] 51%|█████▏ | 6334/12313 [4:44:28<4:14:20, 2.55s/it] {'loss': 0.5198, 'grad_norm': 4.726623734798838, 'learning_rate': 2.504932163111715e-06, 'epoch': 0.51} 51%|█████▏ | 6334/12313 [4:44:28<4:14:20, 2.55s/it] 51%|█████▏ | 6335/12313 [4:44:31<4:21:14, 2.62s/it] {'loss': 0.5967, 'grad_norm': 4.440896263973417, 'learning_rate': 2.5042745420536295e-06, 'epoch': 0.51} 51%|█████▏ | 6335/12313 [4:44:31<4:21:14, 2.62s/it] 51%|█████▏ | 6336/12313 [4:44:34<4:27:02, 2.68s/it] {'loss': 0.5971, 'grad_norm': 3.37064042601589, 'learning_rate': 2.503616920699769e-06, 'epoch': 0.51} 51%|█████▏ | 6336/12313 [4:44:34<4:27:02, 2.68s/it] 51%|█████▏ | 6337/12313 [4:44:36<4:24:07, 2.65s/it] {'loss': 0.5007, 'grad_norm': 4.690221273319353, 'learning_rate': 2.502959299095636e-06, 'epoch': 0.51} 51%|█████▏ | 6337/12313 [4:44:36<4:24:07, 2.65s/it] 51%|█████▏ | 6338/12313 [4:44:39<4:18:30, 2.60s/it] {'loss': 0.5678, 'grad_norm': 4.908136131112301, 'learning_rate': 2.5023016772867353e-06, 'epoch': 0.51} 51%|█████▏ | 6338/12313 [4:44:39<4:18:30, 2.60s/it] 51%|█████▏ | 6339/12313 [4:44:41<4:14:47, 2.56s/it] {'loss': 0.4118, 'grad_norm': 5.443552173567539, 'learning_rate': 2.5016440553185718e-06, 'epoch': 0.51} 51%|█████▏ | 6339/12313 [4:44:41<4:14:47, 2.56s/it] 51%|█████▏ | 6340/12313 [4:44:44<4:21:02, 2.62s/it] {'loss': 0.6241, 'grad_norm': 6.1838875143833665, 'learning_rate': 2.5009864332366467e-06, 'epoch': 0.51} 51%|█████▏ | 6340/12313 [4:44:44<4:21:02, 2.62s/it] 51%|█████▏ | 6341/12313 [4:44:47<4:25:30, 2.67s/it] {'loss': 0.5065, 'grad_norm': 3.986184988106473, 'learning_rate': 2.5003288110864664e-06, 
'epoch': 0.51} 51%|█████▏ | 6341/12313 [4:44:47<4:25:30, 2.67s/it] 52%|█████▏ | 6342/12313 [4:44:50<4:27:16, 2.69s/it] {'loss': 0.6497, 'grad_norm': 4.231672066174447, 'learning_rate': 2.4996711889135344e-06, 'epoch': 0.52} 52%|█████▏ | 6342/12313 [4:44:50<4:27:16, 2.69s/it] 52%|█████▏ | 6343/12313 [4:44:52<4:26:05, 2.67s/it] {'loss': 0.4079, 'grad_norm': 5.815390035755676, 'learning_rate': 2.499013566763354e-06, 'epoch': 0.52} 52%|█████▏ | 6343/12313 [4:44:52<4:26:05, 2.67s/it] 52%|█████▏ | 6344/12313 [4:44:55<4:26:51, 2.68s/it] {'loss': 0.438, 'grad_norm': 7.9133098452043455, 'learning_rate': 2.4983559446814295e-06, 'epoch': 0.52} 52%|█████▏ | 6344/12313 [4:44:55<4:26:51, 2.68s/it] 52%|█████▏ | 6345/12313 [4:44:58<4:25:29, 2.67s/it] {'loss': 0.5876, 'grad_norm': 3.207812612582073, 'learning_rate': 2.497698322713265e-06, 'epoch': 0.52} 52%|█████▏ | 6345/12313 [4:44:58<4:25:29, 2.67s/it] 52%|█████▏ | 6346/12313 [4:45:00<4:23:17, 2.65s/it] {'loss': 0.5272, 'grad_norm': 4.353200703794427, 'learning_rate': 2.4970407009043646e-06, 'epoch': 0.52} 52%|█████▏ | 6346/12313 [4:45:00<4:23:17, 2.65s/it] 52%|█████▏ | 6347/12313 [4:45:03<4:22:29, 2.64s/it] {'loss': 0.4392, 'grad_norm': 5.892469802972017, 'learning_rate': 2.4963830793002313e-06, 'epoch': 0.52} 52%|█████▏ | 6347/12313 [4:45:03<4:22:29, 2.64s/it] 52%|█████▏ | 6348/12313 [4:45:05<4:20:31, 2.62s/it] {'loss': 0.326, 'grad_norm': 7.87771121157062, 'learning_rate': 2.495725457946371e-06, 'epoch': 0.52} 52%|█████▏ | 6348/12313 [4:45:05<4:20:31, 2.62s/it] 52%|█████▏ | 6349/12313 [4:45:08<4:18:32, 2.60s/it] {'loss': 0.5542, 'grad_norm': 4.905801737169398, 'learning_rate': 2.4950678368882863e-06, 'epoch': 0.52} 52%|█████▏ | 6349/12313 [4:45:08<4:18:32, 2.60s/it] 52%|█████▏ | 6350/12313 [4:45:11<4:19:09, 2.61s/it] {'loss': 0.5257, 'grad_norm': 3.2384008143696117, 'learning_rate': 2.49441021617148e-06, 'epoch': 0.52} 52%|█████▏ | 6350/12313 [4:45:11<4:19:09, 2.61s/it] 52%|█████▏ | 6351/12313 [4:45:14<4:32:23, 2.74s/it] 
{'loss': 0.6298, 'grad_norm': 8.399266140485144, 'learning_rate': 2.4937525958414576e-06, 'epoch': 0.52} 52%|█████▏ | 6351/12313 [4:45:14<4:32:23, 2.74s/it] 52%|█████▏ | 6352/12313 [4:45:16<4:25:53, 2.68s/it] {'loss': 0.3812, 'grad_norm': 6.628609352541173, 'learning_rate': 2.4930949759437234e-06, 'epoch': 0.52} 52%|█████▏ | 6352/12313 [4:45:16<4:25:53, 2.68s/it] 52%|█████▏ | 6353/12313 [4:45:19<4:26:35, 2.68s/it] {'loss': 0.5552, 'grad_norm': 7.882154633725088, 'learning_rate': 2.492437356523779e-06, 'epoch': 0.52} 52%|█████▏ | 6353/12313 [4:45:19<4:26:35, 2.68s/it] 52%|█████▏ | 6354/12313 [4:45:22<4:29:25, 2.71s/it] {'loss': 0.5093, 'grad_norm': 9.337887778597489, 'learning_rate': 2.4917797376271297e-06, 'epoch': 0.52} 52%|█████▏ | 6354/12313 [4:45:22<4:29:25, 2.71s/it] 52%|█████▏ | 6355/12313 [4:45:24<4:21:51, 2.64s/it] {'loss': 0.5847, 'grad_norm': 7.376950318246357, 'learning_rate': 2.49112211929928e-06, 'epoch': 0.52} 52%|█████▏ | 6355/12313 [4:45:24<4:21:51, 2.64s/it] 52%|█████▏ | 6356/12313 [4:45:27<4:17:29, 2.59s/it] {'loss': 0.5784, 'grad_norm': 6.482321048724295, 'learning_rate': 2.4904645015857318e-06, 'epoch': 0.52} 52%|█████▏ | 6356/12313 [4:45:27<4:17:29, 2.59s/it] 52%|█████▏ | 6357/12313 [4:45:29<4:20:07, 2.62s/it] {'loss': 0.5229, 'grad_norm': 5.797801300962893, 'learning_rate': 2.48980688453199e-06, 'epoch': 0.52} 52%|█████▏ | 6357/12313 [4:45:29<4:20:07, 2.62s/it] 52%|█████▏ | 6358/12313 [4:45:32<4:18:07, 2.60s/it] {'loss': 0.4893, 'grad_norm': 5.360257972509032, 'learning_rate': 2.4891492681835584e-06, 'epoch': 0.52} 52%|█████▏ | 6358/12313 [4:45:32<4:18:07, 2.60s/it] 52%|█████▏ | 6359/12313 [4:45:35<4:20:56, 2.63s/it] {'loss': 0.4775, 'grad_norm': 5.202599723312153, 'learning_rate': 2.4884916525859386e-06, 'epoch': 0.52} 52%|█████▏ | 6359/12313 [4:45:35<4:20:56, 2.63s/it] 52%|█████▏ | 6360/12313 [4:45:37<4:19:15, 2.61s/it] {'loss': 0.4474, 'grad_norm': 6.783080832818119, 'learning_rate': 2.4878340377846365e-06, 'epoch': 0.52} 52%|█████▏ | 
6360/12313 [4:45:37<4:19:15, 2.61s/it] 52%|█████▏ | 6361/12313 [4:45:40<4:16:32, 2.59s/it] {'loss': 0.4225, 'grad_norm': 12.845276601042336, 'learning_rate': 2.4871764238251547e-06, 'epoch': 0.52} 52%|█████▏ | 6361/12313 [4:45:40<4:16:32, 2.59s/it] 52%|█████▏ | 6362/12313 [4:45:42<4:22:11, 2.64s/it] {'loss': 0.5043, 'grad_norm': 7.494098434686925, 'learning_rate': 2.4865188107529963e-06, 'epoch': 0.52} 52%|█████▏ | 6362/12313 [4:45:42<4:22:11, 2.64s/it] 52%|█████▏ | 6363/12313 [4:45:45<4:32:08, 2.74s/it] {'loss': 0.4393, 'grad_norm': 5.004627980302835, 'learning_rate': 2.485861198613664e-06, 'epoch': 0.52} 52%|█████▏ | 6363/12313 [4:45:45<4:32:08, 2.74s/it] 52%|█████▏ | 6364/12313 [4:45:48<4:20:55, 2.63s/it] {'loss': 0.5499, 'grad_norm': 6.726551242092707, 'learning_rate': 2.4852035874526632e-06, 'epoch': 0.52} 52%|█████▏ | 6364/12313 [4:45:48<4:20:55, 2.63s/it] 52%|█████▏ | 6365/12313 [4:45:51<4:24:10, 2.66s/it] {'loss': 0.5416, 'grad_norm': 4.666439480933289, 'learning_rate': 2.4845459773154964e-06, 'epoch': 0.52} 52%|█████▏ | 6365/12313 [4:45:51<4:24:10, 2.66s/it] 52%|█████▏ | 6366/12313 [4:45:53<4:25:55, 2.68s/it] {'loss': 0.5082, 'grad_norm': 5.180749129277349, 'learning_rate': 2.483888368247664e-06, 'epoch': 0.52} 52%|█████▏ | 6366/12313 [4:45:53<4:25:55, 2.68s/it] 52%|█████▏ | 6367/12313 [4:45:56<4:17:58, 2.60s/it] {'loss': 0.4328, 'grad_norm': 4.494845006036604, 'learning_rate': 2.4832307602946726e-06, 'epoch': 0.52} 52%|█████▏ | 6367/12313 [4:45:56<4:17:58, 2.60s/it] 52%|█████▏ | 6368/12313 [4:45:58<4:19:15, 2.62s/it] {'loss': 0.4709, 'grad_norm': 7.722256382211164, 'learning_rate': 2.4825731535020242e-06, 'epoch': 0.52} 52%|█████▏ | 6368/12313 [4:45:58<4:19:15, 2.62s/it] 52%|█████▏ | 6369/12313 [4:46:01<4:23:58, 2.66s/it] {'loss': 0.5804, 'grad_norm': 5.336177066714621, 'learning_rate': 2.48191554791522e-06, 'epoch': 0.52} 52%|█████▏ | 6369/12313 [4:46:01<4:23:58, 2.66s/it] 52%|█████▏ | 6370/12313 [4:46:04<4:18:57, 2.61s/it] {'loss': 0.4719, 'grad_norm': 
6.222339347839942, 'learning_rate': 2.481257943579765e-06, 'epoch': 0.52} 52%|█████▏ | 6370/12313 [4:46:04<4:18:57, 2.61s/it] 52%|█████▏ | 6371/12313 [4:46:06<4:21:26, 2.64s/it] {'loss': 0.6593, 'grad_norm': 7.613124790360803, 'learning_rate': 2.4806003405411617e-06, 'epoch': 0.52} 52%|█████▏ | 6371/12313 [4:46:06<4:21:26, 2.64s/it] 52%|█████▏ | 6372/12313 [4:46:09<4:33:59, 2.77s/it] {'loss': 0.4365, 'grad_norm': 6.354106819523852, 'learning_rate': 2.479942738844911e-06, 'epoch': 0.52} 52%|█████▏ | 6372/12313 [4:46:09<4:33:59, 2.77s/it] 52%|█████▏ | 6373/12313 [4:46:12<4:27:23, 2.70s/it] {'loss': 0.5308, 'grad_norm': 5.688366482514378, 'learning_rate': 2.479285138536517e-06, 'epoch': 0.52} 52%|█████▏ | 6373/12313 [4:46:12<4:27:23, 2.70s/it] 52%|█████▏ | 6374/12313 [4:46:14<4:20:34, 2.63s/it] {'loss': 0.4804, 'grad_norm': 5.415154693434397, 'learning_rate': 2.4786275396614823e-06, 'epoch': 0.52} 52%|█████▏ | 6374/12313 [4:46:14<4:20:34, 2.63s/it] 52%|█████▏ | 6375/12313 [4:46:17<4:25:30, 2.68s/it] {'loss': 0.3669, 'grad_norm': 28.82102125102892, 'learning_rate': 2.477969942265308e-06, 'epoch': 0.52} 52%|█████▏ | 6375/12313 [4:46:17<4:25:30, 2.68s/it] 52%|█████▏ | 6376/12313 [4:46:20<4:26:00, 2.69s/it] {'loss': 0.6299, 'grad_norm': 6.810481547484077, 'learning_rate': 2.4773123463934973e-06, 'epoch': 0.52} 52%|█████▏ | 6376/12313 [4:46:20<4:26:00, 2.69s/it] 52%|█████▏ | 6377/12313 [4:46:23<4:27:15, 2.70s/it] {'loss': 0.5736, 'grad_norm': 4.392949418938218, 'learning_rate': 2.476654752091553e-06, 'epoch': 0.52} 52%|█████▏ | 6377/12313 [4:46:23<4:27:15, 2.70s/it] 52%|█████▏ | 6378/12313 [4:46:26<4:34:32, 2.78s/it] {'loss': 0.5811, 'grad_norm': 4.2704517486876234, 'learning_rate': 2.4759971594049763e-06, 'epoch': 0.52} 52%|█████▏ | 6378/12313 [4:46:26<4:34:32, 2.78s/it] 52%|█████▏ | 6379/12313 [4:46:28<4:23:19, 2.66s/it] {'loss': 0.4761, 'grad_norm': 6.692688448203209, 'learning_rate': 2.4753395683792677e-06, 'epoch': 0.52} 52%|█████▏ | 6379/12313 [4:46:28<4:23:19, 
2.66s/it] 52%|█████▏ | 6380/12313 [4:46:31<4:22:17, 2.65s/it] {'loss': 0.5694, 'grad_norm': 7.334413784638161, 'learning_rate': 2.474681979059931e-06, 'epoch': 0.52} 52%|█████▏ | 6380/12313 [4:46:31<4:22:17, 2.65s/it] 52%|█████▏ | 6381/12313 [4:46:33<4:26:57, 2.70s/it] {'loss': 0.4231, 'grad_norm': 4.544276408270827, 'learning_rate': 2.474024391492468e-06, 'epoch': 0.52} 52%|█████▏ | 6381/12313 [4:46:33<4:26:57, 2.70s/it] 52%|█████▏ | 6382/12313 [4:46:36<4:22:53, 2.66s/it] {'loss': 0.6074, 'grad_norm': 4.593897927457953, 'learning_rate': 2.473366805722379e-06, 'epoch': 0.52} 52%|█████▏ | 6382/12313 [4:46:36<4:22:53, 2.66s/it] 52%|█████▏ | 6383/12313 [4:46:39<4:23:42, 2.67s/it] {'loss': 0.4831, 'grad_norm': 4.897073039683832, 'learning_rate': 2.472709221795166e-06, 'epoch': 0.52} 52%|█████▏ | 6383/12313 [4:46:39<4:23:42, 2.67s/it] 52%|█████▏ | 6384/12313 [4:46:41<4:23:51, 2.67s/it] {'loss': 0.5052, 'grad_norm': 4.507107061670829, 'learning_rate': 2.4720516397563314e-06, 'epoch': 0.52} 52%|█████▏ | 6384/12313 [4:46:41<4:23:51, 2.67s/it] 52%|█████▏ | 6385/12313 [4:46:44<4:20:03, 2.63s/it] {'loss': 0.6067, 'grad_norm': 4.582461812162122, 'learning_rate': 2.471394059651374e-06, 'epoch': 0.52} 52%|█████▏ | 6385/12313 [4:46:44<4:20:03, 2.63s/it] 52%|█████▏ | 6386/12313 [4:46:46<4:18:33, 2.62s/it] {'loss': 0.6531, 'grad_norm': 5.866771836678322, 'learning_rate': 2.470736481525797e-06, 'epoch': 0.52} 52%|█████▏ | 6386/12313 [4:46:46<4:18:33, 2.62s/it] 52%|█████▏ | 6387/12313 [4:46:49<4:20:19, 2.64s/it] {'loss': 0.4097, 'grad_norm': 5.813106703432218, 'learning_rate': 2.470078905425101e-06, 'epoch': 0.52} 52%|█████▏ | 6387/12313 [4:46:49<4:20:19, 2.64s/it] 52%|█████▏ | 6388/12313 [4:46:52<4:22:09, 2.65s/it] {'loss': 0.4998, 'grad_norm': 7.253453752131115, 'learning_rate': 2.4694213313947855e-06, 'epoch': 0.52} 52%|█████▏ | 6388/12313 [4:46:52<4:22:09, 2.65s/it] 52%|█████▏ | 6389/12313 [4:46:54<4:20:52, 2.64s/it] {'loss': 0.5751, 'grad_norm': 3.081751300748125, 
'learning_rate': 2.4687637594803527e-06, 'epoch': 0.52} 52%|█████▏ | 6389/12313 [4:46:54<4:20:52, 2.64s/it] 52%|█████▏ | 6390/12313 [4:46:57<4:25:16, 2.69s/it] {'loss': 0.4083, 'grad_norm': 5.45405710599491, 'learning_rate': 2.4681061897273028e-06, 'epoch': 0.52} 52%|█████▏ | 6390/12313 [4:46:57<4:25:16, 2.69s/it] 52%|█████▏ | 6391/12313 [4:47:00<4:24:53, 2.68s/it] {'loss': 0.4051, 'grad_norm': 4.843125937608803, 'learning_rate': 2.4674486221811345e-06, 'epoch': 0.52} 52%|█████▏ | 6391/12313 [4:47:00<4:24:53, 2.68s/it] 52%|█████▏ | 6392/12313 [4:47:02<4:21:13, 2.65s/it] {'loss': 0.5527, 'grad_norm': 6.00791556685988, 'learning_rate': 2.46679105688735e-06, 'epoch': 0.52} 52%|█████▏ | 6392/12313 [4:47:02<4:21:13, 2.65s/it] 52%|█████▏ | 6393/12313 [4:47:05<4:25:03, 2.69s/it] {'loss': 0.4875, 'grad_norm': 3.547137678101548, 'learning_rate': 2.466133493891449e-06, 'epoch': 0.52} 52%|█████▏ | 6393/12313 [4:47:05<4:25:03, 2.69s/it] 52%|█████▏ | 6394/12313 [4:47:08<4:23:22, 2.67s/it] {'loss': 0.496, 'grad_norm': 8.66243792163446, 'learning_rate': 2.46547593323893e-06, 'epoch': 0.52} 52%|█████▏ | 6394/12313 [4:47:08<4:23:22, 2.67s/it] 52%|█████▏ | 6395/12313 [4:47:11<4:23:06, 2.67s/it] {'loss': 0.4364, 'grad_norm': 4.430513402849331, 'learning_rate': 2.464818374975295e-06, 'epoch': 0.52} 52%|█████▏ | 6395/12313 [4:47:11<4:23:06, 2.67s/it] 52%|█████▏ | 6396/12313 [4:47:13<4:22:06, 2.66s/it] {'loss': 0.6302, 'grad_norm': 3.7975317298710625, 'learning_rate': 2.4641608191460415e-06, 'epoch': 0.52} 52%|█████▏ | 6396/12313 [4:47:13<4:22:06, 2.66s/it] 52%|█████▏ | 6397/12313 [4:47:16<4:21:27, 2.65s/it] {'loss': 0.5352, 'grad_norm': 3.888846410063618, 'learning_rate': 2.46350326579667e-06, 'epoch': 0.52} 52%|█████▏ | 6397/12313 [4:47:16<4:21:27, 2.65s/it] 52%|█████▏ | 6398/12313 [4:47:18<4:17:17, 2.61s/it] {'loss': 0.4854, 'grad_norm': 4.412943996478983, 'learning_rate': 2.462845714972679e-06, 'epoch': 0.52} 52%|█████▏ | 6398/12313 [4:47:18<4:17:17, 2.61s/it] 52%|█████▏ | 
6399/12313 [4:47:21<4:21:37, 2.65s/it] {'loss': 0.4375, 'grad_norm': 3.706708426188841, 'learning_rate': 2.4621881667195676e-06, 'epoch': 0.52} 52%|█████▏ | 6399/12313 [4:47:21<4:21:37, 2.65s/it] 52%|█████▏ | 6400/12313 [4:47:24<4:20:40, 2.65s/it] {'loss': 0.5516, 'grad_norm': 3.8449883186160787, 'learning_rate': 2.4615306210828357e-06, 'epoch': 0.52} 52%|█████▏ | 6400/12313 [4:47:24<4:20:40, 2.65s/it] 52%|█████▏ | 6401/12313 [4:47:26<4:19:13, 2.63s/it] {'loss': 0.5237, 'grad_norm': 5.909942632055917, 'learning_rate': 2.46087307810798e-06, 'epoch': 0.52} 52%|█████▏ | 6401/12313 [4:47:26<4:19:13, 2.63s/it] 52%|█████▏ | 6402/12313 [4:47:29<4:16:28, 2.60s/it] {'loss': 0.4175, 'grad_norm': 3.547279382037591, 'learning_rate': 2.460215537840501e-06, 'epoch': 0.52} 52%|█████▏ | 6402/12313 [4:47:29<4:16:28, 2.60s/it] 52%|█████▏ | 6403/12313 [4:47:32<4:18:27, 2.62s/it] {'loss': 0.6123, 'grad_norm': 4.882379724186628, 'learning_rate': 2.459558000325897e-06, 'epoch': 0.52} 52%|█████▏ | 6403/12313 [4:47:32<4:18:27, 2.62s/it] 52%|█████▏ | 6404/12313 [4:47:34<4:16:47, 2.61s/it] {'loss': 0.5215, 'grad_norm': 4.050921637738143, 'learning_rate': 2.458900465609664e-06, 'epoch': 0.52} 52%|█████▏ | 6404/12313 [4:47:34<4:16:47, 2.61s/it] 52%|█████▏ | 6405/12313 [4:47:37<4:18:09, 2.62s/it] {'loss': 0.4617, 'grad_norm': 3.382346078488962, 'learning_rate': 2.4582429337373018e-06, 'epoch': 0.52} 52%|█████▏ | 6405/12313 [4:47:37<4:18:09, 2.62s/it] 52%|█████▏ | 6406/12313 [4:47:39<4:17:05, 2.61s/it] {'loss': 0.4909, 'grad_norm': 7.257748442971193, 'learning_rate': 2.4575854047543082e-06, 'epoch': 0.52} 52%|█████▏ | 6406/12313 [4:47:39<4:17:05, 2.61s/it] 52%|█████▏ | 6407/12313 [4:47:42<4:17:12, 2.61s/it] {'loss': 0.4125, 'grad_norm': 6.459780012681047, 'learning_rate': 2.456927878706179e-06, 'epoch': 0.52} 52%|█████▏ | 6407/12313 [4:47:42<4:17:12, 2.61s/it] 52%|█████▏ | 6408/12313 [4:47:45<4:23:39, 2.68s/it] {'loss': 0.5746, 'grad_norm': 3.8936939687249863, 'learning_rate': 
2.4562703556384136e-06, 'epoch': 0.52} 52%|█████▏ | 6408/12313 [4:47:45<4:23:39, 2.68s/it] 52%|█████▏ | 6409/12313 [4:47:47<4:19:42, 2.64s/it] {'loss': 0.4194, 'grad_norm': 5.650603779541892, 'learning_rate': 2.4556128355965076e-06, 'epoch': 0.52} 52%|█████▏ | 6409/12313 [4:47:47<4:19:42, 2.64s/it] 52%|█████▏ | 6410/12313 [4:47:50<4:20:31, 2.65s/it] {'loss': 0.5296, 'grad_norm': 6.233819656387275, 'learning_rate': 2.454955318625958e-06, 'epoch': 0.52} 52%|█████▏ | 6410/12313 [4:47:50<4:20:31, 2.65s/it] 52%|█████▏ | 6411/12313 [4:47:52<4:14:02, 2.58s/it] {'loss': 0.5687, 'grad_norm': 5.175873926721092, 'learning_rate': 2.4542978047722633e-06, 'epoch': 0.52} 52%|█████▏ | 6411/12313 [4:47:52<4:14:02, 2.58s/it] 52%|█████▏ | 6412/12313 [4:47:55<4:14:33, 2.59s/it] {'loss': 0.7367, 'grad_norm': 7.441015494915213, 'learning_rate': 2.453640294080918e-06, 'epoch': 0.52} 52%|█████▏ | 6412/12313 [4:47:55<4:14:33, 2.59s/it] 52%|█████▏ | 6413/12313 [4:47:58<4:12:51, 2.57s/it] {'loss': 0.5059, 'grad_norm': 7.282413948152037, 'learning_rate': 2.452982786597419e-06, 'epoch': 0.52} 52%|█████▏ | 6413/12313 [4:47:58<4:12:51, 2.57s/it] 52%|█████▏ | 6414/12313 [4:48:00<4:14:28, 2.59s/it] {'loss': 0.4086, 'grad_norm': 5.063377741570186, 'learning_rate': 2.452325282367262e-06, 'epoch': 0.52} 52%|█████▏ | 6414/12313 [4:48:00<4:14:28, 2.59s/it] 52%|█████▏ | 6415/12313 [4:48:03<4:16:16, 2.61s/it] {'loss': 0.5554, 'grad_norm': 4.377231675145512, 'learning_rate': 2.4516677814359434e-06, 'epoch': 0.52} 52%|█████▏ | 6415/12313 [4:48:03<4:16:16, 2.61s/it] 52%|█████▏ | 6416/12313 [4:48:06<4:23:47, 2.68s/it] {'loss': 0.5666, 'grad_norm': 5.111222932344912, 'learning_rate': 2.4510102838489587e-06, 'epoch': 0.52} 52%|█████▏ | 6416/12313 [4:48:06<4:23:47, 2.68s/it] 52%|█████▏ | 6417/12313 [4:48:09<4:27:14, 2.72s/it] {'loss': 0.5856, 'grad_norm': 2.6572969128420363, 'learning_rate': 2.4503527896518025e-06, 'epoch': 0.52} 52%|█████▏ | 6417/12313 [4:48:09<4:27:14, 2.72s/it] 52%|█████▏ | 6418/12313 
[4:48:12<4:35:41, 2.81s/it] {'loss': 0.5549, 'grad_norm': 3.467977660333967, 'learning_rate': 2.449695298889971e-06, 'epoch': 0.52} 52%|█████▏ | 6418/12313 [4:48:12<4:35:41, 2.81s/it] 52%|█████▏ | 6419/12313 [4:48:14<4:27:48, 2.73s/it] {'loss': 0.5114, 'grad_norm': 3.9586980008450547, 'learning_rate': 2.449037811608959e-06, 'epoch': 0.52} 52%|█████▏ | 6419/12313 [4:48:14<4:27:48, 2.73s/it] 52%|█████▏ | 6420/12313 [4:48:16<4:18:04, 2.63s/it] {'loss': 0.4881, 'grad_norm': 6.6024044152724795, 'learning_rate': 2.4483803278542594e-06, 'epoch': 0.52} 52%|█████▏ | 6420/12313 [4:48:16<4:18:04, 2.63s/it] 52%|█████▏ | 6421/12313 [4:48:20<4:30:58, 2.76s/it] {'loss': 0.4957, 'grad_norm': 3.872306998345536, 'learning_rate': 2.447722847671369e-06, 'epoch': 0.52} 52%|█████▏ | 6421/12313 [4:48:20<4:30:58, 2.76s/it] 52%|█████▏ | 6422/12313 [4:48:22<4:27:42, 2.73s/it] {'loss': 0.5453, 'grad_norm': 4.928256043695277, 'learning_rate': 2.4470653711057805e-06, 'epoch': 0.52} 52%|█████▏ | 6422/12313 [4:48:22<4:27:42, 2.73s/it] 52%|█████▏ | 6423/12313 [4:48:25<4:23:40, 2.69s/it] {'loss': 0.4302, 'grad_norm': 5.00049623611133, 'learning_rate': 2.446407898202988e-06, 'epoch': 0.52} 52%|█████▏ | 6423/12313 [4:48:25<4:23:40, 2.69s/it] 52%|█████▏ | 6424/12313 [4:48:27<4:23:03, 2.68s/it] {'loss': 0.4867, 'grad_norm': 6.210683915021455, 'learning_rate': 2.445750429008486e-06, 'epoch': 0.52} 52%|█████▏ | 6424/12313 [4:48:27<4:23:03, 2.68s/it] 52%|█████▏ | 6425/12313 [4:48:30<4:21:21, 2.66s/it] {'loss': 0.5342, 'grad_norm': 4.733137071274775, 'learning_rate': 2.4450929635677667e-06, 'epoch': 0.52} 52%|█████▏ | 6425/12313 [4:48:30<4:21:21, 2.66s/it] 52%|█████▏ | 6426/12313 [4:48:33<4:21:58, 2.67s/it] {'loss': 0.5391, 'grad_norm': 3.573213441447143, 'learning_rate': 2.4444355019263235e-06, 'epoch': 0.52} 52%|█████▏ | 6426/12313 [4:48:33<4:21:58, 2.67s/it] 52%|█████▏ | 6427/12313 [4:48:35<4:21:50, 2.67s/it] {'loss': 0.5641, 'grad_norm': 3.9218556675383685, 'learning_rate': 2.44377804412965e-06, 
'epoch': 0.52} 52%|█████▏ | 6427/12313 [4:48:35<4:21:50, 2.67s/it] 52%|█████▏ | 6428/12313 [4:48:38<4:28:33, 2.74s/it] {'loss': 0.5727, 'grad_norm': 3.72842203066307, 'learning_rate': 2.443120590223238e-06, 'epoch': 0.52} 52%|█████▏ | 6428/12313 [4:48:38<4:28:33, 2.74s/it] 52%|█████▏ | 6429/12313 [4:48:41<4:26:16, 2.72s/it] {'loss': 0.5027, 'grad_norm': 4.5179452250246355, 'learning_rate': 2.4424631402525797e-06, 'epoch': 0.52} 52%|█████▏ | 6429/12313 [4:48:41<4:26:16, 2.72s/it] 52%|█████▏ | 6430/12313 [4:48:44<4:23:20, 2.69s/it] {'loss': 0.4627, 'grad_norm': 7.296740350963874, 'learning_rate': 2.4418056942631687e-06, 'epoch': 0.52} 52%|█████▏ | 6430/12313 [4:48:44<4:23:20, 2.69s/it] 52%|█████▏ | 6431/12313 [4:48:46<4:20:05, 2.65s/it] {'loss': 0.4634, 'grad_norm': 4.733022729564796, 'learning_rate': 2.4411482523004946e-06, 'epoch': 0.52} 52%|█████▏ | 6431/12313 [4:48:46<4:20:05, 2.65s/it] 52%|█████▏ | 6432/12313 [4:48:49<4:18:11, 2.63s/it] {'loss': 0.3586, 'grad_norm': 5.089595564180344, 'learning_rate': 2.4404908144100513e-06, 'epoch': 0.52} 52%|█████▏ | 6432/12313 [4:48:49<4:18:11, 2.63s/it] 52%|█████▏ | 6433/12313 [4:48:51<4:13:34, 2.59s/it] {'loss': 0.465, 'grad_norm': 5.358613043214558, 'learning_rate': 2.4398333806373274e-06, 'epoch': 0.52} 52%|█████▏ | 6433/12313 [4:48:51<4:13:34, 2.59s/it] 52%|█████▏ | 6434/12313 [4:48:54<4:13:43, 2.59s/it] {'loss': 0.5472, 'grad_norm': 5.524190371189003, 'learning_rate': 2.4391759510278153e-06, 'epoch': 0.52} 52%|█████▏ | 6434/12313 [4:48:54<4:13:43, 2.59s/it] 52%|█████▏ | 6435/12313 [4:48:57<4:24:34, 2.70s/it] {'loss': 0.5572, 'grad_norm': 4.103822972000649, 'learning_rate': 2.438518525627006e-06, 'epoch': 0.52} 52%|█████▏ | 6435/12313 [4:48:57<4:24:34, 2.70s/it] 52%|█████▏ | 6436/12313 [4:48:59<4:22:29, 2.68s/it] {'loss': 0.4209, 'grad_norm': 5.959127088930798, 'learning_rate': 2.4378611044803887e-06, 'epoch': 0.52} 52%|█████▏ | 6436/12313 [4:48:59<4:22:29, 2.68s/it] 52%|█████▏ | 6437/12313 [4:49:02<4:18:28, 2.64s/it] 
{'loss': 0.3897, 'grad_norm': 5.31529709254031, 'learning_rate': 2.437203687633455e-06, 'epoch': 0.52} 52%|█████▏ | 6437/12313 [4:49:02<4:18:28, 2.64s/it] 52%|█████▏ | 6438/12313 [4:49:04<4:14:09, 2.60s/it] {'loss': 0.4428, 'grad_norm': 3.7128349527804656, 'learning_rate': 2.436546275131693e-06, 'epoch': 0.52} 52%|█████▏ | 6438/12313 [4:49:04<4:14:09, 2.60s/it] 52%|█████▏ | 6439/12313 [4:49:07<4:19:43, 2.65s/it] {'loss': 0.575, 'grad_norm': 3.784355733006889, 'learning_rate': 2.435888867020593e-06, 'epoch': 0.52} 52%|█████▏ | 6439/12313 [4:49:07<4:19:43, 2.65s/it] 52%|█████▏ | 6440/12313 [4:49:10<4:19:55, 2.66s/it] {'loss': 0.5123, 'grad_norm': 4.907099811863354, 'learning_rate': 2.435231463345645e-06, 'epoch': 0.52} 52%|█████▏ | 6440/12313 [4:49:10<4:19:55, 2.66s/it] 52%|█████▏ | 6441/12313 [4:49:13<4:29:23, 2.75s/it] {'loss': 0.6211, 'grad_norm': 3.094941647382156, 'learning_rate': 2.4345740641523362e-06, 'epoch': 0.52} 52%|█████▏ | 6441/12313 [4:49:13<4:29:23, 2.75s/it] 52%|█████▏ | 6442/12313 [4:49:16<4:26:58, 2.73s/it] {'loss': 0.4124, 'grad_norm': 5.697788871531617, 'learning_rate': 2.4339166694861553e-06, 'epoch': 0.52} 52%|█████▏ | 6442/12313 [4:49:16<4:26:58, 2.73s/it] 52%|█████▏ | 6443/12313 [4:49:18<4:27:36, 2.74s/it] {'loss': 0.4934, 'grad_norm': 5.825537615998602, 'learning_rate': 2.433259279392592e-06, 'epoch': 0.52} 52%|█████▏ | 6443/12313 [4:49:18<4:27:36, 2.74s/it] 52%|█████▏ | 6444/12313 [4:49:21<4:24:33, 2.70s/it] {'loss': 0.5467, 'grad_norm': 4.894893357625769, 'learning_rate': 2.432601893917133e-06, 'epoch': 0.52} 52%|█████▏ | 6444/12313 [4:49:21<4:24:33, 2.70s/it] 52%|█████▏ | 6445/12313 [4:49:24<4:29:28, 2.76s/it] {'loss': 0.5173, 'grad_norm': 7.012722371867207, 'learning_rate': 2.431944513105266e-06, 'epoch': 0.52} 52%|█████▏ | 6445/12313 [4:49:24<4:29:28, 2.76s/it] 52%|█████▏ | 6446/12313 [4:49:26<4:21:30, 2.67s/it] {'loss': 0.5101, 'grad_norm': 3.8914614811391144, 'learning_rate': 2.4312871370024794e-06, 'epoch': 0.52} 52%|█████▏ | 
6446/12313 [4:49:26<4:21:30, 2.67s/it] 52%|█████▏ | 6447/12313 [4:49:29<4:32:48, 2.79s/it] {'loss': 0.4239, 'grad_norm': 4.15063424291914, 'learning_rate': 2.4306297656542584e-06, 'epoch': 0.52} 52%|█████▏ | 6447/12313 [4:49:29<4:32:48, 2.79s/it] 52%|█████▏ | 6448/12313 [4:49:32<4:24:12, 2.70s/it] {'loss': 0.5355, 'grad_norm': 26.114252256627633, 'learning_rate': 2.4299723991060904e-06, 'epoch': 0.52} 52%|█████▏ | 6448/12313 [4:49:32<4:24:12, 2.70s/it] 52%|█████▏ | 6449/12313 [4:49:35<4:25:23, 2.72s/it] {'loss': 0.3273, 'grad_norm': 5.921632572418434, 'learning_rate': 2.4293150374034625e-06, 'epoch': 0.52} 52%|█████▏ | 6449/12313 [4:49:35<4:25:23, 2.72s/it] 52%|█████▏ | 6450/12313 [4:49:37<4:25:04, 2.71s/it] {'loss': 0.4013, 'grad_norm': 5.16163949915224, 'learning_rate': 2.4286576805918604e-06, 'epoch': 0.52} 52%|█████▏ | 6450/12313 [4:49:37<4:25:04, 2.71s/it] 52%|█████▏ | 6451/12313 [4:49:40<4:19:42, 2.66s/it] {'loss': 0.4692, 'grad_norm': 3.8705275409101643, 'learning_rate': 2.4280003287167684e-06, 'epoch': 0.52} 52%|█████▏ | 6451/12313 [4:49:40<4:19:42, 2.66s/it] 52%|█████▏ | 6452/12313 [4:49:43<4:24:23, 2.71s/it] {'loss': 0.4827, 'grad_norm': 4.713434827452213, 'learning_rate': 2.427342981823672e-06, 'epoch': 0.52} 52%|█████▏ | 6452/12313 [4:49:43<4:24:23, 2.71s/it] 52%|█████▏ | 6453/12313 [4:49:45<4:23:23, 2.70s/it] {'loss': 0.597, 'grad_norm': 9.847453330969838, 'learning_rate': 2.426685639958058e-06, 'epoch': 0.52} 52%|█████▏ | 6453/12313 [4:49:45<4:23:23, 2.70s/it] 52%|█████▏ | 6454/12313 [4:49:48<4:23:56, 2.70s/it] {'loss': 0.7119, 'grad_norm': 14.084536911041138, 'learning_rate': 2.426028303165409e-06, 'epoch': 0.52} 52%|█████▏ | 6454/12313 [4:49:48<4:23:56, 2.70s/it] 52%|█████▏ | 6455/12313 [4:49:51<4:28:41, 2.75s/it] {'loss': 0.5353, 'grad_norm': 4.162924811617758, 'learning_rate': 2.425370971491209e-06, 'epoch': 0.52} 52%|█████▏ | 6455/12313 [4:49:51<4:28:41, 2.75s/it] 52%|█████▏ | 6456/12313 [4:49:54<4:28:46, 2.75s/it] {'loss': 0.4668, 'grad_norm': 
5.310608964267923, 'learning_rate': 2.424713644980945e-06, 'epoch': 0.52} 52%|█████▏ | 6456/12313 [4:49:54<4:28:46, 2.75s/it] 52%|█████▏ | 6457/12313 [4:49:56<4:19:17, 2.66s/it] {'loss': 0.5376, 'grad_norm': 3.485581874385866, 'learning_rate': 2.424056323680097e-06, 'epoch': 0.52} 52%|█████▏ | 6457/12313 [4:49:56<4:19:17, 2.66s/it] 52%|█████▏ | 6458/12313 [4:49:59<4:19:29, 2.66s/it] {'loss': 0.5362, 'grad_norm': 4.466249110985916, 'learning_rate': 2.423399007634149e-06, 'epoch': 0.52} 52%|█████▏ | 6458/12313 [4:49:59<4:19:29, 2.66s/it] 52%|█████▏ | 6459/12313 [4:50:01<4:14:27, 2.61s/it] {'loss': 0.4823, 'grad_norm': 15.507085752685619, 'learning_rate': 2.422741696888585e-06, 'epoch': 0.52} 52%|█████▏ | 6459/12313 [4:50:01<4:14:27, 2.61s/it] 52%|█████▏ | 6460/12313 [4:50:04<4:13:51, 2.60s/it] {'loss': 0.5139, 'grad_norm': 3.42283227124507, 'learning_rate': 2.4220843914888865e-06, 'epoch': 0.52} 52%|█████▏ | 6460/12313 [4:50:04<4:13:51, 2.60s/it] 52%|█████▏ | 6461/12313 [4:50:07<4:18:13, 2.65s/it] {'loss': 0.6101, 'grad_norm': 4.471153348445109, 'learning_rate': 2.4214270914805353e-06, 'epoch': 0.52} 52%|█████▏ | 6461/12313 [4:50:07<4:18:13, 2.65s/it] 52%|█████▏ | 6462/12313 [4:50:09<4:15:53, 2.62s/it] {'loss': 0.5725, 'grad_norm': 4.937962838447406, 'learning_rate': 2.4207697969090145e-06, 'epoch': 0.52} 52%|█████▏ | 6462/12313 [4:50:09<4:15:53, 2.62s/it] 52%|█████▏ | 6463/12313 [4:50:12<4:14:37, 2.61s/it] {'loss': 0.6678, 'grad_norm': 3.4301025114527355, 'learning_rate': 2.420112507819804e-06, 'epoch': 0.52} 52%|█████▏ | 6463/12313 [4:50:12<4:14:37, 2.61s/it] 52%|█████▏ | 6464/12313 [4:50:14<4:14:24, 2.61s/it] {'loss': 0.4788, 'grad_norm': 9.810088340742176, 'learning_rate': 2.4194552242583845e-06, 'epoch': 0.52} 52%|█████▏ | 6464/12313 [4:50:14<4:14:24, 2.61s/it] 53%|█████▎ | 6465/12313 [4:50:17<4:15:02, 2.62s/it] {'loss': 0.588, 'grad_norm': 5.247524236071172, 'learning_rate': 2.4187979462702382e-06, 'epoch': 0.53} 53%|█████▎ | 6465/12313 [4:50:17<4:15:02, 
2.62s/it] 53%|█████▎ | 6466/12313 [4:50:20<4:21:37, 2.68s/it] {'loss': 0.5239, 'grad_norm': 3.8501987558213133, 'learning_rate': 2.4181406739008443e-06, 'epoch': 0.53} 53%|█████▎ | 6466/12313 [4:50:20<4:21:37, 2.68s/it] 53%|█████▎ | 6467/12313 [4:50:22<4:18:19, 2.65s/it] {'loss': 0.4764, 'grad_norm': 4.286565115822655, 'learning_rate': 2.417483407195682e-06, 'epoch': 0.53} 53%|█████▎ | 6467/12313 [4:50:22<4:18:19, 2.65s/it] 53%|█████▎ | 6468/12313 [4:50:25<4:18:30, 2.65s/it] {'loss': 0.5093, 'grad_norm': 2.9196698091729347, 'learning_rate': 2.416826146200231e-06, 'epoch': 0.53} 53%|█████▎ | 6468/12313 [4:50:25<4:18:30, 2.65s/it] 53%|█████▎ | 6469/12313 [4:50:28<4:28:22, 2.76s/it] {'loss': 0.6884, 'grad_norm': 4.6923054622476, 'learning_rate': 2.4161688909599715e-06, 'epoch': 0.53} 53%|█████▎ | 6469/12313 [4:50:28<4:28:22, 2.76s/it] 53%|█████▎ | 6470/12313 [4:50:31<4:23:19, 2.70s/it] {'loss': 0.5132, 'grad_norm': 3.966646784959062, 'learning_rate': 2.4155116415203804e-06, 'epoch': 0.53} 53%|█████▎ | 6470/12313 [4:50:31<4:23:19, 2.70s/it] 53%|█████▎ | 6471/12313 [4:50:33<4:25:55, 2.73s/it] {'loss': 0.5949, 'grad_norm': 3.6916733502842956, 'learning_rate': 2.4148543979269357e-06, 'epoch': 0.53} 53%|█████▎ | 6471/12313 [4:50:33<4:25:55, 2.73s/it] 53%|█████▎ | 6472/12313 [4:50:36<4:24:05, 2.71s/it] {'loss': 0.5262, 'grad_norm': 4.986568652074796, 'learning_rate': 2.4141971602251176e-06, 'epoch': 0.53} 53%|█████▎ | 6472/12313 [4:50:36<4:24:05, 2.71s/it] 53%|█████▎ | 6473/12313 [4:50:39<4:24:41, 2.72s/it] {'loss': 0.3962, 'grad_norm': 7.761776773353264, 'learning_rate': 2.4135399284604012e-06, 'epoch': 0.53} 53%|█████▎ | 6473/12313 [4:50:39<4:24:41, 2.72s/it] 53%|█████▎ | 6474/12313 [4:50:42<4:27:14, 2.75s/it] {'loss': 0.655, 'grad_norm': 28.513128146924068, 'learning_rate': 2.4128827026782633e-06, 'epoch': 0.53} 53%|█████▎ | 6474/12313 [4:50:42<4:27:14, 2.75s/it] 53%|█████▎ | 6475/12313 [4:50:45<4:31:24, 2.79s/it] {'loss': 0.6157, 'grad_norm': 5.030638393005198, 
'learning_rate': 2.4122254829241827e-06, 'epoch': 0.53} 53%|█████▎ | 6475/12313 [4:50:45<4:31:24, 2.79s/it] 53%|█████▎ | 6476/12313 [4:50:47<4:27:36, 2.75s/it] {'loss': 0.4513, 'grad_norm': 3.759316711837708, 'learning_rate': 2.4115682692436337e-06, 'epoch': 0.53} 53%|█████▎ | 6476/12313 [4:50:47<4:27:36, 2.75s/it] 53%|█████▎ | 6477/12313 [4:50:50<4:25:05, 2.73s/it] {'loss': 0.427, 'grad_norm': 5.1999339488745395, 'learning_rate': 2.4109110616820918e-06, 'epoch': 0.53} 53%|█████▎ | 6477/12313 [4:50:50<4:25:05, 2.73s/it] 53%|█████▎ | 6478/12313 [4:50:52<4:21:30, 2.69s/it] {'loss': 0.4293, 'grad_norm': 9.081659712327916, 'learning_rate': 2.4102538602850337e-06, 'epoch': 0.53} 53%|█████▎ | 6478/12313 [4:50:52<4:21:30, 2.69s/it] 53%|█████▎ | 6479/12313 [4:50:55<4:15:03, 2.62s/it] {'loss': 0.5247, 'grad_norm': 3.9469459629155574, 'learning_rate': 2.4095966650979342e-06, 'epoch': 0.53} 53%|█████▎ | 6479/12313 [4:50:55<4:15:03, 2.62s/it] 53%|█████▎ | 6480/12313 [4:50:58<4:23:26, 2.71s/it] {'loss': 0.5248, 'grad_norm': 4.830126815254179, 'learning_rate': 2.4089394761662653e-06, 'epoch': 0.53} 53%|█████▎ | 6480/12313 [4:50:58<4:23:26, 2.71s/it] 53%|█████▎ | 6481/12313 [4:51:01<4:23:46, 2.71s/it] {'loss': 0.4429, 'grad_norm': 5.427859891508367, 'learning_rate': 2.4082822935355035e-06, 'epoch': 0.53} 53%|█████▎ | 6481/12313 [4:51:01<4:23:46, 2.71s/it] 53%|█████▎ | 6482/12313 [4:51:03<4:18:59, 2.66s/it] {'loss': 0.4578, 'grad_norm': 5.032571069469282, 'learning_rate': 2.4076251172511224e-06, 'epoch': 0.53} 53%|█████▎ | 6482/12313 [4:51:03<4:18:59, 2.66s/it] 53%|█████▎ | 6483/12313 [4:51:07<4:48:20, 2.97s/it] {'loss': 0.5519, 'grad_norm': 3.8795340568732364, 'learning_rate': 2.4069679473585925e-06, 'epoch': 0.53} 53%|█████▎ | 6483/12313 [4:51:07<4:48:20, 2.97s/it] 53%|█████▎ | 6484/12313 [4:51:10<4:41:34, 2.90s/it] {'loss': 0.5532, 'grad_norm': 3.9342593469573877, 'learning_rate': 2.4063107839033894e-06, 'epoch': 0.53} 53%|█████▎ | 6484/12313 [4:51:10<4:41:34, 2.90s/it] 
53%|█████▎ | 6485/12313 [4:51:12<4:30:57, 2.79s/it] {'loss': 0.5749, 'grad_norm': 6.08364360680254, 'learning_rate': 2.4056536269309847e-06, 'epoch': 0.53} 53%|█████▎ | 6485/12313 [4:51:12<4:30:57, 2.79s/it] 53%|█████▎ | 6486/12313 [4:51:15<4:26:02, 2.74s/it] {'loss': 0.3915, 'grad_norm': 6.140920876184235, 'learning_rate': 2.4049964764868493e-06, 'epoch': 0.53} 53%|█████▎ | 6486/12313 [4:51:15<4:26:02, 2.74s/it] 53%|█████▎ | 6487/12313 [4:51:18<4:48:14, 2.97s/it] {'loss': 0.4485, 'grad_norm': 4.532054993744361, 'learning_rate': 2.4043393326164536e-06, 'epoch': 0.53} 53%|█████▎ | 6487/12313 [4:51:18<4:48:14, 2.97s/it] 53%|█████▎ | 6488/12313 [4:51:21<4:41:46, 2.90s/it] {'loss': 0.4327, 'grad_norm': 4.418973334061362, 'learning_rate': 2.403682195365272e-06, 'epoch': 0.53} 53%|█████▎ | 6488/12313 [4:51:21<4:41:46, 2.90s/it] 53%|█████▎ | 6489/12313 [4:51:24<4:33:48, 2.82s/it] {'loss': 0.545, 'grad_norm': 4.167489894614658, 'learning_rate': 2.4030250647787714e-06, 'epoch': 0.53} 53%|█████▎ | 6489/12313 [4:51:24<4:33:48, 2.82s/it] 53%|█████▎ | 6490/12313 [4:51:26<4:28:44, 2.77s/it] {'loss': 0.5363, 'grad_norm': 5.917681962851331, 'learning_rate': 2.402367940902423e-06, 'epoch': 0.53} 53%|█████▎ | 6490/12313 [4:51:26<4:28:44, 2.77s/it] 53%|█████▎ | 6491/12313 [4:51:29<4:34:25, 2.83s/it] {'loss': 0.3902, 'grad_norm': 10.913996246318028, 'learning_rate': 2.401710823781697e-06, 'epoch': 0.53} 53%|█████▎ | 6491/12313 [4:51:29<4:34:25, 2.83s/it] 53%|█████▎ | 6492/12313 [4:51:32<4:23:56, 2.72s/it] {'loss': 0.4407, 'grad_norm': 25.430239522056482, 'learning_rate': 2.4010537134620614e-06, 'epoch': 0.53} 53%|█████▎ | 6492/12313 [4:51:32<4:23:56, 2.72s/it] 53%|█████▎ | 6493/12313 [4:51:34<4:23:56, 2.72s/it] {'loss': 0.5414, 'grad_norm': 9.226096785828776, 'learning_rate': 2.400396609988985e-06, 'epoch': 0.53} 53%|█████▎ | 6493/12313 [4:51:34<4:23:56, 2.72s/it] 53%|█████▎ | 6494/12313 [4:51:37<4:24:00, 2.72s/it] {'loss': 0.5125, 'grad_norm': 4.553011519720875, 'learning_rate': 
2.3997395134079367e-06, 'epoch': 0.53} 53%|█████▎ | 6494/12313 [4:51:37<4:24:00, 2.72s/it] 53%|█████▎ | 6495/12313 [4:51:40<4:31:56, 2.80s/it] {'loss': 0.5804, 'grad_norm': 3.85392516623142, 'learning_rate': 2.399082423764383e-06, 'epoch': 0.53} 53%|█████▎ | 6495/12313 [4:51:40<4:31:56, 2.80s/it] 53%|█████▎ | 6496/12313 [4:51:43<4:22:15, 2.71s/it] {'loss': 0.4332, 'grad_norm': 6.43430352311751, 'learning_rate': 2.3984253411037913e-06, 'epoch': 0.53} 53%|█████▎ | 6496/12313 [4:51:43<4:22:15, 2.71s/it] 53%|█████▎ | 6497/12313 [4:51:45<4:21:09, 2.69s/it] {'loss': 0.4464, 'grad_norm': 5.348598603512892, 'learning_rate': 2.397768265471629e-06, 'epoch': 0.53} 53%|█████▎ | 6497/12313 [4:51:45<4:21:09, 2.69s/it] 53%|█████▎ | 6498/12313 [4:51:48<4:17:04, 2.65s/it] {'loss': 0.5333, 'grad_norm': 5.503883803363463, 'learning_rate': 2.397111196913362e-06, 'epoch': 0.53} 53%|█████▎ | 6498/12313 [4:51:48<4:17:04, 2.65s/it] 53%|█████▎ | 6499/12313 [4:51:50<4:17:52, 2.66s/it] {'loss': 0.5963, 'grad_norm': 5.084027665154841, 'learning_rate': 2.396454135474454e-06, 'epoch': 0.53} 53%|█████▎ | 6499/12313 [4:51:51<4:17:52, 2.66s/it] 53%|█████▎ | 6500/12313 [4:51:53<4:15:54, 2.64s/it] {'loss': 0.6582, 'grad_norm': 4.104843883398135, 'learning_rate': 2.3957970812003727e-06, 'epoch': 0.53} 53%|█████▎ | 6500/12313 [4:51:53<4:15:54, 2.64s/it] 53%|█████▎ | 6501/12313 [4:51:56<4:11:16, 2.59s/it] {'loss': 0.5699, 'grad_norm': 6.334197801454462, 'learning_rate': 2.3951400341365827e-06, 'epoch': 0.53} 53%|█████▎ | 6501/12313 [4:51:56<4:11:16, 2.59s/it] 53%|█████▎ | 6502/12313 [4:51:58<4:13:34, 2.62s/it] {'loss': 0.4565, 'grad_norm': 6.9437969081416435, 'learning_rate': 2.394482994328546e-06, 'epoch': 0.53} 53%|█████▎ | 6502/12313 [4:51:58<4:13:34, 2.62s/it] 53%|█████▎ | 6503/12313 [4:52:01<4:12:23, 2.61s/it] {'loss': 0.705, 'grad_norm': 3.941363956260876, 'learning_rate': 2.393825961821728e-06, 'epoch': 0.53} 53%|█████▎ | 6503/12313 [4:52:01<4:12:23, 2.61s/it] 53%|█████▎ | 6504/12313 
[4:52:03<4:12:44, 2.61s/it] {'loss': 0.488, 'grad_norm': 3.7273474058501974, 'learning_rate': 2.3931689366615926e-06, 'epoch': 0.53} 53%|█████▎ | 6504/12313 [4:52:03<4:12:44, 2.61s/it] 53%|█████▎ | 6505/12313 [4:52:06<4:13:46, 2.62s/it] {'loss': 0.5142, 'grad_norm': 5.084976289181063, 'learning_rate': 2.392511918893601e-06, 'epoch': 0.53} 53%|█████▎ | 6505/12313 [4:52:06<4:13:46, 2.62s/it] 53%|█████▎ | 6506/12313 [4:52:09<4:08:52, 2.57s/it] {'loss': 0.4072, 'grad_norm': 6.4658935525713295, 'learning_rate': 2.3918549085632145e-06, 'epoch': 0.53} 53%|█████▎ | 6506/12313 [4:52:09<4:08:52, 2.57s/it] 53%|█████▎ | 6507/12313 [4:52:11<4:06:12, 2.54s/it] {'loss': 0.4404, 'grad_norm': 3.651970957929354, 'learning_rate': 2.3911979057158974e-06, 'epoch': 0.53} 53%|█████▎ | 6507/12313 [4:52:11<4:06:12, 2.54s/it] 53%|█████▎ | 6508/12313 [4:52:14<4:12:49, 2.61s/it] {'loss': 0.5651, 'grad_norm': 3.5446244109539045, 'learning_rate': 2.3905409103971096e-06, 'epoch': 0.53} 53%|█████▎ | 6508/12313 [4:52:14<4:12:49, 2.61s/it] 53%|█████▎ | 6509/12313 [4:52:16<4:12:33, 2.61s/it] {'loss': 0.4663, 'grad_norm': 3.9057802685762217, 'learning_rate': 2.38988392265231e-06, 'epoch': 0.53} 53%|█████▎ | 6509/12313 [4:52:16<4:12:33, 2.61s/it] 53%|█████▎ | 6510/12313 [4:52:19<4:08:49, 2.57s/it] {'loss': 0.5126, 'grad_norm': 2.871351243716771, 'learning_rate': 2.389226942526961e-06, 'epoch': 0.53} 53%|█████▎ | 6510/12313 [4:52:19<4:08:49, 2.57s/it] 53%|█████▎ | 6511/12313 [4:52:22<4:18:34, 2.67s/it] {'loss': 0.463, 'grad_norm': 6.264655385586473, 'learning_rate': 2.3885699700665217e-06, 'epoch': 0.53} 53%|█████▎ | 6511/12313 [4:52:22<4:18:34, 2.67s/it] 53%|█████▎ | 6512/12313 [4:52:24<4:15:38, 2.64s/it] {'loss': 0.3438, 'grad_norm': 6.176994725304146, 'learning_rate': 2.3879130053164495e-06, 'epoch': 0.53} 53%|█████▎ | 6512/12313 [4:52:24<4:15:38, 2.64s/it] 53%|█████▎ | 6513/12313 [4:52:27<4:13:25, 2.62s/it] {'loss': 0.6078, 'grad_norm': 3.841956377687345, 'learning_rate': 2.3872560483222048e-06, 
'epoch': 0.53} 53%|█████▎ | 6513/12313 [4:52:27<4:13:25, 2.62s/it] 53%|█████▎ | 6514/12313 [4:52:30<4:14:18, 2.63s/it] {'loss': 0.49, 'grad_norm': 3.946093406576217, 'learning_rate': 2.3865990991292458e-06, 'epoch': 0.53} 53%|█████▎ | 6514/12313 [4:52:30<4:14:18, 2.63s/it] 53%|█████▎ | 6515/12313 [4:52:32<4:08:23, 2.57s/it] {'loss': 0.5979, 'grad_norm': 5.815463605481333, 'learning_rate': 2.3859421577830276e-06, 'epoch': 0.53} 53%|█████▎ | 6515/12313 [4:52:32<4:08:23, 2.57s/it] 53%|█████▎ | 6516/12313 [4:52:35<4:10:10, 2.59s/it] {'loss': 0.4522, 'grad_norm': 7.2849496983935875, 'learning_rate': 2.385285224329009e-06, 'epoch': 0.53} 53%|█████▎ | 6516/12313 [4:52:35<4:10:10, 2.59s/it] 53%|█████▎ | 6517/12313 [4:52:37<4:09:53, 2.59s/it] {'loss': 0.4208, 'grad_norm': 6.703027017567743, 'learning_rate': 2.384628298812646e-06, 'epoch': 0.53} 53%|█████▎ | 6517/12313 [4:52:37<4:09:53, 2.59s/it] 53%|█████▎ | 6518/12313 [4:52:40<4:08:02, 2.57s/it] {'loss': 0.649, 'grad_norm': 4.068730154097392, 'learning_rate': 2.383971381279393e-06, 'epoch': 0.53} 53%|█████▎ | 6518/12313 [4:52:40<4:08:02, 2.57s/it] 53%|█████▎ | 6519/12313 [4:52:43<4:19:32, 2.69s/it] {'loss': 0.7164, 'grad_norm': 3.9699040583000667, 'learning_rate': 2.383314471774707e-06, 'epoch': 0.53} 53%|█████▎ | 6519/12313 [4:52:43<4:19:32, 2.69s/it] 53%|█████▎ | 6520/12313 [4:52:45<4:17:31, 2.67s/it] {'loss': 0.5432, 'grad_norm': 11.734590811153446, 'learning_rate': 2.382657570344043e-06, 'epoch': 0.53} 53%|█████▎ | 6520/12313 [4:52:45<4:17:31, 2.67s/it] 53%|█████▎ | 6521/12313 [4:52:48<4:13:57, 2.63s/it] {'loss': 0.5122, 'grad_norm': 5.270660282246951, 'learning_rate': 2.382000677032854e-06, 'epoch': 0.53} 53%|█████▎ | 6521/12313 [4:52:48<4:13:57, 2.63s/it] 53%|█████▎ | 6522/12313 [4:52:51<4:18:30, 2.68s/it] {'loss': 0.5513, 'grad_norm': 4.145643825845797, 'learning_rate': 2.3813437918865925e-06, 'epoch': 0.53} 53%|█████▎ | 6522/12313 [4:52:51<4:18:30, 2.68s/it] 53%|█████▎ | 6523/12313 [4:52:54<4:23:19, 2.73s/it] 
{'loss': 0.5945, 'grad_norm': 5.695234292809091, 'learning_rate': 2.380686914950713e-06, 'epoch': 0.53} 53%|█████▎ | 6523/12313 [4:52:54<4:23:19, 2.73s/it] 53%|█████▎ | 6524/12313 [4:52:56<4:17:02, 2.66s/it] {'loss': 0.5533, 'grad_norm': 5.187560295102548, 'learning_rate': 2.380030046270668e-06, 'epoch': 0.53} 53%|█████▎ | 6524/12313 [4:52:56<4:17:02, 2.66s/it] 53%|█████▎ | 6525/12313 [4:52:59<4:11:30, 2.61s/it] {'loss': 0.5801, 'grad_norm': 5.2980757209337215, 'learning_rate': 2.379373185891908e-06, 'epoch': 0.53} 53%|█████▎ | 6525/12313 [4:52:59<4:11:30, 2.61s/it] 53%|█████▎ | 6526/12313 [4:53:01<4:18:16, 2.68s/it] {'loss': 0.6238, 'grad_norm': 4.379144750425471, 'learning_rate': 2.3787163338598854e-06, 'epoch': 0.53} 53%|█████▎ | 6526/12313 [4:53:01<4:18:16, 2.68s/it] 53%|█████▎ | 6527/12313 [4:53:04<4:20:24, 2.70s/it] {'loss': 0.5211, 'grad_norm': 5.906641818249719, 'learning_rate': 2.3780594902200515e-06, 'epoch': 0.53} 53%|█████▎ | 6527/12313 [4:53:04<4:20:24, 2.70s/it] 53%|█████▎ | 6528/12313 [4:53:07<4:15:22, 2.65s/it] {'loss': 0.4426, 'grad_norm': 5.654605677438833, 'learning_rate': 2.377402655017854e-06, 'epoch': 0.53} 53%|█████▎ | 6528/12313 [4:53:07<4:15:22, 2.65s/it] 53%|█████▎ | 6529/12313 [4:53:09<4:08:23, 2.58s/it] {'loss': 0.4189, 'grad_norm': 9.600316549753119, 'learning_rate': 2.376745828298745e-06, 'epoch': 0.53} 53%|█████▎ | 6529/12313 [4:53:09<4:08:23, 2.58s/it] 53%|█████▎ | 6530/12313 [4:53:12<4:07:31, 2.57s/it] {'loss': 0.5635, 'grad_norm': 4.652507037908199, 'learning_rate': 2.376089010108172e-06, 'epoch': 0.53} 53%|█████▎ | 6530/12313 [4:53:12<4:07:31, 2.57s/it] 53%|█████▎ | 6531/12313 [4:53:14<4:10:04, 2.60s/it] {'loss': 0.452, 'grad_norm': 6.872480563300781, 'learning_rate': 2.3754322004915837e-06, 'epoch': 0.53} 53%|█████▎ | 6531/12313 [4:53:14<4:10:04, 2.60s/it] 53%|█████▎ | 6532/12313 [4:53:17<4:14:46, 2.64s/it] {'loss': 0.5212, 'grad_norm': 6.133795123377434, 'learning_rate': 2.3747753994944283e-06, 'epoch': 0.53} 53%|█████▎ | 
6532/12313 [4:53:17<4:14:46, 2.64s/it] 53%|█████▎ | 6533/12313 [4:53:20<4:14:44, 2.64s/it] {'loss': 0.5803, 'grad_norm': 5.688402297871484, 'learning_rate': 2.3741186071621523e-06, 'epoch': 0.53} 53%|█████▎ | 6533/12313 [4:53:20<4:14:44, 2.64s/it] 53%|█████▎ | 6534/12313 [4:53:22<4:13:49, 2.64s/it] {'loss': 0.4938, 'grad_norm': 4.147307899835006, 'learning_rate': 2.373461823540202e-06, 'epoch': 0.53} 53%|█████▎ | 6534/12313 [4:53:22<4:13:49, 2.64s/it] 53%|█████▎ | 6535/12313 [4:53:25<4:18:49, 2.69s/it] {'loss': 0.4921, 'grad_norm': 4.16411315568013, 'learning_rate': 2.3728050486740244e-06, 'epoch': 0.53} 53%|█████▎ | 6535/12313 [4:53:25<4:18:49, 2.69s/it] 53%|█████▎ | 6536/12313 [4:53:28<4:11:37, 2.61s/it] {'loss': 0.5561, 'grad_norm': 4.86479630808414, 'learning_rate': 2.3721482826090643e-06, 'epoch': 0.53} 53%|█████▎ | 6536/12313 [4:53:28<4:11:37, 2.61s/it] 53%|█████▎ | 6537/12313 [4:53:30<4:16:18, 2.66s/it] {'loss': 0.4727, 'grad_norm': 4.549881463792877, 'learning_rate': 2.3714915253907657e-06, 'epoch': 0.53} 53%|█████▎ | 6537/12313 [4:53:30<4:16:18, 2.66s/it] 53%|█████▎ | 6538/12313 [4:53:33<4:11:57, 2.62s/it] {'loss': 0.5856, 'grad_norm': 3.1960496743376745, 'learning_rate': 2.370834777064574e-06, 'epoch': 0.53} 53%|█████▎ | 6538/12313 [4:53:33<4:11:57, 2.62s/it] 53%|█████▎ | 6539/12313 [4:53:36<4:19:56, 2.70s/it] {'loss': 0.4186, 'grad_norm': 6.183831960146142, 'learning_rate': 2.3701780376759323e-06, 'epoch': 0.53} 53%|█████▎ | 6539/12313 [4:53:36<4:19:56, 2.70s/it] 53%|█████▎ | 6540/12313 [4:53:38<4:16:50, 2.67s/it] {'loss': 0.4421, 'grad_norm': 6.1626582028588075, 'learning_rate': 2.3695213072702834e-06, 'epoch': 0.53} 53%|█████▎ | 6540/12313 [4:53:38<4:16:50, 2.67s/it] 53%|█████▎ | 6541/12313 [4:53:41<4:17:46, 2.68s/it] {'loss': 0.5147, 'grad_norm': 4.399409989452963, 'learning_rate': 2.368864585893069e-06, 'epoch': 0.53} 53%|█████▎ | 6541/12313 [4:53:41<4:17:46, 2.68s/it] 53%|█████▎ | 6542/12313 [4:53:44<4:18:31, 2.69s/it] {'loss': 0.56, 'grad_norm': 
4.73410977261382, 'learning_rate': 2.368207873589731e-06, 'epoch': 0.53} 53%|█████▎ | 6542/12313 [4:53:44<4:18:31, 2.69s/it] 53%|█████▎ | 6543/12313 [4:53:46<4:19:02, 2.69s/it] {'loss': 0.5333, 'grad_norm': 4.575239688507527, 'learning_rate': 2.3675511704057115e-06, 'epoch': 0.53} 53%|█████▎ | 6543/12313 [4:53:46<4:19:02, 2.69s/it] 53%|█████▎ | 6544/12313 [4:53:49<4:17:08, 2.67s/it] {'loss': 0.5508, 'grad_norm': 7.035858437872166, 'learning_rate': 2.3668944763864486e-06, 'epoch': 0.53} 53%|█████▎ | 6544/12313 [4:53:49<4:17:08, 2.67s/it] 53%|█████▎ | 6545/12313 [4:53:52<4:12:21, 2.63s/it] {'loss': 0.4836, 'grad_norm': 5.241353250145363, 'learning_rate': 2.3662377915773845e-06, 'epoch': 0.53} 53%|█████▎ | 6545/12313 [4:53:52<4:12:21, 2.63s/it] 53%|█████▎ | 6546/12313 [4:53:54<4:16:50, 2.67s/it] {'loss': 0.4395, 'grad_norm': 7.382490118024742, 'learning_rate': 2.365581116023958e-06, 'epoch': 0.53} 53%|█████▎ | 6546/12313 [4:53:54<4:16:50, 2.67s/it] 53%|█████▎ | 6547/12313 [4:53:57<4:21:46, 2.72s/it] {'loss': 0.427, 'grad_norm': 5.3401428892737455, 'learning_rate': 2.364924449771605e-06, 'epoch': 0.53} 53%|█████▎ | 6547/12313 [4:53:57<4:21:46, 2.72s/it] 53%|█████▎ | 6548/12313 [4:54:00<4:12:48, 2.63s/it] {'loss': 0.8927, 'grad_norm': 3.7061230832301066, 'learning_rate': 2.364267792865767e-06, 'epoch': 0.53} 53%|█████▎ | 6548/12313 [4:54:00<4:12:48, 2.63s/it] 53%|█████▎ | 6549/12313 [4:54:02<4:14:06, 2.65s/it] {'loss': 0.5167, 'grad_norm': 4.420036708733726, 'learning_rate': 2.363611145351879e-06, 'epoch': 0.53} 53%|█████▎ | 6549/12313 [4:54:02<4:14:06, 2.65s/it] 53%|█████▎ | 6550/12313 [4:54:05<4:16:17, 2.67s/it] {'loss': 0.5279, 'grad_norm': 4.783438045653566, 'learning_rate': 2.3629545072753767e-06, 'epoch': 0.53} 53%|█████▎ | 6550/12313 [4:54:05<4:16:17, 2.67s/it] 53%|█████▎ | 6551/12313 [4:54:08<4:21:31, 2.72s/it] {'loss': 0.4024, 'grad_norm': 4.749522841051883, 'learning_rate': 2.3622978786816984e-06, 'epoch': 0.53} 53%|█████▎ | 6551/12313 [4:54:08<4:21:31, 
2.72s/it] 53%|█████▎ | 6552/12313 [4:54:10<4:16:46, 2.67s/it] {'loss': 0.5393, 'grad_norm': 7.7640179721652, 'learning_rate': 2.361641259616278e-06, 'epoch': 0.53} 53%|█████▎ | 6552/12313 [4:54:10<4:16:46, 2.67s/it] 53%|█████▎ | 6553/12313 [4:54:13<4:19:45, 2.71s/it] {'loss': 0.4325, 'grad_norm': 3.6547210717885865, 'learning_rate': 2.3609846501245494e-06, 'epoch': 0.53} 53%|█████▎ | 6553/12313 [4:54:13<4:19:45, 2.71s/it] 53%|█████▎ | 6554/12313 [4:54:16<4:16:15, 2.67s/it] {'loss': 0.5021, 'grad_norm': 7.3797843232620455, 'learning_rate': 2.3603280502519482e-06, 'epoch': 0.53} 53%|█████▎ | 6554/12313 [4:54:16<4:16:15, 2.67s/it] 53%|█████▎ | 6555/12313 [4:54:18<4:14:16, 2.65s/it] {'loss': 0.5116, 'grad_norm': 3.694146153232864, 'learning_rate': 2.3596714600439062e-06, 'epoch': 0.53} 53%|█████▎ | 6555/12313 [4:54:18<4:14:16, 2.65s/it] 53%|█████▎ | 6556/12313 [4:54:21<4:10:36, 2.61s/it] {'loss': 0.4945, 'grad_norm': 5.0985701151843354, 'learning_rate': 2.3590148795458577e-06, 'epoch': 0.53} 53%|█████▎ | 6556/12313 [4:54:21<4:10:36, 2.61s/it] 53%|█████▎ | 6557/12313 [4:54:23<4:03:12, 2.54s/it] {'loss': 0.5813, 'grad_norm': 5.131669229194393, 'learning_rate': 2.3583583088032313e-06, 'epoch': 0.53} 53%|█████▎ | 6557/12313 [4:54:23<4:03:12, 2.54s/it] 53%|█████▎ | 6558/12313 [4:54:26<4:08:19, 2.59s/it] {'loss': 0.5053, 'grad_norm': 4.005833213164181, 'learning_rate': 2.3577017478614613e-06, 'epoch': 0.53} 53%|█████▎ | 6558/12313 [4:54:26<4:08:19, 2.59s/it] 53%|█████▎ | 6559/12313 [4:54:29<4:10:48, 2.62s/it] {'loss': 0.6758, 'grad_norm': 14.883116766398912, 'learning_rate': 2.357045196765978e-06, 'epoch': 0.53} 53%|█████▎ | 6559/12313 [4:54:29<4:10:48, 2.62s/it] 53%|█████▎ | 6560/12313 [4:54:32<4:20:03, 2.71s/it] {'loss': 0.561, 'grad_norm': 4.9450151525987955, 'learning_rate': 2.3563886555622093e-06, 'epoch': 0.53} 53%|█████▎ | 6560/12313 [4:54:32<4:20:03, 2.71s/it] 53%|█████▎ | 6561/12313 [4:54:34<4:25:08, 2.77s/it] {'loss': 0.5195, 'grad_norm': 9.194980096203347, 
'learning_rate': 2.355732124295586e-06, 'epoch': 0.53} 53%|█████▎ | 6561/12313 [4:54:34<4:25:08, 2.77s/it] 53%|█████▎ | 6562/12313 [4:54:37<4:21:45, 2.73s/it] {'loss': 0.6133, 'grad_norm': 5.200787441007623, 'learning_rate': 2.3550756030115364e-06, 'epoch': 0.53} 53%|█████▎ | 6562/12313 [4:54:37<4:21:45, 2.73s/it] 53%|█████▎ | 6563/12313 [4:54:40<4:19:53, 2.71s/it] {'loss': 0.456, 'grad_norm': 4.874445965778859, 'learning_rate': 2.3544190917554875e-06, 'epoch': 0.53} 53%|█████▎ | 6563/12313 [4:54:40<4:19:53, 2.71s/it] 53%|█████▎ | 6564/12313 [4:54:42<4:17:31, 2.69s/it] {'loss': 0.5303, 'grad_norm': 4.734097340063761, 'learning_rate': 2.3537625905728677e-06, 'epoch': 0.53} 53%|█████▎ | 6564/12313 [4:54:42<4:17:31, 2.69s/it] 53%|█████▎ | 6565/12313 [4:54:45<4:17:57, 2.69s/it] {'loss': 0.6195, 'grad_norm': 5.317506693611895, 'learning_rate': 2.3531060995091026e-06, 'epoch': 0.53} 53%|█████▎ | 6565/12313 [4:54:45<4:17:57, 2.69s/it] 53%|█████▎ | 6566/12313 [4:54:48<4:18:43, 2.70s/it] {'loss': 0.6773, 'grad_norm': 5.085875518247843, 'learning_rate': 2.352449618609617e-06, 'epoch': 0.53} 53%|█████▎ | 6566/12313 [4:54:48<4:18:43, 2.70s/it] 53%|█████▎ | 6567/12313 [4:54:51<4:17:19, 2.69s/it] {'loss': 0.4075, 'grad_norm': 5.216473307627287, 'learning_rate': 2.3517931479198383e-06, 'epoch': 0.53} 53%|█████▎ | 6567/12313 [4:54:51<4:17:19, 2.69s/it] 53%|█████▎ | 6568/12313 [4:54:53<4:18:11, 2.70s/it] {'loss': 0.5928, 'grad_norm': 4.73626486723172, 'learning_rate': 2.3511366874851885e-06, 'epoch': 0.53} 53%|█████▎ | 6568/12313 [4:54:53<4:18:11, 2.70s/it] 53%|█████▎ | 6569/12313 [4:54:56<4:13:07, 2.64s/it] {'loss': 0.5746, 'grad_norm': 4.068709628111098, 'learning_rate': 2.350480237351092e-06, 'epoch': 0.53} 53%|█████▎ | 6569/12313 [4:54:56<4:13:07, 2.64s/it] 53%|█████▎ | 6570/12313 [4:54:58<4:12:16, 2.64s/it] {'loss': 0.6726, 'grad_norm': 7.162771272344813, 'learning_rate': 2.3498237975629726e-06, 'epoch': 0.53} 53%|█████▎ | 6570/12313 [4:54:58<4:12:16, 2.64s/it] 53%|█████▎ | 
6571/12313 [4:55:01<4:24:10, 2.76s/it] {'loss': 0.4304, 'grad_norm': 8.523429545622452, 'learning_rate': 2.349167368166251e-06, 'epoch': 0.53} 53%|█████▎ | 6571/12313 [4:55:01<4:24:10, 2.76s/it] 53%|█████▎ | 6572/12313 [4:55:04<4:22:32, 2.74s/it] {'loss': 0.5222, 'grad_norm': 5.274081883984625, 'learning_rate': 2.348510949206349e-06, 'epoch': 0.53} 53%|█████▎ | 6572/12313 [4:55:04<4:22:32, 2.74s/it] 53%|█████▎ | 6573/12313 [4:55:07<4:21:35, 2.73s/it] {'loss': 0.6353, 'grad_norm': 3.8799868155561406, 'learning_rate': 2.3478545407286883e-06, 'epoch': 0.53} 53%|█████▎ | 6573/12313 [4:55:07<4:21:35, 2.73s/it] 53%|█████▎ | 6574/12313 [4:55:09<4:13:10, 2.65s/it] {'loss': 0.5822, 'grad_norm': 3.6165853886009844, 'learning_rate': 2.3471981427786875e-06, 'epoch': 0.53} 53%|█████▎ | 6574/12313 [4:55:09<4:13:10, 2.65s/it] 53%|█████▎ | 6575/12313 [4:55:12<4:14:30, 2.66s/it] {'loss': 0.4707, 'grad_norm': 6.353757597164967, 'learning_rate': 2.3465417554017675e-06, 'epoch': 0.53} 53%|█████▎ | 6575/12313 [4:55:12<4:14:30, 2.66s/it] 53%|█████▎ | 6576/12313 [4:55:14<4:10:13, 2.62s/it] {'loss': 0.5357, 'grad_norm': 5.10414117668738, 'learning_rate': 2.3458853786433444e-06, 'epoch': 0.53} 53%|█████▎ | 6576/12313 [4:55:14<4:10:13, 2.62s/it] 53%|█████▎ | 6577/12313 [4:55:17<4:05:29, 2.57s/it] {'loss': 0.6855, 'grad_norm': 3.3925342392484317, 'learning_rate': 2.345229012548838e-06, 'epoch': 0.53} 53%|█████▎ | 6577/12313 [4:55:17<4:05:29, 2.57s/it] 53%|█████▎ | 6578/12313 [4:55:20<4:17:56, 2.70s/it] {'loss': 0.4581, 'grad_norm': 3.3042399836502776, 'learning_rate': 2.3445726571636656e-06, 'epoch': 0.53} 53%|█████▎ | 6578/12313 [4:55:20<4:17:56, 2.70s/it] 53%|█████▎ | 6579/12313 [4:55:23<4:15:09, 2.67s/it] {'loss': 0.4442, 'grad_norm': 4.792803765102656, 'learning_rate': 2.3439163125332415e-06, 'epoch': 0.53} 53%|█████▎ | 6579/12313 [4:55:23<4:15:09, 2.67s/it] 53%|█████▎ | 6580/12313 [4:55:25<4:18:25, 2.70s/it] {'loss': 0.5999, 'grad_norm': 53.09136619996998, 'learning_rate': 
2.343259978702984e-06, 'epoch': 0.53} 53%|█████▎ | 6580/12313 [4:55:25<4:18:25, 2.70s/it] 53%|█████▎ | 6581/12313 [4:55:28<4:19:00, 2.71s/it] {'loss': 0.55, 'grad_norm': 4.366575308474977, 'learning_rate': 2.3426036557183056e-06, 'epoch': 0.53} 53%|█████▎ | 6581/12313 [4:55:28<4:19:00, 2.71s/it] 53%|█████▎ | 6582/12313 [4:55:31<4:22:57, 2.75s/it] {'loss': 0.6209, 'grad_norm': 4.219747570498485, 'learning_rate': 2.3419473436246206e-06, 'epoch': 0.53} 53%|█████▎ | 6582/12313 [4:55:31<4:22:57, 2.75s/it] 53%|█████▎ | 6583/12313 [4:55:34<4:18:21, 2.71s/it] {'loss': 0.5682, 'grad_norm': 5.505458655811916, 'learning_rate': 2.341291042467344e-06, 'epoch': 0.53} 53%|█████▎ | 6583/12313 [4:55:34<4:18:21, 2.71s/it] 53%|█████▎ | 6584/12313 [4:55:36<4:18:48, 2.71s/it] {'loss': 0.4587, 'grad_norm': 5.438215231835009, 'learning_rate': 2.3406347522918866e-06, 'epoch': 0.53} 53%|█████▎ | 6584/12313 [4:55:36<4:18:48, 2.71s/it] 53%|█████▎ | 6585/12313 [4:55:39<4:12:20, 2.64s/it] {'loss': 0.6507, 'grad_norm': 9.221582577454688, 'learning_rate': 2.339978473143661e-06, 'epoch': 0.53} 53%|█████▎ | 6585/12313 [4:55:39<4:12:20, 2.64s/it] 53%|█████▎ | 6586/12313 [4:55:41<4:14:07, 2.66s/it] {'loss': 0.5201, 'grad_norm': 6.264732038581897, 'learning_rate': 2.3393222050680788e-06, 'epoch': 0.53} 53%|█████▎ | 6586/12313 [4:55:41<4:14:07, 2.66s/it] 53%|█████▎ | 6587/12313 [4:55:44<4:08:43, 2.61s/it] {'loss': 0.5507, 'grad_norm': 5.801333658841605, 'learning_rate': 2.338665948110549e-06, 'epoch': 0.53} 53%|█████▎ | 6587/12313 [4:55:44<4:08:43, 2.61s/it] 54%|█████▎ | 6588/12313 [4:55:47<4:11:29, 2.64s/it] {'loss': 0.67, 'grad_norm': 6.810336490229261, 'learning_rate': 2.3380097023164813e-06, 'epoch': 0.54} 54%|█████▎ | 6588/12313 [4:55:47<4:11:29, 2.64s/it] 54%|█████▎ | 6589/12313 [4:55:49<4:07:04, 2.59s/it] {'loss': 0.391, 'grad_norm': 5.781581076596003, 'learning_rate': 2.337353467731286e-06, 'epoch': 0.54} 54%|█████▎ | 6589/12313 [4:55:49<4:07:04, 2.59s/it] 54%|█████▎ | 6590/12313 
[4:55:52<4:16:36, 2.69s/it] {'loss': 0.5351, 'grad_norm': 4.370970902980655, 'learning_rate': 2.3366972444003698e-06, 'epoch': 0.54} 54%|█████▎ | 6590/12313 [4:55:52<4:16:36, 2.69s/it] 54%|█████▎ | 6591/12313 [4:55:55<4:14:56, 2.67s/it] {'loss': 0.4929, 'grad_norm': 12.469579377782821, 'learning_rate': 2.3360410323691386e-06, 'epoch': 0.54} 54%|█████▎ | 6591/12313 [4:55:55<4:14:56, 2.67s/it] 54%|█████▎ | 6592/12313 [4:55:57<4:16:45, 2.69s/it] {'loss': 0.4752, 'grad_norm': 10.008332152388622, 'learning_rate': 2.335384831683002e-06, 'epoch': 0.54} 54%|█████▎ | 6592/12313 [4:55:57<4:16:45, 2.69s/it] 54%|█████▎ | 6593/12313 [4:56:00<4:17:55, 2.71s/it] {'loss': 0.4668, 'grad_norm': 9.797876571328075, 'learning_rate': 2.334728642387363e-06, 'epoch': 0.54} 54%|█████▎ | 6593/12313 [4:56:00<4:17:55, 2.71s/it] 54%|█████▎ | 6594/12313 [4:56:03<4:18:07, 2.71s/it] {'loss': 0.3572, 'grad_norm': 5.754796284833849, 'learning_rate': 2.334072464527628e-06, 'epoch': 0.54} 54%|█████▎ | 6594/12313 [4:56:03<4:18:07, 2.71s/it] 54%|█████▎ | 6595/12313 [4:56:06<4:17:10, 2.70s/it] {'loss': 0.4839, 'grad_norm': 4.316813249696164, 'learning_rate': 2.333416298149199e-06, 'epoch': 0.54} 54%|█████▎ | 6595/12313 [4:56:06<4:17:10, 2.70s/it] 54%|█████▎ | 6596/12313 [4:56:08<4:18:11, 2.71s/it] {'loss': 0.3869, 'grad_norm': 4.974375179617703, 'learning_rate': 2.3327601432974817e-06, 'epoch': 0.54} 54%|█████▎ | 6596/12313 [4:56:08<4:18:11, 2.71s/it] 54%|█████▎ | 6597/12313 [4:56:11<4:18:09, 2.71s/it] {'loss': 0.4175, 'grad_norm': 3.985538587500043, 'learning_rate': 2.332104000017877e-06, 'epoch': 0.54} 54%|█████▎ | 6597/12313 [4:56:11<4:18:09, 2.71s/it] 54%|█████▎ | 6598/12313 [4:56:14<4:20:17, 2.73s/it] {'loss': 0.5612, 'grad_norm': 3.2055836403047064, 'learning_rate': 2.3314478683557863e-06, 'epoch': 0.54} 54%|█████▎ | 6598/12313 [4:56:14<4:20:17, 2.73s/it] 54%|█████▎ | 6599/12313 [4:56:16<4:13:55, 2.67s/it] {'loss': 0.6605, 'grad_norm': 5.214849251275779, 'learning_rate': 2.330791748356612e-06, 
'epoch': 0.54} 54%|█████▎ | 6599/12313 [4:56:16<4:13:55, 2.67s/it] 54%|█████▎ | 6600/12313 [4:56:19<4:11:01, 2.64s/it] {'loss': 0.4432, 'grad_norm': 5.888955286356365, 'learning_rate': 2.3301356400657527e-06, 'epoch': 0.54} 54%|█████▎ | 6600/12313 [4:56:19<4:11:01, 2.64s/it] 54%|█████▎ | 6601/12313 [4:56:21<4:08:16, 2.61s/it] {'loss': 0.6324, 'grad_norm': 4.505062527177114, 'learning_rate': 2.3294795435286073e-06, 'epoch': 0.54} 54%|█████▎ | 6601/12313 [4:56:21<4:08:16, 2.61s/it] 54%|█████▎ | 6602/12313 [4:56:24<4:10:52, 2.64s/it] {'loss': 0.4921, 'grad_norm': 79.62251681114921, 'learning_rate': 2.3288234587905767e-06, 'epoch': 0.54} 54%|█████▎ | 6602/12313 [4:56:24<4:10:52, 2.64s/it] 54%|█████▎ | 6603/12313 [4:56:27<4:06:56, 2.59s/it] {'loss': 0.4798, 'grad_norm': 7.158021669325846, 'learning_rate': 2.328167385897056e-06, 'epoch': 0.54} 54%|█████▎ | 6603/12313 [4:56:27<4:06:56, 2.59s/it] 54%|█████▎ | 6604/12313 [4:56:29<4:07:37, 2.60s/it] {'loss': 0.6895, 'grad_norm': 5.336534392507409, 'learning_rate': 2.327511324893442e-06, 'epoch': 0.54} 54%|█████▎ | 6604/12313 [4:56:29<4:07:37, 2.60s/it] 54%|█████▎ | 6605/12313 [4:56:32<4:17:29, 2.71s/it] {'loss': 0.5348, 'grad_norm': 5.969891236348079, 'learning_rate': 2.3268552758251327e-06, 'epoch': 0.54} 54%|█████▎ | 6605/12313 [4:56:32<4:17:29, 2.71s/it] 54%|█████▎ | 6606/12313 [4:56:35<4:16:12, 2.69s/it] {'loss': 0.4408, 'grad_norm': 5.339767902188447, 'learning_rate': 2.3261992387375216e-06, 'epoch': 0.54} 54%|█████▎ | 6606/12313 [4:56:35<4:16:12, 2.69s/it] 54%|█████▎ | 6607/12313 [4:56:37<4:13:22, 2.66s/it] {'loss': 0.5484, 'grad_norm': 6.745067120318976, 'learning_rate': 2.3255432136760026e-06, 'epoch': 0.54} 54%|█████▎ | 6607/12313 [4:56:37<4:13:22, 2.66s/it] 54%|█████▎ | 6608/12313 [4:56:40<4:13:01, 2.66s/it] {'loss': 0.5867, 'grad_norm': 6.071978645783486, 'learning_rate': 2.324887200685971e-06, 'epoch': 0.54} 54%|█████▎ | 6608/12313 [4:56:40<4:13:01, 2.66s/it] 54%|█████▎ | 6609/12313 [4:56:43<4:11:08, 2.64s/it] 
{'loss': 0.5295, 'grad_norm': 6.2394195972445665, 'learning_rate': 2.3242311998128182e-06, 'epoch': 0.54} 54%|█████▎ | 6609/12313 [4:56:43<4:11:08, 2.64s/it] 54%|█████▎ | 6610/12313 [4:56:45<4:14:02, 2.67s/it] {'loss': 0.4169, 'grad_norm': 5.667495576556501, 'learning_rate': 2.3235752111019362e-06, 'epoch': 0.54} 54%|█████▎ | 6610/12313 [4:56:45<4:14:02, 2.67s/it] 54%|█████▎ | 6611/12313 [4:56:48<4:10:38, 2.64s/it] {'loss': 0.6219, 'grad_norm': 7.823798452635792, 'learning_rate': 2.3229192345987146e-06, 'epoch': 0.54} 54%|█████▎ | 6611/12313 [4:56:48<4:10:38, 2.64s/it] 54%|█████▎ | 6612/12313 [4:56:50<4:05:11, 2.58s/it] {'loss': 0.5665, 'grad_norm': 6.983866314486278, 'learning_rate': 2.322263270348546e-06, 'epoch': 0.54} 54%|█████▎ | 6612/12313 [4:56:50<4:05:11, 2.58s/it] 54%|█████▎ | 6613/12313 [4:56:53<4:06:14, 2.59s/it] {'loss': 0.3971, 'grad_norm': 6.526144378674829, 'learning_rate': 2.3216073183968184e-06, 'epoch': 0.54} 54%|█████▎ | 6613/12313 [4:56:53<4:06:14, 2.59s/it] 54%|█████▎ | 6614/12313 [4:56:55<4:01:06, 2.54s/it] {'loss': 0.6378, 'grad_norm': 10.243690491824887, 'learning_rate': 2.320951378788919e-06, 'epoch': 0.54} 54%|█████▎ | 6614/12313 [4:56:55<4:01:06, 2.54s/it] 54%|█████▎ | 6615/12313 [4:56:58<4:02:59, 2.56s/it] {'loss': 0.548, 'grad_norm': 4.954484713664851, 'learning_rate': 2.3202954515702384e-06, 'epoch': 0.54} 54%|█████▎ | 6615/12313 [4:56:58<4:02:59, 2.56s/it] 54%|█████▎ | 6616/12313 [4:57:01<4:08:31, 2.62s/it] {'loss': 0.5172, 'grad_norm': 5.184321630416869, 'learning_rate': 2.3196395367861605e-06, 'epoch': 0.54} 54%|█████▎ | 6616/12313 [4:57:01<4:08:31, 2.62s/it] 54%|█████▎ | 6617/12313 [4:57:04<4:14:58, 2.69s/it] {'loss': 0.4252, 'grad_norm': 5.640278703192089, 'learning_rate': 2.3189836344820717e-06, 'epoch': 0.54} 54%|█████▎ | 6617/12313 [4:57:04<4:14:58, 2.69s/it] 54%|█████▎ | 6618/12313 [4:57:06<4:17:07, 2.71s/it] {'loss': 0.5939, 'grad_norm': 6.756103044876508, 'learning_rate': 2.318327744703358e-06, 'epoch': 0.54} 54%|█████▎ | 
6618/12313 [4:57:06<4:17:07, 2.71s/it] 54%|█████▍ | 6619/12313 [4:57:09<4:11:48, 2.65s/it] {'loss': 0.5293, 'grad_norm': 4.587869437489592, 'learning_rate': 2.317671867495403e-06, 'epoch': 0.54} 54%|█████▍ | 6619/12313 [4:57:09<4:11:48, 2.65s/it] 54%|█████▍ | 6620/12313 [4:57:12<4:15:33, 2.69s/it] {'loss': 0.4909, 'grad_norm': 5.744181498371175, 'learning_rate': 2.317016002903589e-06, 'epoch': 0.54} 54%|█████▍ | 6620/12313 [4:57:12<4:15:33, 2.69s/it] 54%|█████▍ | 6621/12313 [4:57:14<4:12:10, 2.66s/it] {'loss': 0.52, 'grad_norm': 5.53015069217161, 'learning_rate': 2.3163601509733e-06, 'epoch': 0.54} 54%|█████▍ | 6621/12313 [4:57:14<4:12:10, 2.66s/it] 54%|█████▍ | 6622/12313 [4:57:17<4:13:41, 2.67s/it] {'loss': 0.4972, 'grad_norm': 6.7221739959300555, 'learning_rate': 2.3157043117499174e-06, 'epoch': 0.54} 54%|█████▍ | 6622/12313 [4:57:17<4:13:41, 2.67s/it] 54%|█████▍ | 6623/12313 [4:57:20<4:16:20, 2.70s/it] {'loss': 0.4686, 'grad_norm': 4.038968186465272, 'learning_rate': 2.3150484852788186e-06, 'epoch': 0.54} 54%|█████▍ | 6623/12313 [4:57:20<4:16:20, 2.70s/it] 54%|█████▍ | 6624/12313 [4:57:22<4:15:05, 2.69s/it] {'loss': 0.4335, 'grad_norm': 5.334775479299188, 'learning_rate': 2.3143926716053876e-06, 'epoch': 0.54} 54%|█████▍ | 6624/12313 [4:57:22<4:15:05, 2.69s/it] 54%|█████▍ | 6625/12313 [4:57:25<4:15:49, 2.70s/it] {'loss': 0.6309, 'grad_norm': 8.59183543983034, 'learning_rate': 2.3137368707750018e-06, 'epoch': 0.54} 54%|█████▍ | 6625/12313 [4:57:25<4:15:49, 2.70s/it] 54%|█████▍ | 6626/12313 [4:57:28<4:18:59, 2.73s/it] {'loss': 0.6889, 'grad_norm': 4.578718128339019, 'learning_rate': 2.3130810828330375e-06, 'epoch': 0.54} 54%|█████▍ | 6626/12313 [4:57:28<4:18:59, 2.73s/it] 54%|█████▍ | 6627/12313 [4:57:31<4:17:06, 2.71s/it] {'loss': 0.43, 'grad_norm': 7.819423595541969, 'learning_rate': 2.3124253078248734e-06, 'epoch': 0.54} 54%|█████▍ | 6627/12313 [4:57:31<4:17:06, 2.71s/it] 54%|█████▍ | 6628/12313 [4:57:33<4:12:27, 2.66s/it] {'loss': 0.7549, 'grad_norm': 
3.229990826836947, 'learning_rate': 2.3117695457958857e-06, 'epoch': 0.54} 54%|█████▍ | 6628/12313 [4:57:33<4:12:27, 2.66s/it] 54%|█████▍ | 6629/12313 [4:57:36<4:17:56, 2.72s/it] {'loss': 0.434, 'grad_norm': 4.562845996906024, 'learning_rate': 2.3111137967914492e-06, 'epoch': 0.54} 54%|█████▍ | 6629/12313 [4:57:36<4:17:56, 2.72s/it] 54%|█████▍ | 6630/12313 [4:57:38<4:10:36, 2.65s/it] {'loss': 0.3945, 'grad_norm': 5.915615501905179, 'learning_rate': 2.310458060856937e-06, 'epoch': 0.54} 54%|█████▍ | 6630/12313 [4:57:38<4:10:36, 2.65s/it] 54%|█████▍ | 6631/12313 [4:57:41<4:06:45, 2.61s/it] {'loss': 0.542, 'grad_norm': 10.277625736537166, 'learning_rate': 2.3098023380377257e-06, 'epoch': 0.54} 54%|█████▍ | 6631/12313 [4:57:41<4:06:45, 2.61s/it] 54%|█████▍ | 6632/12313 [4:57:44<4:14:11, 2.68s/it] {'loss': 0.4956, 'grad_norm': 3.9982941253903843, 'learning_rate': 2.309146628379185e-06, 'epoch': 0.54} 54%|█████▍ | 6632/12313 [4:57:44<4:14:11, 2.68s/it] 54%|█████▍ | 6633/12313 [4:57:47<4:15:16, 2.70s/it] {'loss': 0.3932, 'grad_norm': 5.587736755912178, 'learning_rate': 2.308490931926687e-06, 'epoch': 0.54} 54%|█████▍ | 6633/12313 [4:57:47<4:15:16, 2.70s/it] 54%|█████▍ | 6634/12313 [4:57:49<4:14:31, 2.69s/it] {'loss': 0.551, 'grad_norm': 6.249822940854045, 'learning_rate': 2.3078352487256045e-06, 'epoch': 0.54} 54%|█████▍ | 6634/12313 [4:57:49<4:14:31, 2.69s/it] 54%|█████▍ | 6635/12313 [4:57:52<4:14:01, 2.68s/it] {'loss': 0.419, 'grad_norm': 4.330260149960894, 'learning_rate': 2.3071795788213047e-06, 'epoch': 0.54} 54%|█████▍ | 6635/12313 [4:57:52<4:14:01, 2.68s/it] 54%|█████▍ | 6636/12313 [4:57:54<4:06:13, 2.60s/it] {'loss': 0.912, 'grad_norm': 3.9213426854190554, 'learning_rate': 2.3065239222591574e-06, 'epoch': 0.54} 54%|█████▍ | 6636/12313 [4:57:54<4:06:13, 2.60s/it] 54%|█████▍ | 6637/12313 [4:57:57<4:04:23, 2.58s/it] {'loss': 0.582, 'grad_norm': 5.204419720367973, 'learning_rate': 2.3058682790845314e-06, 'epoch': 0.54} 54%|█████▍ | 6637/12313 [4:57:57<4:04:23, 
2.58s/it] 54%|█████▍ | 6638/12313 [4:57:59<4:04:24, 2.58s/it] {'loss': 0.6209, 'grad_norm': 6.739029726775385, 'learning_rate': 2.3052126493427934e-06, 'epoch': 0.54} 54%|█████▍ | 6638/12313 [4:57:59<4:04:24, 2.58s/it] 54%|█████▍ | 6639/12313 [4:58:02<4:09:38, 2.64s/it] {'loss': 0.431, 'grad_norm': 4.96555228670431, 'learning_rate': 2.304557033079308e-06, 'epoch': 0.54} 54%|█████▍ | 6639/12313 [4:58:02<4:09:38, 2.64s/it] 54%|█████▍ | 6640/12313 [4:58:05<4:14:16, 2.69s/it] {'loss': 0.3933, 'grad_norm': 6.401695121326637, 'learning_rate': 2.303901430339442e-06, 'epoch': 0.54} 54%|█████▍ | 6640/12313 [4:58:05<4:14:16, 2.69s/it] 54%|█████▍ | 6641/12313 [4:58:08<4:12:37, 2.67s/it] {'loss': 0.5304, 'grad_norm': 6.539736744319487, 'learning_rate': 2.30324584116856e-06, 'epoch': 0.54} 54%|█████▍ | 6641/12313 [4:58:08<4:12:37, 2.67s/it] 54%|█████▍ | 6642/12313 [4:58:10<4:11:25, 2.66s/it] {'loss': 0.4851, 'grad_norm': 4.494895959996267, 'learning_rate': 2.302590265612023e-06, 'epoch': 0.54} 54%|█████▍ | 6642/12313 [4:58:10<4:11:25, 2.66s/it] 54%|█████▍ | 6643/12313 [4:58:13<4:10:47, 2.65s/it] {'loss': 0.542, 'grad_norm': 5.068900846135241, 'learning_rate': 2.301934703715196e-06, 'epoch': 0.54} 54%|█████▍ | 6643/12313 [4:58:13<4:10:47, 2.65s/it] 54%|█████▍ | 6644/12313 [4:58:16<4:11:57, 2.67s/it] {'loss': 0.4716, 'grad_norm': 3.670185354744509, 'learning_rate': 2.301279155523439e-06, 'epoch': 0.54} 54%|█████▍ | 6644/12313 [4:58:16<4:11:57, 2.67s/it] 54%|█████▍ | 6645/12313 [4:58:18<4:08:42, 2.63s/it] {'loss': 0.4095, 'grad_norm': 3.9777847287273436, 'learning_rate': 2.3006236210821127e-06, 'epoch': 0.54} 54%|█████▍ | 6645/12313 [4:58:18<4:08:42, 2.63s/it] 54%|█████▍ | 6646/12313 [4:58:21<4:06:17, 2.61s/it] {'loss': 0.4291, 'grad_norm': 8.621397414880517, 'learning_rate': 2.2999681004365755e-06, 'epoch': 0.54} 54%|█████▍ | 6646/12313 [4:58:21<4:06:17, 2.61s/it] 54%|█████▍ | 6647/12313 [4:58:23<4:04:49, 2.59s/it] {'loss': 0.5603, 'grad_norm': 4.348209634682363, 'learning_rate': 
2.299312593632189e-06, 'epoch': 0.54} 54%|█████▍ | 6647/12313 [4:58:23<4:04:49, 2.59s/it] 54%|█████▍ | 6648/12313 [4:58:26<4:05:04, 2.60s/it] {'loss': 0.4495, 'grad_norm': 5.453003219785878, 'learning_rate': 2.298657100714308e-06, 'epoch': 0.54} 54%|█████▍ | 6648/12313 [4:58:26<4:05:04, 2.60s/it] 54%|█████▍ | 6649/12313 [4:58:28<4:03:38, 2.58s/it] {'loss': 0.5602, 'grad_norm': 6.222935801462418, 'learning_rate': 2.2980016217282892e-06, 'epoch': 0.54} 54%|█████▍ | 6649/12313 [4:58:28<4:03:38, 2.58s/it] 54%|█████▍ | 6650/12313 [4:58:31<4:03:52, 2.58s/it] {'loss': 0.6055, 'grad_norm': 6.456779266203264, 'learning_rate': 2.2973461567194903e-06, 'epoch': 0.54} 54%|█████▍ | 6650/12313 [4:58:31<4:03:52, 2.58s/it] 54%|█████▍ | 6651/12313 [4:58:34<4:08:00, 2.63s/it] {'loss': 0.5931, 'grad_norm': 2.812177928800366, 'learning_rate': 2.296690705733265e-06, 'epoch': 0.54} 54%|█████▍ | 6651/12313 [4:58:34<4:08:00, 2.63s/it] 54%|█████▍ | 6652/12313 [4:58:36<4:09:14, 2.64s/it] {'loss': 0.5323, 'grad_norm': 5.628644054902416, 'learning_rate': 2.2960352688149657e-06, 'epoch': 0.54} 54%|█████▍ | 6652/12313 [4:58:36<4:09:14, 2.64s/it] 54%|█████▍ | 6653/12313 [4:58:39<4:08:12, 2.63s/it] {'loss': 0.6064, 'grad_norm': 6.158902153206404, 'learning_rate': 2.295379846009947e-06, 'epoch': 0.54} 54%|█████▍ | 6653/12313 [4:58:39<4:08:12, 2.63s/it] 54%|█████▍ | 6654/12313 [4:58:42<4:13:07, 2.68s/it] {'loss': 0.5383, 'grad_norm': 4.596838238746014, 'learning_rate': 2.2947244373635608e-06, 'epoch': 0.54} 54%|█████▍ | 6654/12313 [4:58:42<4:13:07, 2.68s/it] 54%|█████▍ | 6655/12313 [4:58:45<4:11:54, 2.67s/it] {'loss': 0.4802, 'grad_norm': 10.141289999607887, 'learning_rate': 2.294069042921156e-06, 'epoch': 0.54} 54%|█████▍ | 6655/12313 [4:58:45<4:11:54, 2.67s/it] 54%|█████▍ | 6656/12313 [4:58:47<4:12:49, 2.68s/it] {'loss': 0.5922, 'grad_norm': 3.56367135718309, 'learning_rate': 2.2934136627280834e-06, 'epoch': 0.54} 54%|█████▍ | 6656/12313 [4:58:47<4:12:49, 2.68s/it] 54%|█████▍ | 6657/12313 
[4:58:50<4:13:59, 2.69s/it] {'loss': 0.7018, 'grad_norm': 4.957818406371747, 'learning_rate': 2.292758296829693e-06, 'epoch': 0.54} 54%|█████▍ | 6657/12313 [4:58:50<4:13:59, 2.69s/it] 54%|█████▍ | 6658/12313 [4:58:53<4:10:44, 2.66s/it] {'loss': 0.5328, 'grad_norm': 4.552129764687744, 'learning_rate': 2.2921029452713305e-06, 'epoch': 0.54} 54%|█████▍ | 6658/12313 [4:58:53<4:10:44, 2.66s/it] 54%|█████▍ | 6659/12313 [4:58:55<4:09:56, 2.65s/it] {'loss': 0.5459, 'grad_norm': 5.962108877745079, 'learning_rate': 2.291447608098345e-06, 'epoch': 0.54} 54%|█████▍ | 6659/12313 [4:58:55<4:09:56, 2.65s/it] 54%|█████▍ | 6660/12313 [4:58:58<4:06:05, 2.61s/it] {'loss': 0.445, 'grad_norm': 3.831361014930282, 'learning_rate': 2.290792285356081e-06, 'epoch': 0.54} 54%|█████▍ | 6660/12313 [4:58:58<4:06:05, 2.61s/it] 54%|█████▍ | 6661/12313 [4:59:00<4:09:53, 2.65s/it] {'loss': 0.6447, 'grad_norm': 4.039508905590606, 'learning_rate': 2.290136977089883e-06, 'epoch': 0.54} 54%|█████▍ | 6661/12313 [4:59:00<4:09:53, 2.65s/it] 54%|█████▍ | 6662/12313 [4:59:03<4:14:40, 2.70s/it] {'loss': 0.4856, 'grad_norm': 4.701839781470267, 'learning_rate': 2.289481683345096e-06, 'epoch': 0.54} 54%|█████▍ | 6662/12313 [4:59:03<4:14:40, 2.70s/it] 54%|█████▍ | 6663/12313 [4:59:06<4:10:48, 2.66s/it] {'loss': 0.3886, 'grad_norm': 5.28893125359557, 'learning_rate': 2.2888264041670625e-06, 'epoch': 0.54} 54%|█████▍ | 6663/12313 [4:59:06<4:10:48, 2.66s/it] 54%|█████▍ | 6664/12313 [4:59:08<4:10:04, 2.66s/it] {'loss': 0.451, 'grad_norm': 4.226127643770201, 'learning_rate': 2.288171139601124e-06, 'epoch': 0.54} 54%|█████▍ | 6664/12313 [4:59:08<4:10:04, 2.66s/it] 54%|█████▍ | 6665/12313 [4:59:11<4:16:07, 2.72s/it] {'loss': 0.585, 'grad_norm': 4.150106558822198, 'learning_rate': 2.287515889692621e-06, 'epoch': 0.54} 54%|█████▍ | 6665/12313 [4:59:11<4:16:07, 2.72s/it] 54%|█████▍ | 6666/12313 [4:59:14<4:14:31, 2.70s/it] {'loss': 0.5057, 'grad_norm': 7.149187546807345, 'learning_rate': 2.2868606544868947e-06, 'epoch': 
0.54} 54%|█████▍ | 6666/12313 [4:59:14<4:14:31, 2.70s/it] 54%|█████▍ | 6667/12313 [4:59:17<4:19:14, 2.76s/it] {'loss': 0.5628, 'grad_norm': 4.941704759283249, 'learning_rate': 2.2862054340292835e-06, 'epoch': 0.54} 54%|█████▍ | 6667/12313 [4:59:17<4:19:14, 2.76s/it] 54%|█████▍ | 6668/12313 [4:59:19<4:11:30, 2.67s/it] {'loss': 0.5538, 'grad_norm': 6.843536039362226, 'learning_rate': 2.2855502283651238e-06, 'epoch': 0.54} 54%|█████▍ | 6668/12313 [4:59:19<4:11:30, 2.67s/it] 54%|█████▍ | 6669/12313 [4:59:22<4:17:32, 2.74s/it] {'loss': 0.5538, 'grad_norm': 6.747892070688352, 'learning_rate': 2.284895037539753e-06, 'epoch': 0.54} 54%|█████▍ | 6669/12313 [4:59:22<4:17:32, 2.74s/it] 54%|█████▍ | 6670/12313 [4:59:25<4:16:04, 2.72s/it] {'loss': 0.5889, 'grad_norm': 4.985721031141658, 'learning_rate': 2.2842398615985086e-06, 'epoch': 0.54} 54%|█████▍ | 6670/12313 [4:59:25<4:16:04, 2.72s/it] 54%|█████▍ | 6671/12313 [4:59:28<4:23:31, 2.80s/it] {'loss': 0.4235, 'grad_norm': 3.9696526712453686, 'learning_rate': 2.283584700586723e-06, 'epoch': 0.54} 54%|█████▍ | 6671/12313 [4:59:28<4:23:31, 2.80s/it] 54%|█████▍ | 6672/12313 [4:59:31<4:20:46, 2.77s/it] {'loss': 0.565, 'grad_norm': 4.370310826153604, 'learning_rate': 2.2829295545497304e-06, 'epoch': 0.54} 54%|█████▍ | 6672/12313 [4:59:31<4:20:46, 2.77s/it] 54%|█████▍ | 6673/12313 [4:59:33<4:11:12, 2.67s/it] {'loss': 0.4592, 'grad_norm': 7.303076408805493, 'learning_rate': 2.282274423532865e-06, 'epoch': 0.54} 54%|█████▍ | 6673/12313 [4:59:33<4:11:12, 2.67s/it] 54%|█████▍ | 6674/12313 [4:59:36<4:12:32, 2.69s/it] {'loss': 0.519, 'grad_norm': 5.82847333716156, 'learning_rate': 2.2816193075814557e-06, 'epoch': 0.54} 54%|█████▍ | 6674/12313 [4:59:36<4:12:32, 2.69s/it] 54%|█████▍ | 6675/12313 [4:59:38<4:10:44, 2.67s/it] {'loss': 0.5885, 'grad_norm': 3.1507733594386704, 'learning_rate': 2.280964206740835e-06, 'epoch': 0.54} 54%|█████▍ | 6675/12313 [4:59:38<4:10:44, 2.67s/it] 54%|█████▍ | 6676/12313 [4:59:41<4:05:14, 2.61s/it] {'loss': 
0.4888, 'grad_norm': 5.983802017759874, 'learning_rate': 2.280309121056333e-06, 'epoch': 0.54} 54%|█████▍ | 6676/12313 [4:59:41<4:05:14, 2.61s/it] 54%|█████▍ | 6677/12313 [4:59:44<4:26:32, 2.84s/it] {'loss': 0.4794, 'grad_norm': 3.19378590139564, 'learning_rate': 2.279654050573276e-06, 'epoch': 0.54} 54%|█████▍ | 6677/12313 [4:59:44<4:26:32, 2.84s/it] 54%|█████▍ | 6678/12313 [4:59:47<4:23:17, 2.80s/it] {'loss': 0.5408, 'grad_norm': 4.270799718455597, 'learning_rate': 2.2789989953369924e-06, 'epoch': 0.54} 54%|█████▍ | 6678/12313 [4:59:47<4:23:17, 2.80s/it] 54%|█████▍ | 6679/12313 [4:59:50<4:19:07, 2.76s/it] {'loss': 0.4466, 'grad_norm': 5.277701316589272, 'learning_rate': 2.27834395539281e-06, 'epoch': 0.54} 54%|█████▍ | 6679/12313 [4:59:50<4:19:07, 2.76s/it] 54%|█████▍ | 6680/12313 [4:59:52<4:13:23, 2.70s/it] {'loss': 0.4871, 'grad_norm': 5.962388380985894, 'learning_rate': 2.2776889307860513e-06, 'epoch': 0.54} 54%|█████▍ | 6680/12313 [4:59:52<4:13:23, 2.70s/it] 54%|█████▍ | 6681/12313 [4:59:55<4:06:44, 2.63s/it] {'loss': 0.4022, 'grad_norm': 4.937531617716085, 'learning_rate': 2.2770339215620433e-06, 'epoch': 0.54} 54%|█████▍ | 6681/12313 [4:59:55<4:06:44, 2.63s/it] 54%|█████▍ | 6682/12313 [4:59:57<4:05:47, 2.62s/it] {'loss': 0.4676, 'grad_norm': 4.989377028856358, 'learning_rate': 2.2763789277661077e-06, 'epoch': 0.54} 54%|█████▍ | 6682/12313 [4:59:57<4:05:47, 2.62s/it] 54%|█████▍ | 6683/12313 [5:00:00<4:12:08, 2.69s/it] {'loss': 0.4234, 'grad_norm': 3.814875085144808, 'learning_rate': 2.2757239494435666e-06, 'epoch': 0.54} 54%|█████▍ | 6683/12313 [5:00:00<4:12:08, 2.69s/it] 54%|█████▍ | 6684/12313 [5:00:03<4:14:32, 2.71s/it] {'loss': 0.6319, 'grad_norm': 6.832068792784659, 'learning_rate': 2.2750689866397407e-06, 'epoch': 0.54} 54%|█████▍ | 6684/12313 [5:00:03<4:14:32, 2.71s/it] 54%|█████▍ | 6685/12313 [5:00:06<4:15:07, 2.72s/it] {'loss': 0.3713, 'grad_norm': 26.304115965543602, 'learning_rate': 2.2744140393999507e-06, 'epoch': 0.54} 54%|█████▍ | 6685/12313 
[5:00:06<4:15:07, 2.72s/it] 54%|█████▍ | 6686/12313 [5:00:08<4:12:28, 2.69s/it] {'loss': 0.6063, 'grad_norm': 7.2769279119321135, 'learning_rate': 2.273759107769516e-06, 'epoch': 0.54} 54%|█████▍ | 6686/12313 [5:00:08<4:12:28, 2.69s/it] 54%|█████▍ | 6687/12313 [5:00:11<4:17:55, 2.75s/it] {'loss': 0.5373, 'grad_norm': 4.819897180382021, 'learning_rate': 2.2731041917937524e-06, 'epoch': 0.54} 54%|█████▍ | 6687/12313 [5:00:11<4:17:55, 2.75s/it] 54%|█████▍ | 6688/12313 [5:00:14<4:12:25, 2.69s/it] {'loss': 0.5572, 'grad_norm': 5.898798059857416, 'learning_rate': 2.2724492915179787e-06, 'epoch': 0.54} 54%|█████▍ | 6688/12313 [5:00:14<4:12:25, 2.69s/it] 54%|█████▍ | 6689/12313 [5:00:16<4:09:33, 2.66s/it] {'loss': 0.6711, 'grad_norm': 4.441345383933851, 'learning_rate': 2.27179440698751e-06, 'epoch': 0.54} 54%|█████▍ | 6689/12313 [5:00:16<4:09:33, 2.66s/it] 54%|█████▍ | 6690/12313 [5:00:19<4:04:48, 2.61s/it] {'loss': 0.6712, 'grad_norm': 3.8157128193761185, 'learning_rate': 2.2711395382476595e-06, 'epoch': 0.54} 54%|█████▍ | 6690/12313 [5:00:19<4:04:48, 2.61s/it] 54%|█████▍ | 6691/12313 [5:00:21<4:05:19, 2.62s/it] {'loss': 0.5042, 'grad_norm': 4.388902512240195, 'learning_rate': 2.2704846853437424e-06, 'epoch': 0.54} 54%|█████▍ | 6691/12313 [5:00:21<4:05:19, 2.62s/it] 54%|█████▍ | 6692/12313 [5:00:24<4:08:36, 2.65s/it] {'loss': 0.4374, 'grad_norm': 7.873275069114776, 'learning_rate': 2.269829848321071e-06, 'epoch': 0.54} 54%|█████▍ | 6692/12313 [5:00:24<4:08:36, 2.65s/it] 54%|█████▍ | 6693/12313 [5:00:27<4:04:49, 2.61s/it] {'loss': 0.3936, 'grad_norm': 5.506546269665942, 'learning_rate': 2.2691750272249545e-06, 'epoch': 0.54} 54%|█████▍ | 6693/12313 [5:00:27<4:04:49, 2.61s/it] 54%|█████▍ | 6694/12313 [5:00:30<4:14:11, 2.71s/it] {'loss': 0.66, 'grad_norm': 4.315922170242298, 'learning_rate': 2.2685202221007057e-06, 'epoch': 0.54} 54%|█████▍ | 6694/12313 [5:00:30<4:14:11, 2.71s/it] 54%|█████▍ | 6695/12313 [5:00:32<4:16:36, 2.74s/it] {'loss': 0.4566, 'grad_norm': 
41.128866529656484, 'learning_rate': 2.2678654329936322e-06, 'epoch': 0.54} 54%|█████▍ | 6695/12313 [5:00:32<4:16:36, 2.74s/it] 54%|█████▍ | 6696/12313 [5:00:35<4:25:22, 2.83s/it] {'loss': 0.5562, 'grad_norm': 4.547047318098447, 'learning_rate': 2.267210659949042e-06, 'epoch': 0.54} 54%|█████▍ | 6696/12313 [5:00:35<4:25:22, 2.83s/it] 54%|█████▍ | 6697/12313 [5:00:38<4:22:55, 2.81s/it] {'loss': 0.4198, 'grad_norm': 5.996426610029105, 'learning_rate': 2.2665559030122424e-06, 'epoch': 0.54} 54%|█████▍ | 6697/12313 [5:00:38<4:22:55, 2.81s/it] 54%|█████▍ | 6698/12313 [5:00:41<4:25:13, 2.83s/it] {'loss': 0.468, 'grad_norm': 4.366420114566318, 'learning_rate': 2.2659011622285383e-06, 'epoch': 0.54} 54%|█████▍ | 6698/12313 [5:00:41<4:25:13, 2.83s/it] 54%|█████▍ | 6699/12313 [5:00:44<4:20:32, 2.78s/it] {'loss': 0.4206, 'grad_norm': 4.742561220461673, 'learning_rate': 2.265246437643236e-06, 'epoch': 0.54} 54%|█████▍ | 6699/12313 [5:00:44<4:20:32, 2.78s/it] 54%|█████▍ | 6700/12313 [5:00:46<4:17:24, 2.75s/it] {'loss': 0.6577, 'grad_norm': 5.981908620139316, 'learning_rate': 2.2645917293016363e-06, 'epoch': 0.54} 54%|█████▍ | 6700/12313 [5:00:46<4:17:24, 2.75s/it] 54%|█████▍ | 6701/12313 [5:00:49<4:21:24, 2.79s/it] {'loss': 0.442, 'grad_norm': 3.9854122807772723, 'learning_rate': 2.2639370372490434e-06, 'epoch': 0.54} 54%|█████▍ | 6701/12313 [5:00:49<4:21:24, 2.79s/it] 54%|█████▍ | 6702/12313 [5:00:52<4:17:52, 2.76s/it] {'loss': 0.6317, 'grad_norm': 4.317869994988899, 'learning_rate': 2.263282361530759e-06, 'epoch': 0.54} 54%|█████▍ | 6702/12313 [5:00:52<4:17:52, 2.76s/it] 54%|█████▍ | 6703/12313 [5:00:55<4:19:41, 2.78s/it] {'loss': 0.4615, 'grad_norm': 3.2344627144715874, 'learning_rate': 2.2626277021920813e-06, 'epoch': 0.54} 54%|█████▍ | 6703/12313 [5:00:55<4:19:41, 2.78s/it] 54%|█████▍ | 6704/12313 [5:00:58<4:16:43, 2.75s/it] {'loss': 0.5471, 'grad_norm': 6.42718836422913, 'learning_rate': 2.2619730592783108e-06, 'epoch': 0.54} 54%|█████▍ | 6704/12313 [5:00:58<4:16:43, 
2.75s/it] 54%|█████▍ | 6705/12313 [5:01:00<4:13:07, 2.71s/it] {'loss': 0.4092, 'grad_norm': 4.488283000655936, 'learning_rate': 2.2613184328347453e-06, 'epoch': 0.54} 54%|█████▍ | 6705/12313 [5:01:00<4:13:07, 2.71s/it] 54%|█████▍ | 6706/12313 [5:01:03<4:10:16, 2.68s/it] {'loss': 0.6374, 'grad_norm': 6.305967356057076, 'learning_rate': 2.2606638229066802e-06, 'epoch': 0.54} 54%|█████▍ | 6706/12313 [5:01:03<4:10:16, 2.68s/it] 54%|█████▍ | 6707/12313 [5:01:06<4:13:15, 2.71s/it] {'loss': 0.5745, 'grad_norm': 4.2258432165081565, 'learning_rate': 2.2600092295394125e-06, 'epoch': 0.54} 54%|█████▍ | 6707/12313 [5:01:06<4:13:15, 2.71s/it] 54%|█████▍ | 6708/12313 [5:01:08<4:08:41, 2.66s/it] {'loss': 0.433, 'grad_norm': 4.384272018008662, 'learning_rate': 2.2593546527782362e-06, 'epoch': 0.54} 54%|█████▍ | 6708/12313 [5:01:08<4:08:41, 2.66s/it] 54%|█████▍ | 6709/12313 [5:01:11<4:08:52, 2.66s/it] {'loss': 0.5326, 'grad_norm': 6.576525934865906, 'learning_rate': 2.2587000926684432e-06, 'epoch': 0.54} 54%|█████▍ | 6709/12313 [5:01:11<4:08:52, 2.66s/it] 54%|█████▍ | 6710/12313 [5:01:13<4:05:41, 2.63s/it] {'loss': 0.4881, 'grad_norm': 5.555291917382588, 'learning_rate': 2.258045549255328e-06, 'epoch': 0.54} 54%|█████▍ | 6710/12313 [5:01:13<4:05:41, 2.63s/it] 55%|█████▍ | 6711/12313 [5:01:16<4:03:24, 2.61s/it] {'loss': 0.479, 'grad_norm': 5.546197578764206, 'learning_rate': 2.25739102258418e-06, 'epoch': 0.55} 55%|█████▍ | 6711/12313 [5:01:16<4:03:24, 2.61s/it] 55%|█████▍ | 6712/12313 [5:01:18<4:01:15, 2.58s/it] {'loss': 0.4543, 'grad_norm': 6.1439652570444325, 'learning_rate': 2.256736512700288e-06, 'epoch': 0.55} 55%|█████▍ | 6712/12313 [5:01:18<4:01:15, 2.58s/it] 55%|█████▍ | 6713/12313 [5:01:21<4:03:17, 2.61s/it] {'loss': 0.5768, 'grad_norm': 5.653213757078279, 'learning_rate': 2.2560820196489437e-06, 'epoch': 0.55} 55%|█████▍ | 6713/12313 [5:01:21<4:03:17, 2.61s/it] 55%|█████▍ | 6714/12313 [5:01:24<4:02:54, 2.60s/it] {'loss': 0.4927, 'grad_norm': 4.21691379783433, 
'learning_rate': 2.255427543475432e-06, 'epoch': 0.55} 55%|█████▍ | 6714/12313 [5:01:24<4:02:54, 2.60s/it] 55%|█████▍ | 6715/12313 [5:01:26<4:08:04, 2.66s/it] {'loss': 0.4237, 'grad_norm': 6.798777933865642, 'learning_rate': 2.254773084225039e-06, 'epoch': 0.55} 55%|█████▍ | 6715/12313 [5:01:26<4:08:04, 2.66s/it] 55%|█████▍ | 6716/12313 [5:01:29<4:08:50, 2.67s/it] {'loss': 0.4288, 'grad_norm': 4.819909494343674, 'learning_rate': 2.254118641943052e-06, 'epoch': 0.55} 55%|█████▍ | 6716/12313 [5:01:29<4:08:50, 2.67s/it] 55%|█████▍ | 6717/12313 [5:01:32<4:02:36, 2.60s/it] {'loss': 0.482, 'grad_norm': 7.439931633022933, 'learning_rate': 2.253464216674753e-06, 'epoch': 0.55} 55%|█████▍ | 6717/12313 [5:01:32<4:02:36, 2.60s/it] 55%|█████▍ | 6718/12313 [5:01:34<4:04:47, 2.63s/it] {'loss': 0.614, 'grad_norm': 7.792402778020558, 'learning_rate': 2.2528098084654262e-06, 'epoch': 0.55} 55%|█████▍ | 6718/12313 [5:01:34<4:04:47, 2.63s/it] 55%|█████▍ | 6719/12313 [5:01:37<4:08:22, 2.66s/it] {'loss': 0.4562, 'grad_norm': 4.495435217507245, 'learning_rate': 2.2521554173603513e-06, 'epoch': 0.55} 55%|█████▍ | 6719/12313 [5:01:37<4:08:22, 2.66s/it] 55%|█████▍ | 6720/12313 [5:01:40<4:17:09, 2.76s/it] {'loss': 0.5417, 'grad_norm': 4.419124752755961, 'learning_rate': 2.25150104340481e-06, 'epoch': 0.55} 55%|█████▍ | 6720/12313 [5:01:40<4:17:09, 2.76s/it] 55%|█████▍ | 6721/12313 [5:01:43<4:13:03, 2.72s/it] {'loss': 0.4154, 'grad_norm': 6.332131149232858, 'learning_rate': 2.2508466866440824e-06, 'epoch': 0.55} 55%|█████▍ | 6721/12313 [5:01:43<4:13:03, 2.72s/it] 55%|█████▍ | 6722/12313 [5:01:45<4:10:01, 2.68s/it] {'loss': 0.49, 'grad_norm': 6.610876686409083, 'learning_rate': 2.2501923471234444e-06, 'epoch': 0.55} 55%|█████▍ | 6722/12313 [5:01:45<4:10:01, 2.68s/it] 55%|█████▍ | 6723/12313 [5:01:48<4:10:34, 2.69s/it] {'loss': 0.4733, 'grad_norm': 17.017885967526706, 'learning_rate': 2.249538024888174e-06, 'epoch': 0.55} 55%|█████▍ | 6723/12313 [5:01:48<4:10:34, 2.69s/it] 55%|█████▍ | 
6724/12313 [5:01:51<4:14:15, 2.73s/it] {'loss': 0.6985, 'grad_norm': 7.28235162312456, 'learning_rate': 2.2488837199835477e-06, 'epoch': 0.55} 55%|█████▍ | 6724/12313 [5:01:51<4:14:15, 2.73s/it] 55%|█████▍ | 6725/12313 [5:01:53<4:14:37, 2.73s/it] {'loss': 0.5424, 'grad_norm': 6.165256861279543, 'learning_rate': 2.2482294324548376e-06, 'epoch': 0.55} 55%|█████▍ | 6725/12313 [5:01:53<4:14:37, 2.73s/it] 55%|█████▍ | 6726/12313 [5:01:57<4:26:40, 2.86s/it] {'loss': 0.6049, 'grad_norm': 7.302972322656833, 'learning_rate': 2.2475751623473193e-06, 'epoch': 0.55} 55%|█████▍ | 6726/12313 [5:01:57<4:26:40, 2.86s/it] 55%|█████▍ | 6727/12313 [5:01:59<4:17:31, 2.77s/it] {'loss': 0.4342, 'grad_norm': 4.71990096240607, 'learning_rate': 2.2469209097062637e-06, 'epoch': 0.55} 55%|█████▍ | 6727/12313 [5:01:59<4:17:31, 2.77s/it] 55%|█████▍ | 6728/12313 [5:02:02<4:17:33, 2.77s/it] {'loss': 0.4572, 'grad_norm': 6.900667478651783, 'learning_rate': 2.246266674576941e-06, 'epoch': 0.55} 55%|█████▍ | 6728/12313 [5:02:02<4:17:33, 2.77s/it] 55%|█████▍ | 6729/12313 [5:02:05<4:14:28, 2.73s/it] {'loss': 0.4723, 'grad_norm': 2.982802060269296, 'learning_rate': 2.245612457004622e-06, 'epoch': 0.55} 55%|█████▍ | 6729/12313 [5:02:05<4:14:28, 2.73s/it] 55%|█████▍ | 6730/12313 [5:02:07<4:09:18, 2.68s/it] {'loss': 0.4191, 'grad_norm': 4.176653744332102, 'learning_rate': 2.244958257034575e-06, 'epoch': 0.55} 55%|█████▍ | 6730/12313 [5:02:07<4:09:18, 2.68s/it] 55%|█████▍ | 6731/12313 [5:02:10<4:03:53, 2.62s/it] {'loss': 0.4973, 'grad_norm': 4.511011865460297, 'learning_rate': 2.244304074712066e-06, 'epoch': 0.55} 55%|█████▍ | 6731/12313 [5:02:10<4:03:53, 2.62s/it] 55%|█████▍ | 6732/12313 [5:02:12<4:05:29, 2.64s/it] {'loss': 0.6125, 'grad_norm': 4.202369107086116, 'learning_rate': 2.243649910082363e-06, 'epoch': 0.55} 55%|█████▍ | 6732/12313 [5:02:12<4:05:29, 2.64s/it] 55%|█████▍ | 6733/12313 [5:02:15<4:06:11, 2.65s/it] {'loss': 0.444, 'grad_norm': 7.339054515036435, 'learning_rate': 
2.2429957631907285e-06, 'epoch': 0.55} 55%|█████▍ | 6733/12313 [5:02:15<4:06:11, 2.65s/it] 55%|█████▍ | 6734/12313 [5:02:18<4:06:49, 2.65s/it] {'loss': 0.4205, 'grad_norm': 4.854339006065371, 'learning_rate': 2.2423416340824266e-06, 'epoch': 0.55} 55%|█████▍ | 6734/12313 [5:02:18<4:06:49, 2.65s/it] 55%|█████▍ | 6735/12313 [5:02:20<4:07:40, 2.66s/it] {'loss': 0.5089, 'grad_norm': 5.693035824826316, 'learning_rate': 2.241687522802721e-06, 'epoch': 0.55} 55%|█████▍ | 6735/12313 [5:02:20<4:07:40, 2.66s/it] 55%|█████▍ | 6736/12313 [5:02:23<4:09:24, 2.68s/it] {'loss': 0.43, 'grad_norm': 4.855926081415383, 'learning_rate': 2.2410334293968716e-06, 'epoch': 0.55} 55%|█████▍ | 6736/12313 [5:02:23<4:09:24, 2.68s/it] 55%|█████▍ | 6737/12313 [5:02:26<4:09:57, 2.69s/it] {'loss': 0.4145, 'grad_norm': 4.266917082773724, 'learning_rate': 2.2403793539101387e-06, 'epoch': 0.55} 55%|█████▍ | 6737/12313 [5:02:26<4:09:57, 2.69s/it] 55%|█████▍ | 6738/12313 [5:02:28<4:04:33, 2.63s/it] {'loss': 0.5073, 'grad_norm': 5.411690627496338, 'learning_rate': 2.2397252963877795e-06, 'epoch': 0.55} 55%|█████▍ | 6738/12313 [5:02:28<4:04:33, 2.63s/it] 55%|█████▍ | 6739/12313 [5:02:31<4:09:12, 2.68s/it] {'loss': 0.5423, 'grad_norm': 2.6098814011197096, 'learning_rate': 2.239071256875053e-06, 'epoch': 0.55} 55%|█████▍ | 6739/12313 [5:02:31<4:09:12, 2.68s/it] 55%|█████▍ | 6740/12313 [5:02:34<4:04:07, 2.63s/it] {'loss': 0.4069, 'grad_norm': 5.3581580919616885, 'learning_rate': 2.238417235417214e-06, 'epoch': 0.55} 55%|█████▍ | 6740/12313 [5:02:34<4:04:07, 2.63s/it] 55%|█████▍ | 6741/12313 [5:02:36<4:06:34, 2.66s/it] {'loss': 0.4756, 'grad_norm': 4.544117591382317, 'learning_rate': 2.237763232059518e-06, 'epoch': 0.55} 55%|█████▍ | 6741/12313 [5:02:36<4:06:34, 2.66s/it] 55%|█████▍ | 6742/12313 [5:02:39<4:00:07, 2.59s/it] {'loss': 0.5058, 'grad_norm': 18.0511878541294, 'learning_rate': 2.2371092468472193e-06, 'epoch': 0.55} 55%|█████▍ | 6742/12313 [5:02:39<4:00:07, 2.59s/it] 55%|█████▍ | 6743/12313 
[5:02:41<3:55:38, 2.54s/it] {'loss': 0.4712, 'grad_norm': 6.268300319177437, 'learning_rate': 2.236455279825569e-06, 'epoch': 0.55} 55%|█████▍ | 6743/12313 [5:02:41<3:55:38, 2.54s/it] 55%|█████▍ | 6744/12313 [5:02:44<4:01:53, 2.61s/it] {'loss': 0.5031, 'grad_norm': 3.663452778757879, 'learning_rate': 2.2358013310398174e-06, 'epoch': 0.55} 55%|█████▍ | 6744/12313 [5:02:44<4:01:53, 2.61s/it] 55%|█████▍ | 6745/12313 [5:02:47<4:07:20, 2.67s/it] {'loss': 0.4914, 'grad_norm': 4.936451070405641, 'learning_rate': 2.235147400535217e-06, 'epoch': 0.55} 55%|█████▍ | 6745/12313 [5:02:47<4:07:20, 2.67s/it] 55%|█████▍ | 6746/12313 [5:02:49<4:02:21, 2.61s/it] {'loss': 0.5417, 'grad_norm': 5.976409510815185, 'learning_rate': 2.2344934883570143e-06, 'epoch': 0.55} 55%|█████▍ | 6746/12313 [5:02:49<4:02:21, 2.61s/it] 55%|█████▍ | 6747/12313 [5:02:52<4:00:48, 2.60s/it] {'loss': 0.525, 'grad_norm': 7.332839588724573, 'learning_rate': 2.2338395945504557e-06, 'epoch': 0.55} 55%|█████▍ | 6747/12313 [5:02:52<4:00:48, 2.60s/it] 55%|█████▍ | 6748/12313 [5:02:54<4:01:52, 2.61s/it] {'loss': 0.6906, 'grad_norm': 5.026062475739817, 'learning_rate': 2.23318571916079e-06, 'epoch': 0.55} 55%|█████▍ | 6748/12313 [5:02:54<4:01:52, 2.61s/it] 55%|█████▍ | 6749/12313 [5:02:57<4:10:45, 2.70s/it] {'loss': 0.4563, 'grad_norm': 5.070833605866744, 'learning_rate': 2.2325318622332606e-06, 'epoch': 0.55} 55%|█████▍ | 6749/12313 [5:02:57<4:10:45, 2.70s/it] 55%|█████▍ | 6750/12313 [5:03:00<4:14:00, 2.74s/it] {'loss': 0.5338, 'grad_norm': 4.9673547801455955, 'learning_rate': 2.2318780238131095e-06, 'epoch': 0.55} 55%|█████▍ | 6750/12313 [5:03:00<4:14:00, 2.74s/it] 55%|█████▍ | 6751/12313 [5:03:03<4:12:27, 2.72s/it] {'loss': 0.503, 'grad_norm': 3.9729119791581744, 'learning_rate': 2.2312242039455816e-06, 'epoch': 0.55} 55%|█████▍ | 6751/12313 [5:03:03<4:12:27, 2.72s/it] 55%|█████▍ | 6752/12313 [5:03:05<4:05:33, 2.65s/it] {'loss': 0.3857, 'grad_norm': 3.9917626185437878, 'learning_rate': 2.230570402675916e-06, 
'epoch': 0.55} 55%|█████▍ | 6752/12313 [5:03:05<4:05:33, 2.65s/it] 55%|█████▍ | 6753/12313 [5:03:08<4:08:33, 2.68s/it] {'loss': 0.4657, 'grad_norm': 4.014300920446785, 'learning_rate': 2.2299166200493526e-06, 'epoch': 0.55} 55%|█████▍ | 6753/12313 [5:03:08<4:08:33, 2.68s/it] 55%|█████▍ | 6754/12313 [5:03:11<4:04:07, 2.63s/it] {'loss': 0.4921, 'grad_norm': 5.148556943279063, 'learning_rate': 2.2292628561111285e-06, 'epoch': 0.55} 55%|█████▍ | 6754/12313 [5:03:11<4:04:07, 2.63s/it] 55%|█████▍ | 6755/12313 [5:03:13<4:08:19, 2.68s/it] {'loss': 0.4815, 'grad_norm': 3.2886345422623715, 'learning_rate': 2.228609110906483e-06, 'epoch': 0.55} 55%|█████▍ | 6755/12313 [5:03:13<4:08:19, 2.68s/it] 55%|█████▍ | 6756/12313 [5:03:16<4:07:38, 2.67s/it] {'loss': 0.5324, 'grad_norm': 9.805600602297103, 'learning_rate': 2.2279553844806506e-06, 'epoch': 0.55} 55%|█████▍ | 6756/12313 [5:03:16<4:07:38, 2.67s/it] 55%|█████▍ | 6757/12313 [5:03:19<4:08:12, 2.68s/it] {'loss': 0.4963, 'grad_norm': 7.9338566902011385, 'learning_rate': 2.2273016768788653e-06, 'epoch': 0.55} 55%|█████▍ | 6757/12313 [5:03:19<4:08:12, 2.68s/it] 55%|█████▍ | 6758/12313 [5:03:22<4:13:47, 2.74s/it] {'loss': 0.4975, 'grad_norm': 4.352389013230028, 'learning_rate': 2.2266479881463614e-06, 'epoch': 0.55} 55%|█████▍ | 6758/12313 [5:03:22<4:13:47, 2.74s/it] 55%|█████▍ | 6759/12313 [5:03:24<4:13:20, 2.74s/it] {'loss': 0.5316, 'grad_norm': 5.035317373817535, 'learning_rate': 2.2259943183283696e-06, 'epoch': 0.55} 55%|█████▍ | 6759/12313 [5:03:24<4:13:20, 2.74s/it] 55%|█████▍ | 6760/12313 [5:03:27<4:18:18, 2.79s/it] {'loss': 0.4758, 'grad_norm': 6.329671082512638, 'learning_rate': 2.2253406674701206e-06, 'epoch': 0.55} 55%|█████▍ | 6760/12313 [5:03:27<4:18:18, 2.79s/it] 55%|█████▍ | 6761/12313 [5:03:30<4:08:00, 2.68s/it] {'loss': 0.3731, 'grad_norm': 4.350812102992751, 'learning_rate': 2.2246870356168447e-06, 'epoch': 0.55} 55%|█████▍ | 6761/12313 [5:03:30<4:08:00, 2.68s/it] 55%|█████▍ | 6762/12313 [5:03:32<4:04:52, 
2.65s/it] {'loss': 0.4601, 'grad_norm': 4.411297111261324, 'learning_rate': 2.224033422813768e-06, 'epoch': 0.55} 55%|█████▍ | 6762/12313 [5:03:32<4:04:52, 2.65s/it] 55%|█████▍ | 6763/12313 [5:03:35<4:05:01, 2.65s/it] {'loss': 0.5048, 'grad_norm': 3.3546196364460172, 'learning_rate': 2.2233798291061177e-06, 'epoch': 0.55} 55%|█████▍ | 6763/12313 [5:03:35<4:05:01, 2.65s/it] 55%|█████▍ | 6764/12313 [5:03:38<4:06:38, 2.67s/it] {'loss': 0.5135, 'grad_norm': 5.75969034058185, 'learning_rate': 2.2227262545391204e-06, 'epoch': 0.55} 55%|█████▍ | 6764/12313 [5:03:38<4:06:38, 2.67s/it] 55%|█████▍ | 6765/12313 [5:03:40<4:08:24, 2.69s/it] {'loss': 0.382, 'grad_norm': 4.676293581342606, 'learning_rate': 2.222072699157998e-06, 'epoch': 0.55} 55%|█████▍ | 6765/12313 [5:03:40<4:08:24, 2.69s/it] 55%|█████▍ | 6766/12313 [5:03:43<4:09:25, 2.70s/it] {'loss': 0.4019, 'grad_norm': 4.301771743349808, 'learning_rate': 2.2214191630079733e-06, 'epoch': 0.55} 55%|█████▍ | 6766/12313 [5:03:43<4:09:25, 2.70s/it] 55%|█████▍ | 6767/12313 [5:03:46<4:06:04, 2.66s/it] {'loss': 0.486, 'grad_norm': 8.356786267103574, 'learning_rate': 2.2207656461342696e-06, 'epoch': 0.55} 55%|█████▍ | 6767/12313 [5:03:46<4:06:04, 2.66s/it] 55%|█████▍ | 6768/12313 [5:03:48<4:06:37, 2.67s/it] {'loss': 0.5106, 'grad_norm': 5.187735340758213, 'learning_rate': 2.2201121485821053e-06, 'epoch': 0.55} 55%|█████▍ | 6768/12313 [5:03:48<4:06:37, 2.67s/it] 55%|█████▍ | 6769/12313 [5:03:51<4:14:47, 2.76s/it] {'loss': 0.5786, 'grad_norm': 14.002790250484441, 'learning_rate': 2.2194586703966976e-06, 'epoch': 0.55} 55%|█████▍ | 6769/12313 [5:03:51<4:14:47, 2.76s/it] 55%|█████▍ | 6770/12313 [5:03:54<4:07:50, 2.68s/it] {'loss': 0.6202, 'grad_norm': 4.598645970086092, 'learning_rate': 2.218805211623266e-06, 'epoch': 0.55} 55%|█████▍ | 6770/12313 [5:03:54<4:07:50, 2.68s/it] 55%|█████▍ | 6771/12313 [5:03:57<4:13:11, 2.74s/it] {'loss': 0.3767, 'grad_norm': 7.035714823525913, 'learning_rate': 2.2181517723070263e-06, 'epoch': 0.55} 
55%|█████▍ | 6771/12313 [5:03:57<4:13:11, 2.74s/it] 55%|█████▍ | 6772/12313 [5:03:59<4:15:25, 2.77s/it] {'loss': 0.4665, 'grad_norm': 3.789629383739064, 'learning_rate': 2.2174983524931916e-06, 'epoch': 0.55} 55%|█████▍ | 6772/12313 [5:04:00<4:15:25, 2.77s/it] 55%|█████▌ | 6773/12313 [5:04:03<4:22:05, 2.84s/it] {'loss': 0.5535, 'grad_norm': 5.153278702932309, 'learning_rate': 2.216844952226975e-06, 'epoch': 0.55} 55%|█████▌ | 6773/12313 [5:04:03<4:22:05, 2.84s/it] 55%|█████▌ | 6774/12313 [5:04:05<4:17:00, 2.78s/it] {'loss': 0.5416, 'grad_norm': 5.090968399667655, 'learning_rate': 2.2161915715535903e-06, 'epoch': 0.55} 55%|█████▌ | 6774/12313 [5:04:05<4:17:00, 2.78s/it] 55%|█████▌ | 6775/12313 [5:04:08<4:20:23, 2.82s/it] {'loss': 0.528, 'grad_norm': 4.880285899644429, 'learning_rate': 2.2155382105182462e-06, 'epoch': 0.55} 55%|█████▌ | 6775/12313 [5:04:08<4:20:23, 2.82s/it] 55%|█████▌ | 6776/12313 [5:04:11<4:15:29, 2.77s/it] {'loss': 0.4645, 'grad_norm': 7.133831222495926, 'learning_rate': 2.214884869166152e-06, 'epoch': 0.55} 55%|█████▌ | 6776/12313 [5:04:11<4:15:29, 2.77s/it] 55%|█████▌ | 6777/12313 [5:04:13<4:10:01, 2.71s/it] {'loss': 0.5628, 'grad_norm': 5.111553296411102, 'learning_rate': 2.214231547542517e-06, 'epoch': 0.55} 55%|█████▌ | 6777/12313 [5:04:13<4:10:01, 2.71s/it] 55%|█████▌ | 6778/12313 [5:04:16<4:09:38, 2.71s/it] {'loss': 0.4838, 'grad_norm': 4.076454306491572, 'learning_rate': 2.213578245692546e-06, 'epoch': 0.55} 55%|█████▌ | 6778/12313 [5:04:16<4:09:38, 2.71s/it] 55%|█████▌ | 6779/12313 [5:04:19<4:04:41, 2.65s/it] {'loss': 0.563, 'grad_norm': 5.211719936927261, 'learning_rate': 2.2129249636614443e-06, 'epoch': 0.55} 55%|█████▌ | 6779/12313 [5:04:19<4:04:41, 2.65s/it] 55%|█████▌ | 6780/12313 [5:04:21<4:12:29, 2.74s/it] {'loss': 0.4786, 'grad_norm': 12.302787876796701, 'learning_rate': 2.2122717014944167e-06, 'epoch': 0.55} 55%|█████▌ | 6780/12313 [5:04:21<4:12:29, 2.74s/it] 55%|█████▌ | 6781/12313 [5:04:24<4:11:55, 2.73s/it] {'loss': 0.5747, 
'grad_norm': 4.330340043687264, 'learning_rate': 2.2116184592366643e-06, 'epoch': 0.55} 55%|█████▌ | 6781/12313 [5:04:24<4:11:55, 2.73s/it] 55%|█████▌ | 6782/12313 [5:04:27<4:04:04, 2.65s/it] {'loss': 0.4252, 'grad_norm': 5.034937065269623, 'learning_rate': 2.2109652369333873e-06, 'epoch': 0.55} 55%|█████▌ | 6782/12313 [5:04:27<4:04:04, 2.65s/it] 55%|█████▌ | 6783/12313 [5:04:29<4:04:54, 2.66s/it] {'loss': 0.4404, 'grad_norm': 7.614248716990825, 'learning_rate': 2.2103120346297864e-06, 'epoch': 0.55} 55%|█████▌ | 6783/12313 [5:04:29<4:04:54, 2.66s/it] 55%|█████▌ | 6784/12313 [5:04:32<4:04:30, 2.65s/it] {'loss': 0.5373, 'grad_norm': 7.1625705175917425, 'learning_rate': 2.2096588523710606e-06, 'epoch': 0.55} 55%|█████▌ | 6784/12313 [5:04:32<4:04:30, 2.65s/it] 55%|█████▌ | 6785/12313 [5:04:34<3:58:06, 2.58s/it] {'loss': 0.4773, 'grad_norm': 4.064726317144616, 'learning_rate': 2.2090056902024045e-06, 'epoch': 0.55} 55%|█████▌ | 6785/12313 [5:04:34<3:58:06, 2.58s/it] 55%|█████▌ | 6786/12313 [5:04:37<4:03:26, 2.64s/it] {'loss': 0.5596, 'grad_norm': 4.1799707253524065, 'learning_rate': 2.208352548169015e-06, 'epoch': 0.55} 55%|█████▌ | 6786/12313 [5:04:37<4:03:26, 2.64s/it] 55%|█████▌ | 6787/12313 [5:04:40<4:03:33, 2.64s/it] {'loss': 0.5455, 'grad_norm': 5.2430508668568425, 'learning_rate': 2.2076994263160863e-06, 'epoch': 0.55} 55%|█████▌ | 6787/12313 [5:04:40<4:03:33, 2.64s/it] 55%|█████▌ | 6788/12313 [5:04:42<4:03:01, 2.64s/it] {'loss': 0.4733, 'grad_norm': 5.514510034389823, 'learning_rate': 2.2070463246888094e-06, 'epoch': 0.55} 55%|█████▌ | 6788/12313 [5:04:42<4:03:01, 2.64s/it] 55%|█████▌ | 6789/12313 [5:04:45<4:01:40, 2.63s/it] {'loss': 0.5448, 'grad_norm': 4.50738218564209, 'learning_rate': 2.206393243332376e-06, 'epoch': 0.55} 55%|█████▌ | 6789/12313 [5:04:45<4:01:40, 2.63s/it] 55%|█████▌ | 6790/12313 [5:04:48<4:03:58, 2.65s/it] {'loss': 0.5013, 'grad_norm': 4.34724451667795, 'learning_rate': 2.2057401822919775e-06, 'epoch': 0.55} 55%|█████▌ | 6790/12313 
[5:04:48<4:03:58, 2.65s/it] 55%|█████▌ | 6791/12313 [5:04:50<4:05:15, 2.66s/it] {'loss': 0.5814, 'grad_norm': 5.578815799717539, 'learning_rate': 2.2050871416128005e-06, 'epoch': 0.55} 55%|█████▌ | 6791/12313 [5:04:50<4:05:15, 2.66s/it] 55%|█████▌ | 6792/12313 [5:04:53<4:06:08, 2.67s/it] {'loss': 0.3959, 'grad_norm': 6.701219984709083, 'learning_rate': 2.204434121340032e-06, 'epoch': 0.55} 55%|█████▌ | 6792/12313 [5:04:53<4:06:08, 2.67s/it] 55%|█████▌ | 6793/12313 [5:04:56<4:01:37, 2.63s/it] {'loss': 0.536, 'grad_norm': 3.6134666388120404, 'learning_rate': 2.203781121518859e-06, 'epoch': 0.55} 55%|█████▌ | 6793/12313 [5:04:56<4:01:37, 2.63s/it] 55%|█████▌ | 6794/12313 [5:04:58<3:59:16, 2.60s/it] {'loss': 0.4515, 'grad_norm': 6.117283840596765, 'learning_rate': 2.2031281421944643e-06, 'epoch': 0.55} 55%|█████▌ | 6794/12313 [5:04:58<3:59:16, 2.60s/it] 55%|█████▌ | 6795/12313 [5:05:01<4:01:31, 2.63s/it] {'loss': 0.4622, 'grad_norm': 3.9112049782621003, 'learning_rate': 2.2024751834120302e-06, 'epoch': 0.55} 55%|█████▌ | 6795/12313 [5:05:01<4:01:31, 2.63s/it] 55%|█████▌ | 6796/12313 [5:05:04<4:02:48, 2.64s/it] {'loss': 0.5821, 'grad_norm': 3.8079103420465437, 'learning_rate': 2.20182224521674e-06, 'epoch': 0.55} 55%|█████▌ | 6796/12313 [5:05:04<4:02:48, 2.64s/it] 55%|█████▌ | 6797/12313 [5:05:06<4:05:55, 2.68s/it] {'loss': 0.6276, 'grad_norm': 5.019201884297034, 'learning_rate': 2.2011693276537722e-06, 'epoch': 0.55} 55%|█████▌ | 6797/12313 [5:05:06<4:05:55, 2.68s/it] 55%|█████▌ | 6798/12313 [5:05:09<4:02:44, 2.64s/it] {'loss': 0.4406, 'grad_norm': 5.4407005403339825, 'learning_rate': 2.2005164307683047e-06, 'epoch': 0.55} 55%|█████▌ | 6798/12313 [5:05:09<4:02:44, 2.64s/it] 55%|█████▌ | 6799/12313 [5:05:11<4:02:49, 2.64s/it] {'loss': 0.5167, 'grad_norm': 4.301332089064263, 'learning_rate': 2.199863554605515e-06, 'epoch': 0.55} 55%|█████▌ | 6799/12313 [5:05:12<4:02:49, 2.64s/it] 55%|█████▌ | 6800/12313 [5:05:14<4:01:46, 2.63s/it] {'loss': 0.5069, 'grad_norm': 
7.493043226544131, 'learning_rate': 2.19921069921058e-06, 'epoch': 0.55} 55%|█████▌ | 6800/12313 [5:05:14<4:01:46, 2.63s/it] 55%|█████▌ | 6801/12313 [5:05:17<4:10:03, 2.72s/it] {'loss': 0.6306, 'grad_norm': 3.8437035341706967, 'learning_rate': 2.1985578646286717e-06, 'epoch': 0.55} 55%|█████▌ | 6801/12313 [5:05:17<4:10:03, 2.72s/it] 55%|█████▌ | 6802/12313 [5:05:20<4:26:47, 2.90s/it] {'loss': 0.4094, 'grad_norm': 3.805644122254566, 'learning_rate': 2.197905050904964e-06, 'epoch': 0.55} 55%|█████▌ | 6802/12313 [5:05:20<4:26:47, 2.90s/it] 55%|█████▌ | 6803/12313 [5:05:23<4:22:05, 2.85s/it] {'loss': 0.4034, 'grad_norm': 6.899378255822541, 'learning_rate': 2.197252258084629e-06, 'epoch': 0.55} 55%|█████▌ | 6803/12313 [5:05:23<4:22:05, 2.85s/it] 55%|█████▌ | 6804/12313 [5:05:26<4:15:35, 2.78s/it] {'loss': 0.5314, 'grad_norm': 7.710934181517784, 'learning_rate': 2.196599486212834e-06, 'epoch': 0.55} 55%|█████▌ | 6804/12313 [5:05:26<4:15:35, 2.78s/it] 55%|█████▌ | 6805/12313 [5:05:28<4:08:15, 2.70s/it] {'loss': 0.4281, 'grad_norm': 3.3575643225535856, 'learning_rate': 2.1959467353347494e-06, 'epoch': 0.55} 55%|█████▌ | 6805/12313 [5:05:28<4:08:15, 2.70s/it] 55%|█████▌ | 6806/12313 [5:05:31<4:20:44, 2.84s/it] {'loss': 0.5778, 'grad_norm': 3.714838991436428, 'learning_rate': 2.195294005495542e-06, 'epoch': 0.55} 55%|█████▌ | 6806/12313 [5:05:31<4:20:44, 2.84s/it] 55%|█████▌ | 6807/12313 [5:05:34<4:14:17, 2.77s/it] {'loss': 0.5032, 'grad_norm': 6.171375505356923, 'learning_rate': 2.1946412967403763e-06, 'epoch': 0.55} 55%|█████▌ | 6807/12313 [5:05:34<4:14:17, 2.77s/it] 55%|█████▌ | 6808/12313 [5:05:37<4:16:50, 2.80s/it] {'loss': 0.4943, 'grad_norm': 5.843750427655172, 'learning_rate': 2.1939886091144165e-06, 'epoch': 0.55} 55%|█████▌ | 6808/12313 [5:05:37<4:16:50, 2.80s/it] 55%|█████▌ | 6809/12313 [5:05:40<4:28:34, 2.93s/it] {'loss': 0.4707, 'grad_norm': 3.222741223173242, 'learning_rate': 2.193335942662826e-06, 'epoch': 0.55} 55%|█████▌ | 6809/12313 [5:05:40<4:28:34, 
2.93s/it] 55%|█████▌ | 6810/12313 [5:05:43<4:20:08, 2.84s/it] {'loss': 0.4572, 'grad_norm': 3.916188730239071, 'learning_rate': 2.192683297430766e-06, 'epoch': 0.55} 55%|█████▌ | 6810/12313 [5:05:43<4:20:08, 2.84s/it] 55%|█████▌ | 6811/12313 [5:05:46<4:19:24, 2.83s/it] {'loss': 0.4173, 'grad_norm': 4.698273133823479, 'learning_rate': 2.1920306734633932e-06, 'epoch': 0.55} 55%|█████▌ | 6811/12313 [5:05:46<4:19:24, 2.83s/it] 55%|█████▌ | 6812/12313 [5:05:48<4:13:14, 2.76s/it] {'loss': 0.5129, 'grad_norm': 5.974971528842556, 'learning_rate': 2.1913780708058694e-06, 'epoch': 0.55} 55%|█████▌ | 6812/12313 [5:05:48<4:13:14, 2.76s/it] 55%|█████▌ | 6813/12313 [5:05:51<4:13:20, 2.76s/it] {'loss': 0.4309, 'grad_norm': 4.899970058534168, 'learning_rate': 2.19072548950335e-06, 'epoch': 0.55} 55%|█████▌ | 6813/12313 [5:05:51<4:13:20, 2.76s/it] 55%|█████▌ | 6814/12313 [5:05:54<4:10:01, 2.73s/it] {'loss': 0.6603, 'grad_norm': 4.224538543746903, 'learning_rate': 2.190072929600989e-06, 'epoch': 0.55} 55%|█████▌ | 6814/12313 [5:05:54<4:10:01, 2.73s/it] 55%|█████▌ | 6815/12313 [5:05:56<4:06:08, 2.69s/it] {'loss': 0.4734, 'grad_norm': 9.421465257051333, 'learning_rate': 2.189420391143941e-06, 'epoch': 0.55} 55%|█████▌ | 6815/12313 [5:05:56<4:06:08, 2.69s/it] 55%|█████▌ | 6816/12313 [5:05:59<4:05:47, 2.68s/it] {'loss': 0.5195, 'grad_norm': 6.665428434932173, 'learning_rate': 2.1887678741773592e-06, 'epoch': 0.55} 55%|█████▌ | 6816/12313 [5:05:59<4:05:47, 2.68s/it] 55%|█████▌ | 6817/12313 [5:06:02<4:12:37, 2.76s/it] {'loss': 0.4763, 'grad_norm': 2.89387484029611, 'learning_rate': 2.188115378746392e-06, 'epoch': 0.55} 55%|█████▌ | 6817/12313 [5:06:02<4:12:37, 2.76s/it] 55%|█████▌ | 6818/12313 [5:06:04<4:06:56, 2.70s/it] {'loss': 0.5321, 'grad_norm': 6.823773735416532, 'learning_rate': 2.1874629048961904e-06, 'epoch': 0.55} 55%|█████▌ | 6818/12313 [5:06:04<4:06:56, 2.70s/it] 55%|█████▌ | 6819/12313 [5:06:07<4:14:54, 2.78s/it] {'loss': 0.5816, 'grad_norm': 4.521778809799006, 
'learning_rate': 2.1868104526719023e-06, 'epoch': 0.55} 55%|█████▌ | 6819/12313 [5:06:07<4:14:54, 2.78s/it] 55%|█████▌ | 6820/12313 [5:06:10<4:09:25, 2.72s/it] {'loss': 0.4168, 'grad_norm': 3.3396433922902906, 'learning_rate': 2.1861580221186726e-06, 'epoch': 0.55} 55%|█████▌ | 6820/12313 [5:06:10<4:09:25, 2.72s/it] 55%|█████▌ | 6821/12313 [5:06:12<4:06:14, 2.69s/it] {'loss': 0.4614, 'grad_norm': 3.3374583977198036, 'learning_rate': 2.185505613281647e-06, 'epoch': 0.55} 55%|█████▌ | 6821/12313 [5:06:12<4:06:14, 2.69s/it] 55%|█████▌ | 6822/12313 [5:06:15<4:04:25, 2.67s/it] {'loss': 0.5797, 'grad_norm': 6.068612637720992, 'learning_rate': 2.1848532262059696e-06, 'epoch': 0.55} 55%|█████▌ | 6822/12313 [5:06:15<4:04:25, 2.67s/it] 55%|█████▌ | 6823/12313 [5:06:18<4:07:43, 2.71s/it] {'loss': 0.6781, 'grad_norm': 5.0334311376181144, 'learning_rate': 2.1842008609367794e-06, 'epoch': 0.55} 55%|█████▌ | 6823/12313 [5:06:18<4:07:43, 2.71s/it] 55%|█████▌ | 6824/12313 [5:06:21<4:06:35, 2.70s/it] {'loss': 0.4113, 'grad_norm': 4.81150612829465, 'learning_rate': 2.183548517519219e-06, 'epoch': 0.55} 55%|█████▌ | 6824/12313 [5:06:21<4:06:35, 2.70s/it] 55%|█████▌ | 6825/12313 [5:06:23<4:10:31, 2.74s/it] {'loss': 0.502, 'grad_norm': 7.137103277777565, 'learning_rate': 2.1828961959984267e-06, 'epoch': 0.55} 55%|█████▌ | 6825/12313 [5:06:23<4:10:31, 2.74s/it] 55%|█████▌ | 6826/12313 [5:06:26<4:00:18, 2.63s/it] {'loss': 0.3781, 'grad_norm': 4.93103788302294, 'learning_rate': 2.18224389641954e-06, 'epoch': 0.55} 55%|█████▌ | 6826/12313 [5:06:26<4:00:18, 2.63s/it] 55%|█████▌ | 6827/12313 [5:06:29<4:06:38, 2.70s/it] {'loss': 0.5585, 'grad_norm': 4.17706299919582, 'learning_rate': 2.1815916188276925e-06, 'epoch': 0.55} 55%|█████▌ | 6827/12313 [5:06:29<4:06:38, 2.70s/it] 55%|█████▌ | 6828/12313 [5:06:31<4:02:38, 2.65s/it] {'loss': 0.5219, 'grad_norm': 6.194948711383427, 'learning_rate': 2.18093936326802e-06, 'epoch': 0.55} 55%|█████▌ | 6828/12313 [5:06:31<4:02:38, 2.65s/it] 55%|█████▌ | 
6829/12313 [5:06:34<4:01:00, 2.64s/it] {'loss': 0.4665, 'grad_norm': 7.761708704297234, 'learning_rate': 2.180287129785656e-06, 'epoch': 0.55} 55%|█████▌ | 6829/12313 [5:06:34<4:01:00, 2.64s/it] 55%|█████▌ | 6830/12313 [5:06:37<4:05:27, 2.69s/it] {'loss': 0.4973, 'grad_norm': 5.591231620591249, 'learning_rate': 2.1796349184257294e-06, 'epoch': 0.55} 55%|█████▌ | 6830/12313 [5:06:37<4:05:27, 2.69s/it] 55%|█████▌ | 6831/12313 [5:06:39<4:06:49, 2.70s/it] {'loss': 0.3763, 'grad_norm': 3.9216643074294595, 'learning_rate': 2.1789827292333717e-06, 'epoch': 0.55} 55%|█████▌ | 6831/12313 [5:06:39<4:06:49, 2.70s/it] 55%|█████▌ | 6832/12313 [5:06:42<4:08:18, 2.72s/it] {'loss': 0.522, 'grad_norm': 4.444201370705435, 'learning_rate': 2.1783305622537106e-06, 'epoch': 0.55} 55%|█████▌ | 6832/12313 [5:06:42<4:08:18, 2.72s/it] 55%|█████▌ | 6833/12313 [5:06:45<4:06:44, 2.70s/it] {'loss': 0.6173, 'grad_norm': 3.5234577542993972, 'learning_rate': 2.1776784175318705e-06, 'epoch': 0.55} 55%|█████▌ | 6833/12313 [5:06:45<4:06:44, 2.70s/it] 56%|█████▌ | 6834/12313 [5:06:47<4:05:32, 2.69s/it] {'loss': 0.4304, 'grad_norm': 5.522357961927791, 'learning_rate': 2.1770262951129792e-06, 'epoch': 0.56} 56%|█████▌ | 6834/12313 [5:06:47<4:05:32, 2.69s/it] 56%|█████▌ | 6835/12313 [5:06:50<4:03:02, 2.66s/it] {'loss': 0.5176, 'grad_norm': 2.753675801015695, 'learning_rate': 2.1763741950421595e-06, 'epoch': 0.56} 56%|█████▌ | 6835/12313 [5:06:50<4:03:02, 2.66s/it] 56%|█████▌ | 6836/12313 [5:06:53<4:03:49, 2.67s/it] {'loss': 0.6027, 'grad_norm': 5.5643466943030715, 'learning_rate': 2.175722117364531e-06, 'epoch': 0.56} 56%|█████▌ | 6836/12313 [5:06:53<4:03:49, 2.67s/it] 56%|█████▌ | 6837/12313 [5:06:56<4:09:08, 2.73s/it] {'loss': 0.472, 'grad_norm': 2.6499029128692957, 'learning_rate': 2.175070062125217e-06, 'epoch': 0.56} 56%|█████▌ | 6837/12313 [5:06:56<4:09:08, 2.73s/it] 56%|█████▌ | 6838/12313 [5:06:58<4:07:57, 2.72s/it] {'loss': 0.5573, 'grad_norm': 11.241112599357884, 'learning_rate': 
2.1744180293693355e-06, 'epoch': 0.56} 56%|█████▌ | 6838/12313 [5:06:58<4:07:57, 2.72s/it] 56%|█████▌ | 6839/12313 [5:07:01<4:05:26, 2.69s/it] {'loss': 0.4747, 'grad_norm': 5.094638009416633, 'learning_rate': 2.173766019142002e-06, 'epoch': 0.56} 56%|█████▌ | 6839/12313 [5:07:01<4:05:26, 2.69s/it] 56%|█████▌ | 6840/12313 [5:07:03<3:58:43, 2.62s/it] {'loss': 0.5245, 'grad_norm': 6.191994268871611, 'learning_rate': 2.1731140314883346e-06, 'epoch': 0.56} 56%|█████▌ | 6840/12313 [5:07:03<3:58:43, 2.62s/it] 56%|█████▌ | 6841/12313 [5:07:06<3:58:37, 2.62s/it] {'loss': 0.508, 'grad_norm': 4.2743419408023495, 'learning_rate': 2.1724620664534453e-06, 'epoch': 0.56} 56%|█████▌ | 6841/12313 [5:07:06<3:58:37, 2.62s/it] 56%|█████▌ | 6842/12313 [5:07:08<3:53:12, 2.56s/it] {'loss': 0.6018, 'grad_norm': 12.369287470212933, 'learning_rate': 2.1718101240824485e-06, 'epoch': 0.56} 56%|█████▌ | 6842/12313 [5:07:08<3:53:12, 2.56s/it] 56%|█████▌ | 6843/12313 [5:07:11<3:53:28, 2.56s/it] {'loss': 0.5897, 'grad_norm': 6.34277452834017, 'learning_rate': 2.171158204420453e-06, 'epoch': 0.56} 56%|█████▌ | 6843/12313 [5:07:11<3:53:28, 2.56s/it] 56%|█████▌ | 6844/12313 [5:07:14<3:55:09, 2.58s/it] {'loss': 0.3811, 'grad_norm': 3.6842271055521763, 'learning_rate': 2.17050630751257e-06, 'epoch': 0.56} 56%|█████▌ | 6844/12313 [5:07:14<3:55:09, 2.58s/it] 56%|█████▌ | 6845/12313 [5:07:16<4:00:30, 2.64s/it] {'loss': 0.7328, 'grad_norm': 4.1774312445120545, 'learning_rate': 2.169854433403907e-06, 'epoch': 0.56} 56%|█████▌ | 6845/12313 [5:07:16<4:00:30, 2.64s/it] 56%|█████▌ | 6846/12313 [5:07:19<3:59:18, 2.63s/it] {'loss': 0.5899, 'grad_norm': 4.702814140349739, 'learning_rate': 2.169202582139569e-06, 'epoch': 0.56} 56%|█████▌ | 6846/12313 [5:07:19<3:59:18, 2.63s/it] 56%|█████▌ | 6847/12313 [5:07:21<3:55:12, 2.58s/it] {'loss': 0.4751, 'grad_norm': 5.095589215614225, 'learning_rate': 2.1685507537646622e-06, 'epoch': 0.56} 56%|█████▌ | 6847/12313 [5:07:21<3:55:12, 2.58s/it] 56%|█████▌ | 6848/12313 
[5:07:24<3:54:03, 2.57s/it] {'loss': 0.4403, 'grad_norm': 5.703171345481269, 'learning_rate': 2.1678989483242896e-06, 'epoch': 0.56} 56%|█████▌ | 6848/12313 [5:07:24<3:54:03, 2.57s/it] 56%|█████▌ | 6849/12313 [5:07:26<3:48:51, 2.51s/it] {'loss': 0.5135, 'grad_norm': 4.75054750690896, 'learning_rate': 2.1672471658635506e-06, 'epoch': 0.56} 56%|█████▌ | 6849/12313 [5:07:26<3:48:51, 2.51s/it] 56%|█████▌ | 6850/12313 [5:07:29<3:58:19, 2.62s/it] {'loss': 0.6971, 'grad_norm': 3.3902339048047305, 'learning_rate': 2.166595406427548e-06, 'epoch': 0.56} 56%|█████▌ | 6850/12313 [5:07:29<3:58:19, 2.62s/it] 56%|█████▌ | 6851/12313 [5:07:32<3:55:52, 2.59s/it] {'loss': 0.4691, 'grad_norm': 5.852734969187135, 'learning_rate': 2.1659436700613787e-06, 'epoch': 0.56} 56%|█████▌ | 6851/12313 [5:07:32<3:55:52, 2.59s/it] 56%|█████▌ | 6852/12313 [5:07:34<3:58:00, 2.62s/it] {'loss': 0.5747, 'grad_norm': 3.727932064595546, 'learning_rate': 2.1652919568101386e-06, 'epoch': 0.56} 56%|█████▌ | 6852/12313 [5:07:34<3:58:00, 2.62s/it] 56%|█████▌ | 6853/12313 [5:07:37<4:02:03, 2.66s/it] {'loss': 0.4892, 'grad_norm': 5.153139343207781, 'learning_rate': 2.1646402667189245e-06, 'epoch': 0.56} 56%|█████▌ | 6853/12313 [5:07:37<4:02:03, 2.66s/it] 56%|█████▌ | 6854/12313 [5:07:40<4:02:58, 2.67s/it] {'loss': 0.4627, 'grad_norm': 3.8440190223031125, 'learning_rate': 2.1639885998328293e-06, 'epoch': 0.56} 56%|█████▌ | 6854/12313 [5:07:40<4:02:58, 2.67s/it] 56%|█████▌ | 6855/12313 [5:07:43<4:03:14, 2.67s/it] {'loss': 0.4342, 'grad_norm': 8.606115530142445, 'learning_rate': 2.1633369561969435e-06, 'epoch': 0.56} 56%|█████▌ | 6855/12313 [5:07:43<4:03:14, 2.67s/it] 56%|█████▌ | 6856/12313 [5:07:45<4:07:05, 2.72s/it] {'loss': 0.5365, 'grad_norm': 3.6671273924850407, 'learning_rate': 2.1626853358563595e-06, 'epoch': 0.56} 56%|█████▌ | 6856/12313 [5:07:45<4:07:05, 2.72s/it] 56%|█████▌ | 6857/12313 [5:07:48<4:00:48, 2.65s/it] {'loss': 0.491, 'grad_norm': 3.835808189282639, 'learning_rate': 2.162033738856165e-06, 
'epoch': 0.56} 56%|█████▌ | 6857/12313 [5:07:48<4:00:48, 2.65s/it] 56%|█████▌ | 6858/12313 [5:07:51<4:03:53, 2.68s/it] {'loss': 0.4935, 'grad_norm': 6.132790993906602, 'learning_rate': 2.161382165241446e-06, 'epoch': 0.56} 56%|█████▌ | 6858/12313 [5:07:51<4:03:53, 2.68s/it] 56%|█████▌ | 6859/12313 [5:07:53<4:01:24, 2.66s/it] {'loss': 0.5225, 'grad_norm': 4.4718395237709485, 'learning_rate': 2.1607306150572905e-06, 'epoch': 0.56} 56%|█████▌ | 6859/12313 [5:07:53<4:01:24, 2.66s/it] 56%|█████▌ | 6860/12313 [5:07:56<3:56:34, 2.60s/it] {'loss': 0.4914, 'grad_norm': 5.4291132456188445, 'learning_rate': 2.1600790883487805e-06, 'epoch': 0.56} 56%|█████▌ | 6860/12313 [5:07:56<3:56:34, 2.60s/it] 56%|█████▌ | 6861/12313 [5:07:59<4:04:10, 2.69s/it] {'loss': 0.6867, 'grad_norm': 3.396009857568952, 'learning_rate': 2.159427585160999e-06, 'epoch': 0.56} 56%|█████▌ | 6861/12313 [5:07:59<4:04:10, 2.69s/it] 56%|█████▌ | 6862/12313 [5:08:01<3:58:12, 2.62s/it] {'loss': 0.3675, 'grad_norm': 6.2945750984242945, 'learning_rate': 2.1587761055390247e-06, 'epoch': 0.56} 56%|█████▌ | 6862/12313 [5:08:01<3:58:12, 2.62s/it] 56%|█████▌ | 6863/12313 [5:08:04<3:59:59, 2.64s/it] {'loss': 0.4676, 'grad_norm': 10.344490630116207, 'learning_rate': 2.1581246495279388e-06, 'epoch': 0.56} 56%|█████▌ | 6863/12313 [5:08:04<3:59:59, 2.64s/it] 56%|█████▌ | 6864/12313 [5:08:07<4:05:00, 2.70s/it] {'loss': 0.5014, 'grad_norm': 4.03264581660929, 'learning_rate': 2.1574732171728187e-06, 'epoch': 0.56} 56%|█████▌ | 6864/12313 [5:08:07<4:05:00, 2.70s/it] 56%|█████▌ | 6865/12313 [5:08:09<3:58:07, 2.62s/it] {'loss': 0.5567, 'grad_norm': 5.595299416089842, 'learning_rate': 2.1568218085187375e-06, 'epoch': 0.56} 56%|█████▌ | 6865/12313 [5:08:09<3:58:07, 2.62s/it] 56%|█████▌ | 6866/12313 [5:08:12<3:58:55, 2.63s/it] {'loss': 0.4441, 'grad_norm': 4.615982346308546, 'learning_rate': 2.1561704236107715e-06, 'epoch': 0.56} 56%|█████▌ | 6866/12313 [5:08:12<3:58:55, 2.63s/it] 56%|█████▌ | 6867/12313 [5:08:15<4:05:34, 
2.71s/it] {'loss': 0.4734, 'grad_norm': 3.371649845113746, 'learning_rate': 2.1555190624939933e-06, 'epoch': 0.56} 56%|█████▌ | 6867/12313 [5:08:15<4:05:34, 2.71s/it] 56%|█████▌ | 6868/12313 [5:08:17<4:05:21, 2.70s/it] {'loss': 0.5791, 'grad_norm': 4.993213905650965, 'learning_rate': 2.154867725213472e-06, 'epoch': 0.56} 56%|█████▌ | 6868/12313 [5:08:17<4:05:21, 2.70s/it] 56%|█████▌ | 6869/12313 [5:08:20<4:00:09, 2.65s/it] {'loss': 0.6413, 'grad_norm': 5.433984223606404, 'learning_rate': 2.154216411814278e-06, 'epoch': 0.56} 56%|█████▌ | 6869/12313 [5:08:20<4:00:09, 2.65s/it] 56%|█████▌ | 6870/12313 [5:08:22<4:01:38, 2.66s/it] {'loss': 0.4065, 'grad_norm': 4.1089167750307505, 'learning_rate': 2.1535651223414783e-06, 'epoch': 0.56} 56%|█████▌ | 6870/12313 [5:08:22<4:01:38, 2.66s/it] 56%|█████▌ | 6871/12313 [5:08:25<3:58:21, 2.63s/it] {'loss': 0.4132, 'grad_norm': 6.410732794666789, 'learning_rate': 2.1529138568401377e-06, 'epoch': 0.56} 56%|█████▌ | 6871/12313 [5:08:25<3:58:21, 2.63s/it] 56%|█████▌ | 6872/12313 [5:08:28<4:08:17, 2.74s/it] {'loss': 0.5569, 'grad_norm': 2.952240951187637, 'learning_rate': 2.1522626153553224e-06, 'epoch': 0.56} 56%|█████▌ | 6872/12313 [5:08:28<4:08:17, 2.74s/it] 56%|█████▌ | 6873/12313 [5:08:31<4:05:09, 2.70s/it] {'loss': 0.615, 'grad_norm': 5.778680431168145, 'learning_rate': 2.1516113979320937e-06, 'epoch': 0.56} 56%|█████▌ | 6873/12313 [5:08:31<4:05:09, 2.70s/it] 56%|█████▌ | 6874/12313 [5:08:33<4:03:45, 2.69s/it] {'loss': 0.4005, 'grad_norm': 5.40543144487348, 'learning_rate': 2.150960204615511e-06, 'epoch': 0.56} 56%|█████▌ | 6874/12313 [5:08:33<4:03:45, 2.69s/it] 56%|█████▌ | 6875/12313 [5:08:36<4:01:27, 2.66s/it] {'loss': 0.4878, 'grad_norm': 5.360050799209055, 'learning_rate': 2.1503090354506366e-06, 'epoch': 0.56} 56%|█████▌ | 6875/12313 [5:08:36<4:01:27, 2.66s/it] 56%|█████▌ | 6876/12313 [5:08:38<3:59:06, 2.64s/it] {'loss': 0.606, 'grad_norm': 4.768908334280203, 'learning_rate': 2.1496578904825253e-06, 'epoch': 0.56} 
56%|█████▌ | 6876/12313 [5:08:38<3:59:06, 2.64s/it] 56%|█████▌ | 6877/12313 [5:08:41<4:02:25, 2.68s/it] {'loss': 0.3823, 'grad_norm': 4.8420372746526255, 'learning_rate': 2.149006769756234e-06, 'epoch': 0.56} 56%|█████▌ | 6877/12313 [5:08:41<4:02:25, 2.68s/it] 56%|█████▌ | 6878/12313 [5:08:44<3:57:42, 2.62s/it] {'loss': 0.4933, 'grad_norm': 7.2191793614780835, 'learning_rate': 2.148355673316817e-06, 'epoch': 0.56} 56%|█████▌ | 6878/12313 [5:08:44<3:57:42, 2.62s/it] 56%|█████▌ | 6879/12313 [5:08:47<4:03:56, 2.69s/it] {'loss': 0.4284, 'grad_norm': 5.635609934888895, 'learning_rate': 2.1477046012093263e-06, 'epoch': 0.56} 56%|█████▌ | 6879/12313 [5:08:47<4:03:56, 2.69s/it] 56%|█████▌ | 6880/12313 [5:08:49<4:01:18, 2.66s/it] {'loss': 0.4215, 'grad_norm': 4.860255565023494, 'learning_rate': 2.147053553478813e-06, 'epoch': 0.56} 56%|█████▌ | 6880/12313 [5:08:49<4:01:18, 2.66s/it] 56%|█████▌ | 6881/12313 [5:08:52<4:06:07, 2.72s/it] {'loss': 0.3612, 'grad_norm': 6.326189480768318, 'learning_rate': 2.1464025301703243e-06, 'epoch': 0.56} 56%|█████▌ | 6881/12313 [5:08:52<4:06:07, 2.72s/it] 56%|█████▌ | 6882/12313 [5:08:55<4:07:01, 2.73s/it] {'loss': 0.638, 'grad_norm': 10.398795327314401, 'learning_rate': 2.145751531328911e-06, 'epoch': 0.56} 56%|█████▌ | 6882/12313 [5:08:55<4:07:01, 2.73s/it] 56%|█████▌ | 6883/12313 [5:08:57<4:06:31, 2.72s/it] {'loss': 0.4752, 'grad_norm': 5.9840259751672455, 'learning_rate': 2.1451005569996157e-06, 'epoch': 0.56} 56%|█████▌ | 6883/12313 [5:08:57<4:06:31, 2.72s/it] 56%|█████▌ | 6884/12313 [5:09:00<4:07:52, 2.74s/it] {'loss': 0.4235, 'grad_norm': 6.729449032368889, 'learning_rate': 2.144449607227483e-06, 'epoch': 0.56} 56%|█████▌ | 6884/12313 [5:09:00<4:07:52, 2.74s/it] 56%|█████▌ | 6885/12313 [5:09:03<4:09:14, 2.76s/it] {'loss': 0.565, 'grad_norm': 4.99524830446243, 'learning_rate': 2.143798682057558e-06, 'epoch': 0.56} 56%|█████▌ | 6885/12313 [5:09:03<4:09:14, 2.76s/it] 56%|█████▌ | 6886/12313 [5:09:06<4:12:37, 2.79s/it] {'loss': 0.644, 
'grad_norm': 11.826585338715425, 'learning_rate': 2.1431477815348775e-06, 'epoch': 0.56} 56%|█████▌ | 6886/12313 [5:09:06<4:12:37, 2.79s/it] 56%|█████▌ | 6887/12313 [5:09:09<4:09:59, 2.76s/it] {'loss': 0.6231, 'grad_norm': 4.772181141872521, 'learning_rate': 2.1424969057044815e-06, 'epoch': 0.56} 56%|█████▌ | 6887/12313 [5:09:09<4:09:59, 2.76s/it] 56%|█████▌ | 6888/12313 [5:09:11<4:08:54, 2.75s/it] {'loss': 0.4135, 'grad_norm': 5.569287892562355, 'learning_rate': 2.1418460546114087e-06, 'epoch': 0.56} 56%|█████▌ | 6888/12313 [5:09:11<4:08:54, 2.75s/it] 56%|█████▌ | 6889/12313 [5:09:14<4:16:51, 2.84s/it] {'loss': 0.4882, 'grad_norm': 3.637243154196817, 'learning_rate': 2.141195228300693e-06, 'epoch': 0.56} 56%|█████▌ | 6889/12313 [5:09:14<4:16:51, 2.84s/it] 56%|█████▌ | 6890/12313 [5:09:17<4:10:55, 2.78s/it] {'loss': 0.4451, 'grad_norm': 4.970081667441746, 'learning_rate': 2.140544426817368e-06, 'epoch': 0.56} 56%|█████▌ | 6890/12313 [5:09:17<4:10:55, 2.78s/it] 56%|█████▌ | 6891/12313 [5:09:20<4:10:16, 2.77s/it] {'loss': 0.4589, 'grad_norm': 6.0593249594190715, 'learning_rate': 2.139893650206467e-06, 'epoch': 0.56} 56%|█████▌ | 6891/12313 [5:09:20<4:10:16, 2.77s/it] 56%|█████▌ | 6892/12313 [5:09:23<4:09:37, 2.76s/it] {'loss': 0.4929, 'grad_norm': 4.272966835485669, 'learning_rate': 2.1392428985130192e-06, 'epoch': 0.56} 56%|█████▌ | 6892/12313 [5:09:23<4:09:37, 2.76s/it] 56%|█████▌ | 6893/12313 [5:09:25<4:05:36, 2.72s/it] {'loss': 0.6183, 'grad_norm': 4.935701907357544, 'learning_rate': 2.138592171782053e-06, 'epoch': 0.56} 56%|█████▌ | 6893/12313 [5:09:25<4:05:36, 2.72s/it] 56%|█████▌ | 6894/12313 [5:09:28<4:01:34, 2.67s/it] {'loss': 0.5274, 'grad_norm': 4.965836504024703, 'learning_rate': 2.137941470058597e-06, 'epoch': 0.56} 56%|█████▌ | 6894/12313 [5:09:28<4:01:34, 2.67s/it] 56%|█████▌ | 6895/12313 [5:09:30<4:01:05, 2.67s/it] {'loss': 0.6235, 'grad_norm': 3.8763705524472245, 'learning_rate': 2.1372907933876745e-06, 'epoch': 0.56} 56%|█████▌ | 6895/12313 
[5:09:30<4:01:05, 2.67s/it] 56%|█████▌ | 6896/12313 [5:09:33<4:07:59, 2.75s/it] {'loss': 0.4554, 'grad_norm': 3.7922687516844387, 'learning_rate': 2.13664014181431e-06, 'epoch': 0.56} 56%|█████▌ | 6896/12313 [5:09:33<4:07:59, 2.75s/it] 56%|█████▌ | 6897/12313 [5:09:36<4:06:10, 2.73s/it] {'loss': 0.7668, 'grad_norm': 5.143643713764753, 'learning_rate': 2.1359895153835235e-06, 'epoch': 0.56} 56%|█████▌ | 6897/12313 [5:09:36<4:06:10, 2.73s/it] 56%|█████▌ | 6898/12313 [5:09:39<4:02:26, 2.69s/it] {'loss': 0.5105, 'grad_norm': 3.9929186611253686, 'learning_rate': 2.1353389141403373e-06, 'epoch': 0.56} 56%|█████▌ | 6898/12313 [5:09:39<4:02:26, 2.69s/it] 56%|█████▌ | 6899/12313 [5:09:41<3:58:35, 2.64s/it] {'loss': 0.3908, 'grad_norm': 7.4139111080640605, 'learning_rate': 2.134688338129768e-06, 'epoch': 0.56} 56%|█████▌ | 6899/12313 [5:09:41<3:58:35, 2.64s/it] 56%|█████▌ | 6900/12313 [5:09:44<3:53:58, 2.59s/it] {'loss': 0.3907, 'grad_norm': 4.298081345320913, 'learning_rate': 2.1340377873968313e-06, 'epoch': 0.56} 56%|█████▌ | 6900/12313 [5:09:44<3:53:58, 2.59s/it] 56%|█████▌ | 6901/12313 [5:09:46<3:52:40, 2.58s/it] {'loss': 0.4315, 'grad_norm': 3.6665218877774404, 'learning_rate': 2.133387261986544e-06, 'epoch': 0.56} 56%|█████▌ | 6901/12313 [5:09:46<3:52:40, 2.58s/it] 56%|█████▌ | 6902/12313 [5:09:49<3:54:52, 2.60s/it] {'loss': 0.4787, 'grad_norm': 3.345078936648102, 'learning_rate': 2.132736761943917e-06, 'epoch': 0.56} 56%|█████▌ | 6902/12313 [5:09:49<3:54:52, 2.60s/it] 56%|█████▌ | 6903/12313 [5:09:51<3:55:25, 2.61s/it] {'loss': 0.517, 'grad_norm': 5.603195213119475, 'learning_rate': 2.1320862873139627e-06, 'epoch': 0.56} 56%|█████▌ | 6903/12313 [5:09:51<3:55:25, 2.61s/it] 56%|█████▌ | 6904/12313 [5:09:54<3:56:06, 2.62s/it] {'loss': 0.4446, 'grad_norm': 4.582210135280795, 'learning_rate': 2.1314358381416906e-06, 'epoch': 0.56} 56%|█████▌ | 6904/12313 [5:09:54<3:56:06, 2.62s/it] 56%|█████▌ | 6905/12313 [5:09:57<3:59:52, 2.66s/it] {'loss': 0.5447, 'grad_norm': 
5.520444524716127, 'learning_rate': 2.130785414472108e-06, 'epoch': 0.56} 56%|█████▌ | 6905/12313 [5:09:57<3:59:52, 2.66s/it] 56%|█████▌ | 6906/12313 [5:09:59<3:58:06, 2.64s/it] {'loss': 0.4709, 'grad_norm': 3.594858666968153, 'learning_rate': 2.1301350163502194e-06, 'epoch': 0.56} 56%|█████▌ | 6906/12313 [5:09:59<3:58:06, 2.64s/it] 56%|█████▌ | 6907/12313 [5:10:02<3:59:25, 2.66s/it] {'loss': 0.5108, 'grad_norm': 5.775944364817623, 'learning_rate': 2.1294846438210316e-06, 'epoch': 0.56} 56%|█████▌ | 6907/12313 [5:10:02<3:59:25, 2.66s/it] 56%|█████▌ | 6908/12313 [5:10:05<4:05:29, 2.73s/it] {'loss': 0.458, 'grad_norm': 4.911293017998235, 'learning_rate': 2.128834296929545e-06, 'epoch': 0.56} 56%|█████▌ | 6908/12313 [5:10:05<4:05:29, 2.73s/it] 56%|█████▌ | 6909/12313 [5:10:08<4:03:42, 2.71s/it] {'loss': 0.5297, 'grad_norm': 5.313549191131658, 'learning_rate': 2.12818397572076e-06, 'epoch': 0.56} 56%|█████▌ | 6909/12313 [5:10:08<4:03:42, 2.71s/it] 56%|█████▌ | 6910/12313 [5:10:10<4:03:34, 2.70s/it] {'loss': 0.6024, 'grad_norm': 5.235033497622972, 'learning_rate': 2.1275336802396775e-06, 'epoch': 0.56} 56%|█████▌ | 6910/12313 [5:10:10<4:03:34, 2.70s/it] 56%|█████▌ | 6911/12313 [5:10:13<4:01:10, 2.68s/it] {'loss': 0.4589, 'grad_norm': 8.31540179839902, 'learning_rate': 2.1268834105312926e-06, 'epoch': 0.56} 56%|█████▌ | 6911/12313 [5:10:13<4:01:10, 2.68s/it] 56%|█████▌ | 6912/12313 [5:10:16<3:58:21, 2.65s/it] {'loss': 0.687, 'grad_norm': 5.844285401664914, 'learning_rate': 2.1262331666406003e-06, 'epoch': 0.56} 56%|█████▌ | 6912/12313 [5:10:16<3:58:21, 2.65s/it] 56%|█████▌ | 6913/12313 [5:10:18<3:54:27, 2.61s/it] {'loss': 0.3788, 'grad_norm': 5.506426123033813, 'learning_rate': 2.125582948612595e-06, 'epoch': 0.56} 56%|█████▌ | 6913/12313 [5:10:18<3:54:27, 2.61s/it] 56%|█████▌ | 6914/12313 [5:10:21<4:01:55, 2.69s/it] {'loss': 0.4786, 'grad_norm': 4.938418738241158, 'learning_rate': 2.124932756492269e-06, 'epoch': 0.56} 56%|█████▌ | 6914/12313 [5:10:21<4:01:55, 2.69s/it] 
56%|█████▌ | 6915/12313 [5:10:24<4:02:14, 2.69s/it] {'loss': 0.5104, 'grad_norm': 5.011490627219829, 'learning_rate': 2.1242825903246104e-06, 'epoch': 0.56} 56%|█████▌ | 6915/12313 [5:10:24<4:02:14, 2.69s/it] 56%|█████▌ | 6916/12313 [5:10:27<4:09:51, 2.78s/it] {'loss': 0.5, 'grad_norm': 4.130351403264876, 'learning_rate': 2.1236324501546073e-06, 'epoch': 0.56} 56%|█████▌ | 6916/12313 [5:10:27<4:09:51, 2.78s/it] 56%|█████▌ | 6917/12313 [5:10:29<4:09:58, 2.78s/it] {'loss': 0.5647, 'grad_norm': 5.131299996818871, 'learning_rate': 2.1229823360272483e-06, 'epoch': 0.56} 56%|█████▌ | 6917/12313 [5:10:29<4:09:58, 2.78s/it] 56%|█████▌ | 6918/12313 [5:10:32<4:12:09, 2.80s/it] {'loss': 0.6934, 'grad_norm': 4.930401157312547, 'learning_rate': 2.1223322479875157e-06, 'epoch': 0.56} 56%|█████▌ | 6918/12313 [5:10:32<4:12:09, 2.80s/it] 56%|█████▌ | 6919/12313 [5:10:35<4:04:21, 2.72s/it] {'loss': 0.5194, 'grad_norm': 4.218515382201296, 'learning_rate': 2.1216821860803922e-06, 'epoch': 0.56} 56%|█████▌ | 6919/12313 [5:10:35<4:04:21, 2.72s/it] 56%|█████▌ | 6920/12313 [5:10:37<4:01:03, 2.68s/it] {'loss': 0.548, 'grad_norm': 4.955917407462902, 'learning_rate': 2.12103215035086e-06, 'epoch': 0.56} 56%|█████▌ | 6920/12313 [5:10:37<4:01:03, 2.68s/it] 56%|█████▌ | 6921/12313 [5:10:40<3:52:43, 2.59s/it] {'loss': 0.3326, 'grad_norm': 7.097785428246493, 'learning_rate': 2.1203821408438973e-06, 'epoch': 0.56} 56%|█████▌ | 6921/12313 [5:10:40<3:52:43, 2.59s/it] 56%|█████▌ | 6922/12313 [5:10:42<3:55:36, 2.62s/it] {'loss': 0.5434, 'grad_norm': 5.512144520285594, 'learning_rate': 2.1197321576044803e-06, 'epoch': 0.56} 56%|█████▌ | 6922/12313 [5:10:42<3:55:36, 2.62s/it] 56%|█████▌ | 6923/12313 [5:10:45<3:53:03, 2.59s/it] {'loss': 0.706, 'grad_norm': 4.559962704037825, 'learning_rate': 2.119082200677587e-06, 'epoch': 0.56} 56%|█████▌ | 6923/12313 [5:10:45<3:53:03, 2.59s/it] 56%|█████▌ | 6924/12313 [5:10:48<3:53:30, 2.60s/it] {'loss': 0.4711, 'grad_norm': 3.716157164787584, 'learning_rate': 
2.1184322701081884e-06, 'epoch': 0.56} 56%|█████▌ | 6924/12313 [5:10:48<3:53:30, 2.60s/it] 56%|█████▌ | 6925/12313 [5:10:50<3:54:58, 2.62s/it] {'loss': 0.6163, 'grad_norm': 5.005561099415801, 'learning_rate': 2.117782365941257e-06, 'epoch': 0.56} 56%|█████▌ | 6925/12313 [5:10:50<3:54:58, 2.62s/it] 56%|█████▌ | 6926/12313 [5:10:53<4:01:01, 2.68s/it] {'loss': 0.5811, 'grad_norm': 3.3652327554641808, 'learning_rate': 2.1171324882217644e-06, 'epoch': 0.56} 56%|█████▌ | 6926/12313 [5:10:53<4:01:01, 2.68s/it] 56%|█████▋ | 6927/12313 [5:10:56<4:02:29, 2.70s/it] {'loss': 0.4976, 'grad_norm': 3.4936659274070485, 'learning_rate': 2.116482636994677e-06, 'epoch': 0.56} 56%|█████▋ | 6927/12313 [5:10:56<4:02:29, 2.70s/it] 56%|█████▋ | 6928/12313 [5:10:59<4:01:57, 2.70s/it] {'loss': 0.3923, 'grad_norm': 15.606156521919054, 'learning_rate': 2.11583281230496e-06, 'epoch': 0.56} 56%|█████▋ | 6928/12313 [5:10:59<4:01:57, 2.70s/it] 56%|█████▋ | 6929/12313 [5:11:01<4:02:08, 2.70s/it] {'loss': 0.5009, 'grad_norm': 8.796650533681337, 'learning_rate': 2.11518301419758e-06, 'epoch': 0.56} 56%|█████▋ | 6929/12313 [5:11:01<4:02:08, 2.70s/it] 56%|█████▋ | 6930/12313 [5:11:04<3:58:40, 2.66s/it] {'loss': 0.3586, 'grad_norm': 5.043513216497029, 'learning_rate': 2.1145332427174995e-06, 'epoch': 0.56} 56%|█████▋ | 6930/12313 [5:11:04<3:58:40, 2.66s/it] 56%|█████▋ | 6931/12313 [5:11:07<4:03:21, 2.71s/it] {'loss': 0.4416, 'grad_norm': 5.302763968985061, 'learning_rate': 2.1138834979096778e-06, 'epoch': 0.56} 56%|█████▋ | 6931/12313 [5:11:07<4:03:21, 2.71s/it] 56%|█████▋ | 6932/12313 [5:11:10<4:12:19, 2.81s/it] {'loss': 0.5591, 'grad_norm': 4.618030423336223, 'learning_rate': 2.1132337798190743e-06, 'epoch': 0.56} 56%|█████▋ | 6932/12313 [5:11:10<4:12:19, 2.81s/it] 56%|█████▋ | 6933/12313 [5:11:12<4:07:42, 2.76s/it] {'loss': 0.6132, 'grad_norm': 5.305923379739884, 'learning_rate': 2.112584088490647e-06, 'epoch': 0.56} 56%|█████▋ | 6933/12313 [5:11:12<4:07:42, 2.76s/it] 56%|█████▋ | 6934/12313 
[5:11:15<4:04:13, 2.72s/it] {'loss': 0.4011, 'grad_norm': 8.514031471933345, 'learning_rate': 2.11193442396935e-06, 'epoch': 0.56} 56%|█████▋ | 6934/12313 [5:11:15<4:04:13, 2.72s/it] 56%|█████▋ | 6935/12313 [5:11:18<4:12:36, 2.82s/it] {'loss': 0.6595, 'grad_norm': 4.100042747142503, 'learning_rate': 2.111284786300137e-06, 'epoch': 0.56} 56%|█████▋ | 6935/12313 [5:11:18<4:12:36, 2.82s/it] 56%|█████▋ | 6936/12313 [5:11:21<4:08:16, 2.77s/it] {'loss': 0.4761, 'grad_norm': 6.623608750925487, 'learning_rate': 2.11063517552796e-06, 'epoch': 0.56} 56%|█████▋ | 6936/12313 [5:11:21<4:08:16, 2.77s/it] 56%|█████▋ | 6937/12313 [5:11:23<4:06:00, 2.75s/it] {'loss': 0.4361, 'grad_norm': 4.043561752629549, 'learning_rate': 2.1099855916977676e-06, 'epoch': 0.56} 56%|█████▋ | 6937/12313 [5:11:23<4:06:00, 2.75s/it] 56%|█████▋ | 6938/12313 [5:11:26<4:06:16, 2.75s/it] {'loss': 0.6809, 'grad_norm': 3.492107617173054, 'learning_rate': 2.109336034854508e-06, 'epoch': 0.56} 56%|█████▋ | 6938/12313 [5:11:26<4:06:16, 2.75s/it] 56%|█████▋ | 6939/12313 [5:11:29<4:13:01, 2.82s/it] {'loss': 0.4603, 'grad_norm': 5.163084090773521, 'learning_rate': 2.1086865050431283e-06, 'epoch': 0.56} 56%|█████▋ | 6939/12313 [5:11:29<4:13:01, 2.82s/it] 56%|█████▋ | 6940/12313 [5:11:32<4:08:45, 2.78s/it] {'loss': 0.6142, 'grad_norm': 4.691574509899781, 'learning_rate': 2.1080370023085713e-06, 'epoch': 0.56} 56%|█████▋ | 6940/12313 [5:11:32<4:08:45, 2.78s/it] 56%|█████▋ | 6941/12313 [5:11:35<4:14:35, 2.84s/it] {'loss': 0.4824, 'grad_norm': 4.128014703538365, 'learning_rate': 2.107387526695778e-06, 'epoch': 0.56} 56%|█████▋ | 6941/12313 [5:11:35<4:14:35, 2.84s/it] 56%|█████▋ | 6942/12313 [5:11:37<4:08:33, 2.78s/it] {'loss': 0.5197, 'grad_norm': 11.05696154296604, 'learning_rate': 2.106738078249691e-06, 'epoch': 0.56} 56%|█████▋ | 6942/12313 [5:11:37<4:08:33, 2.78s/it] 56%|█████▋ | 6943/12313 [5:11:40<4:12:20, 2.82s/it] {'loss': 0.4981, 'grad_norm': 5.222620839074155, 'learning_rate': 2.1060886570152477e-06, 'epoch': 
0.56} 56%|█████▋ | 6943/12313 [5:11:40<4:12:20, 2.82s/it] 56%|█████▋ | 6944/12313 [5:11:43<4:01:13, 2.70s/it] {'loss': 0.4387, 'grad_norm': 4.697414322539678, 'learning_rate': 2.105439263037384e-06, 'epoch': 0.56} 56%|█████▋ | 6944/12313 [5:11:43<4:01:13, 2.70s/it] 56%|█████▋ | 6945/12313 [5:11:45<4:01:43, 2.70s/it] {'loss': 0.4677, 'grad_norm': 3.8406258019033004, 'learning_rate': 2.1047898963610354e-06, 'epoch': 0.56} 56%|█████▋ | 6945/12313 [5:11:45<4:01:43, 2.70s/it] 56%|█████▋ | 6946/12313 [5:11:48<3:54:31, 2.62s/it] {'loss': 0.3368, 'grad_norm': 5.2714849848498115, 'learning_rate': 2.1041405570311348e-06, 'epoch': 0.56} 56%|█████▋ | 6946/12313 [5:11:48<3:54:31, 2.62s/it] 56%|█████▋ | 6947/12313 [5:11:50<3:54:36, 2.62s/it] {'loss': 0.4286, 'grad_norm': 7.429954267821541, 'learning_rate': 2.1034912450926114e-06, 'epoch': 0.56} 56%|█████▋ | 6947/12313 [5:11:50<3:54:36, 2.62s/it] 56%|█████▋ | 6948/12313 [5:11:53<3:58:02, 2.66s/it] {'loss': 0.4246, 'grad_norm': 4.461747197120303, 'learning_rate': 2.102841960590396e-06, 'epoch': 0.56} 56%|█████▋ | 6948/12313 [5:11:53<3:58:02, 2.66s/it] 56%|█████▋ | 6949/12313 [5:11:56<3:57:43, 2.66s/it] {'loss': 0.4819, 'grad_norm': 6.852938483133885, 'learning_rate': 2.102192703569416e-06, 'epoch': 0.56} 56%|█████▋ | 6949/12313 [5:11:56<3:57:43, 2.66s/it] 56%|█████▋ | 6950/12313 [5:11:59<4:02:17, 2.71s/it] {'loss': 0.5224, 'grad_norm': 4.945485942357694, 'learning_rate': 2.1015434740745944e-06, 'epoch': 0.56} 56%|█████▋ | 6950/12313 [5:11:59<4:02:17, 2.71s/it] 56%|█████▋ | 6951/12313 [5:12:01<4:01:35, 2.70s/it] {'loss': 0.4534, 'grad_norm': 5.1220985098046095, 'learning_rate': 2.1008942721508553e-06, 'epoch': 0.56} 56%|█████▋ | 6951/12313 [5:12:01<4:01:35, 2.70s/it] 56%|█████▋ | 6952/12313 [5:12:04<4:04:21, 2.73s/it] {'loss': 0.4874, 'grad_norm': 4.645911138798781, 'learning_rate': 2.1002450978431216e-06, 'epoch': 0.56} 56%|█████▋ | 6952/12313 [5:12:04<4:04:21, 2.73s/it] 56%|█████▋ | 6953/12313 [5:12:07<4:06:49, 2.76s/it] {'loss': 
0.4335, 'grad_norm': 4.138063372246911, 'learning_rate': 2.099595951196311e-06, 'epoch': 0.56} 56%|█████▋ | 6953/12313 [5:12:07<4:06:49, 2.76s/it] 56%|█████▋ | 6954/12313 [5:12:10<4:02:49, 2.72s/it] {'loss': 0.5575, 'grad_norm': 5.760613534796062, 'learning_rate': 2.09894683225534e-06, 'epoch': 0.56} 56%|█████▋ | 6954/12313 [5:12:10<4:02:49, 2.72s/it] 56%|█████▋ | 6955/12313 [5:12:12<4:01:36, 2.71s/it] {'loss': 0.5573, 'grad_norm': 5.719720985157515, 'learning_rate': 2.0982977410651276e-06, 'epoch': 0.56} 56%|█████▋ | 6955/12313 [5:12:12<4:01:36, 2.71s/it] 56%|█████▋ | 6956/12313 [5:12:15<4:05:03, 2.74s/it] {'loss': 0.4523, 'grad_norm': 5.21317528533535, 'learning_rate': 2.0976486776705853e-06, 'epoch': 0.56} 56%|█████▋ | 6956/12313 [5:12:15<4:05:03, 2.74s/it] 57%|█████▋ | 6957/12313 [5:12:18<4:06:20, 2.76s/it] {'loss': 0.5827, 'grad_norm': 6.455893264664072, 'learning_rate': 2.0969996421166243e-06, 'epoch': 0.57} 57%|█████▋ | 6957/12313 [5:12:18<4:06:20, 2.76s/it] 57%|█████▋ | 6958/12313 [5:12:20<3:56:31, 2.65s/it] {'loss': 0.4638, 'grad_norm': 3.152544873395565, 'learning_rate': 2.0963506344481556e-06, 'epoch': 0.57} 57%|█████▋ | 6958/12313 [5:12:20<3:56:31, 2.65s/it] 57%|█████▋ | 6959/12313 [5:12:23<3:54:13, 2.62s/it] {'loss': 0.5174, 'grad_norm': 3.711902575933437, 'learning_rate': 2.0957016547100867e-06, 'epoch': 0.57} 57%|█████▋ | 6959/12313 [5:12:23<3:54:13, 2.62s/it] 57%|█████▋ | 6960/12313 [5:12:26<3:59:41, 2.69s/it] {'loss': 0.5568, 'grad_norm': 3.1937905928416774, 'learning_rate': 2.095052702947323e-06, 'epoch': 0.57} 57%|█████▋ | 6960/12313 [5:12:26<3:59:41, 2.69s/it] 57%|█████▋ | 6961/12313 [5:12:29<4:02:13, 2.72s/it] {'loss': 0.6765, 'grad_norm': 3.8506487142937336, 'learning_rate': 2.09440377920477e-06, 'epoch': 0.57} 57%|█████▋ | 6961/12313 [5:12:29<4:02:13, 2.72s/it] 57%|█████▋ | 6962/12313 [5:12:31<3:57:38, 2.66s/it] {'loss': 0.4279, 'grad_norm': 4.161307544084725, 'learning_rate': 2.0937548835273285e-06, 'epoch': 0.57} 57%|█████▋ | 6962/12313 
[5:12:31<3:57:38, 2.66s/it] 57%|█████▋ | 6963/12313 [5:12:34<4:02:13, 2.72s/it] {'loss': 0.483, 'grad_norm': 5.783786158584384, 'learning_rate': 2.0931060159598986e-06, 'epoch': 0.57} 57%|█████▋ | 6963/12313 [5:12:34<4:02:13, 2.72s/it] 57%|█████▋ | 6964/12313 [5:12:37<4:03:44, 2.73s/it] {'loss': 0.5543, 'grad_norm': 4.391669807993531, 'learning_rate': 2.0924571765473793e-06, 'epoch': 0.57} 57%|█████▋ | 6964/12313 [5:12:37<4:03:44, 2.73s/it] 57%|█████▋ | 6965/12313 [5:12:39<4:01:06, 2.70s/it] {'loss': 0.5861, 'grad_norm': 11.934622866843036, 'learning_rate': 2.091808365334667e-06, 'epoch': 0.57} 57%|█████▋ | 6965/12313 [5:12:39<4:01:06, 2.70s/it] 57%|█████▋ | 6966/12313 [5:12:42<3:55:07, 2.64s/it] {'loss': 0.4361, 'grad_norm': 6.272947824575495, 'learning_rate': 2.091159582366655e-06, 'epoch': 0.57} 57%|█████▋ | 6966/12313 [5:12:42<3:55:07, 2.64s/it] 57%|█████▋ | 6967/12313 [5:12:45<4:09:01, 2.79s/it] {'loss': 0.4904, 'grad_norm': 2.896511468026234, 'learning_rate': 2.0905108276882356e-06, 'epoch': 0.57} 57%|█████▋ | 6967/12313 [5:12:45<4:09:01, 2.79s/it] 57%|█████▋ | 6968/12313 [5:12:48<4:04:36, 2.75s/it] {'loss': 0.4592, 'grad_norm': 4.7888166758801205, 'learning_rate': 2.089862101344301e-06, 'epoch': 0.57} 57%|█████▋ | 6968/12313 [5:12:48<4:04:36, 2.75s/it] 57%|█████▋ | 6969/12313 [5:12:50<4:05:14, 2.75s/it] {'loss': 0.4861, 'grad_norm': 10.0912728353721, 'learning_rate': 2.0892134033797383e-06, 'epoch': 0.57} 57%|█████▋ | 6969/12313 [5:12:50<4:05:14, 2.75s/it] 57%|█████▋ | 6970/12313 [5:12:53<4:07:37, 2.78s/it] {'loss': 0.3952, 'grad_norm': 2.573138862813564, 'learning_rate': 2.088564733839433e-06, 'epoch': 0.57} 57%|█████▋ | 6970/12313 [5:12:53<4:07:37, 2.78s/it] 57%|█████▋ | 6971/12313 [5:12:56<3:57:32, 2.67s/it] {'loss': 0.4989, 'grad_norm': 2.924408214469329, 'learning_rate': 2.087916092768271e-06, 'epoch': 0.57} 57%|█████▋ | 6971/12313 [5:12:56<3:57:32, 2.67s/it] 57%|█████▋ | 6972/12313 [5:12:58<3:56:50, 2.66s/it] {'loss': 0.593, 'grad_norm': 
5.175327347055258, 'learning_rate': 2.087267480211135e-06, 'epoch': 0.57} 57%|█████▋ | 6972/12313 [5:12:58<3:56:50, 2.66s/it] 57%|█████▋ | 6973/12313 [5:13:02<4:25:18, 2.98s/it] {'loss': 0.4824, 'grad_norm': 5.327737397653494, 'learning_rate': 2.086618896212904e-06, 'epoch': 0.57} 57%|█████▋ | 6973/12313 [5:13:02<4:25:18, 2.98s/it] 57%|█████▋ | 6974/12313 [5:13:05<4:15:49, 2.88s/it] {'loss': 0.569, 'grad_norm': 3.5981451480047344, 'learning_rate': 2.0859703408184583e-06, 'epoch': 0.57} 57%|█████▋ | 6974/12313 [5:13:05<4:15:49, 2.88s/it] 57%|█████▋ | 6975/12313 [5:13:07<4:14:42, 2.86s/it] {'loss': 0.5055, 'grad_norm': 3.98952188483392, 'learning_rate': 2.085321814072674e-06, 'epoch': 0.57} 57%|█████▋ | 6975/12313 [5:13:07<4:14:42, 2.86s/it] 57%|█████▋ | 6976/12313 [5:13:10<4:08:53, 2.80s/it] {'loss': 0.5048, 'grad_norm': 4.760263855647576, 'learning_rate': 2.0846733160204244e-06, 'epoch': 0.57} 57%|█████▋ | 6976/12313 [5:13:10<4:08:53, 2.80s/it] 57%|█████▋ | 6977/12313 [5:13:13<4:08:35, 2.80s/it] {'loss': 0.474, 'grad_norm': 4.040158060702142, 'learning_rate': 2.084024846706584e-06, 'epoch': 0.57} 57%|█████▋ | 6977/12313 [5:13:13<4:08:35, 2.80s/it] 57%|█████▋ | 6978/12313 [5:13:16<4:08:12, 2.79s/it] {'loss': 0.6342, 'grad_norm': 3.3443500978774394, 'learning_rate': 2.083376406176023e-06, 'epoch': 0.57} 57%|█████▋ | 6978/12313 [5:13:16<4:08:12, 2.79s/it] 57%|█████▋ | 6979/12313 [5:13:18<4:07:25, 2.78s/it] {'loss': 0.4007, 'grad_norm': 6.809697122943693, 'learning_rate': 2.082727994473609e-06, 'epoch': 0.57} 57%|█████▋ | 6979/12313 [5:13:18<4:07:25, 2.78s/it] 57%|█████▋ | 6980/12313 [5:13:21<4:07:15, 2.78s/it] {'loss': 0.3879, 'grad_norm': 4.007751918991223, 'learning_rate': 2.08207961164421e-06, 'epoch': 0.57} 57%|█████▋ | 6980/12313 [5:13:21<4:07:15, 2.78s/it] 57%|█████▋ | 6981/12313 [5:13:24<4:03:07, 2.74s/it] {'loss': 0.4471, 'grad_norm': 4.107776861520255, 'learning_rate': 2.08143125773269e-06, 'epoch': 0.57} 57%|█████▋ | 6981/12313 [5:13:24<4:03:07, 2.74s/it] 
57%|█████▋ | 6982/12313 [5:13:27<4:03:31, 2.74s/it] {'loss': 0.4513, 'grad_norm': 3.806832164663562, 'learning_rate': 2.080782932783911e-06, 'epoch': 0.57} 57%|█████▋ | 6982/12313 [5:13:27<4:03:31, 2.74s/it] 57%|█████▋ | 6983/12313 [5:13:29<4:01:41, 2.72s/it] {'loss': 0.6031, 'grad_norm': 6.942546340030462, 'learning_rate': 2.0801346368427356e-06, 'epoch': 0.57} 57%|█████▋ | 6983/12313 [5:13:29<4:01:41, 2.72s/it] 57%|█████▋ | 6984/12313 [5:13:32<4:00:40, 2.71s/it] {'loss': 0.6745, 'grad_norm': 3.225972794340873, 'learning_rate': 2.0794863699540206e-06, 'epoch': 0.57} 57%|█████▋ | 6984/12313 [5:13:32<4:00:40, 2.71s/it] 57%|█████▋ | 6985/12313 [5:13:35<4:01:23, 2.72s/it] {'loss': 0.531, 'grad_norm': 4.7018910207675955, 'learning_rate': 2.0788381321626237e-06, 'epoch': 0.57} 57%|█████▋ | 6985/12313 [5:13:35<4:01:23, 2.72s/it] 57%|█████▋ | 6986/12313 [5:13:38<4:11:31, 2.83s/it] {'loss': 0.6038, 'grad_norm': 3.88377602984253, 'learning_rate': 2.0781899235133984e-06, 'epoch': 0.57} 57%|█████▋ | 6986/12313 [5:13:38<4:11:31, 2.83s/it] 57%|█████▋ | 6987/12313 [5:13:41<4:07:54, 2.79s/it] {'loss': 0.5044, 'grad_norm': 4.743493354886736, 'learning_rate': 2.077541744051198e-06, 'epoch': 0.57} 57%|█████▋ | 6987/12313 [5:13:41<4:07:54, 2.79s/it] 57%|█████▋ | 6988/12313 [5:13:43<3:59:36, 2.70s/it] {'loss': 0.545, 'grad_norm': 3.725682821827385, 'learning_rate': 2.0768935938208735e-06, 'epoch': 0.57} 57%|█████▋ | 6988/12313 [5:13:43<3:59:36, 2.70s/it] 57%|█████▋ | 6989/12313 [5:13:46<4:00:32, 2.71s/it] {'loss': 0.6513, 'grad_norm': 4.23366272227734, 'learning_rate': 2.0762454728672727e-06, 'epoch': 0.57} 57%|█████▋ | 6989/12313 [5:13:46<4:00:32, 2.71s/it] 57%|█████▋ | 6990/12313 [5:13:49<4:03:19, 2.74s/it] {'loss': 0.5489, 'grad_norm': 3.219731697269269, 'learning_rate': 2.0755973812352424e-06, 'epoch': 0.57} 57%|█████▋ | 6990/12313 [5:13:49<4:03:19, 2.74s/it] 57%|█████▋ | 6991/12313 [5:13:51<3:54:28, 2.64s/it] {'loss': 0.5292, 'grad_norm': 5.426598112663618, 'learning_rate': 
2.074949318969628e-06, 'epoch': 0.57} 57%|█████▋ | 6991/12313 [5:13:51<3:54:28, 2.64s/it] 57%|█████▋ | 6992/12313 [5:13:54<4:08:05, 2.80s/it] {'loss': 0.4586, 'grad_norm': 5.501771039935805, 'learning_rate': 2.07430128611527e-06, 'epoch': 0.57} 57%|█████▋ | 6992/12313 [5:13:54<4:08:05, 2.80s/it] 57%|█████▋ | 6993/12313 [5:13:57<4:11:30, 2.84s/it] {'loss': 0.414, 'grad_norm': 4.038443635513911, 'learning_rate': 2.0736532827170107e-06, 'epoch': 0.57} 57%|█████▋ | 6993/12313 [5:13:57<4:11:30, 2.84s/it] 57%|█████▋ | 6994/12313 [5:14:00<4:07:37, 2.79s/it] {'loss': 0.4727, 'grad_norm': 5.939660501414922, 'learning_rate': 2.0730053088196883e-06, 'epoch': 0.57} 57%|█████▋ | 6994/12313 [5:14:00<4:07:37, 2.79s/it] 57%|█████▋ | 6995/12313 [5:14:02<4:01:23, 2.72s/it] {'loss': 0.4108, 'grad_norm': 4.71954105859692, 'learning_rate': 2.072357364468138e-06, 'epoch': 0.57} 57%|█████▋ | 6995/12313 [5:14:02<4:01:23, 2.72s/it] 57%|█████▋ | 6996/12313 [5:14:06<4:26:18, 3.01s/it] {'loss': 0.4805, 'grad_norm': 12.024015711126426, 'learning_rate': 2.0717094497071945e-06, 'epoch': 0.57} 57%|█████▋ | 6996/12313 [5:14:06<4:26:18, 3.01s/it] 57%|█████▋ | 6997/12313 [5:14:09<4:16:44, 2.90s/it] {'loss': 0.4311, 'grad_norm': 4.1133512422429455, 'learning_rate': 2.0710615645816913e-06, 'epoch': 0.57} 57%|█████▋ | 6997/12313 [5:14:09<4:16:44, 2.90s/it] 57%|█████▋ | 6998/12313 [5:14:11<4:12:12, 2.85s/it] {'loss': 0.5012, 'grad_norm': 3.993669100784519, 'learning_rate': 2.0704137091364568e-06, 'epoch': 0.57} 57%|█████▋ | 6998/12313 [5:14:11<4:12:12, 2.85s/it] 57%|█████▋ | 6999/12313 [5:14:14<4:06:01, 2.78s/it] {'loss': 0.43, 'grad_norm': 5.302686271074888, 'learning_rate': 2.069765883416321e-06, 'epoch': 0.57} 57%|█████▋ | 6999/12313 [5:14:14<4:06:01, 2.78s/it] 57%|█████▋ | 7000/12313 [5:14:17<4:00:48, 2.72s/it] {'loss': 0.5289, 'grad_norm': 4.755390559997945, 'learning_rate': 2.0691180874661086e-06, 'epoch': 0.57} 57%|█████▋ | 7000/12313 [5:14:17<4:00:48, 2.72s/it] 57%|█████▋ | 7001/12313 
[5:14:20<4:07:33, 2.80s/it] {'loss': 0.4691, 'grad_norm': 2.9673265351316704, 'learning_rate': 2.0684703213306435e-06, 'epoch': 0.57} 57%|█████▋ | 7001/12313 [5:14:20<4:07:33, 2.80s/it] 57%|█████▋ | 7002/12313 [5:14:22<4:00:37, 2.72s/it] {'loss': 0.5234, 'grad_norm': 3.9211031513326486, 'learning_rate': 2.0678225850547497e-06, 'epoch': 0.57} 57%|█████▋ | 7002/12313 [5:14:22<4:00:37, 2.72s/it] 57%|█████▋ | 7003/12313 [5:14:25<4:00:45, 2.72s/it] {'loss': 0.5117, 'grad_norm': 4.581102578387739, 'learning_rate': 2.0671748786832447e-06, 'epoch': 0.57} 57%|█████▋ | 7003/12313 [5:14:25<4:00:45, 2.72s/it] 57%|█████▋ | 7004/12313 [5:14:28<4:04:49, 2.77s/it] {'loss': 0.5086, 'grad_norm': 6.219007379440202, 'learning_rate': 2.0665272022609482e-06, 'epoch': 0.57} 57%|█████▋ | 7004/12313 [5:14:28<4:04:49, 2.77s/it] 57%|█████▋ | 7005/12313 [5:14:30<4:02:38, 2.74s/it] {'loss': 0.4348, 'grad_norm': 6.0686864713771485, 'learning_rate': 2.0658795558326745e-06, 'epoch': 0.57} 57%|█████▋ | 7005/12313 [5:14:30<4:02:38, 2.74s/it] 57%|█████▋ | 7006/12313 [5:14:33<4:07:57, 2.80s/it] {'loss': 0.4322, 'grad_norm': 3.6037676191639885, 'learning_rate': 2.065231939443238e-06, 'epoch': 0.57} 57%|█████▋ | 7006/12313 [5:14:33<4:07:57, 2.80s/it] 57%|█████▋ | 7007/12313 [5:14:36<4:07:25, 2.80s/it] {'loss': 0.4967, 'grad_norm': 3.160547290178154, 'learning_rate': 2.064584353137451e-06, 'epoch': 0.57} 57%|█████▋ | 7007/12313 [5:14:36<4:07:25, 2.80s/it] 57%|█████▋ | 7008/12313 [5:14:39<4:04:35, 2.77s/it] {'loss': 0.4513, 'grad_norm': 4.373900459536156, 'learning_rate': 2.0639367969601215e-06, 'epoch': 0.57} 57%|█████▋ | 7008/12313 [5:14:39<4:04:35, 2.77s/it] 57%|█████▋ | 7009/12313 [5:14:41<4:03:57, 2.76s/it] {'loss': 0.4657, 'grad_norm': 10.5295783218965, 'learning_rate': 2.063289270956058e-06, 'epoch': 0.57} 57%|█████▋ | 7009/12313 [5:14:41<4:03:57, 2.76s/it] 57%|█████▋ | 7010/12313 [5:14:44<3:57:42, 2.69s/it] {'loss': 0.3973, 'grad_norm': 6.441791257822959, 'learning_rate': 2.0626417751700664e-06, 
'epoch': 0.57} 57%|█████▋ | 7010/12313 [5:14:44<3:57:42, 2.69s/it] 57%|█████▋ | 7011/12313 [5:14:47<3:56:37, 2.68s/it] {'loss': 0.4463, 'grad_norm': 7.65548642747038, 'learning_rate': 2.0619943096469484e-06, 'epoch': 0.57} 57%|█████▋ | 7011/12313 [5:14:47<3:56:37, 2.68s/it] 57%|█████▋ | 7012/12313 [5:14:49<4:00:06, 2.72s/it] {'loss': 0.5198, 'grad_norm': 2.7981688902513864, 'learning_rate': 2.061346874431507e-06, 'epoch': 0.57} 57%|█████▋ | 7012/12313 [5:14:49<4:00:06, 2.72s/it] 57%|█████▋ | 7013/12313 [5:14:52<4:07:34, 2.80s/it] {'loss': 0.3686, 'grad_norm': 3.8831757918226586, 'learning_rate': 2.0606994695685396e-06, 'epoch': 0.57} 57%|█████▋ | 7013/12313 [5:14:52<4:07:34, 2.80s/it] 57%|█████▋ | 7014/12313 [5:14:55<4:03:38, 2.76s/it] {'loss': 0.5688, 'grad_norm': 3.8701501681776156, 'learning_rate': 2.0600520951028437e-06, 'epoch': 0.57} 57%|█████▋ | 7014/12313 [5:14:55<4:03:38, 2.76s/it] 57%|█████▋ | 7015/12313 [5:14:58<4:14:52, 2.89s/it] {'loss': 0.4693, 'grad_norm': 3.537194498397044, 'learning_rate': 2.059404751079215e-06, 'epoch': 0.57} 57%|█████▋ | 7015/12313 [5:14:58<4:14:52, 2.89s/it] 57%|█████▋ | 7016/12313 [5:15:01<4:02:58, 2.75s/it] {'loss': 0.4932, 'grad_norm': 4.5238431030094794, 'learning_rate': 2.0587574375424456e-06, 'epoch': 0.57} 57%|█████▋ | 7016/12313 [5:15:01<4:02:58, 2.75s/it] 57%|█████▋ | 7017/12313 [5:15:03<3:59:51, 2.72s/it] {'loss': 0.5723, 'grad_norm': 6.314721658579622, 'learning_rate': 2.0581101545373255e-06, 'epoch': 0.57} 57%|█████▋ | 7017/12313 [5:15:03<3:59:51, 2.72s/it] 57%|█████▋ | 7018/12313 [5:15:06<3:57:44, 2.69s/it] {'loss': 0.6108, 'grad_norm': 4.543665245055623, 'learning_rate': 2.057462902108645e-06, 'epoch': 0.57} 57%|█████▋ | 7018/12313 [5:15:06<3:57:44, 2.69s/it] 57%|█████▋ | 7019/12313 [5:15:08<3:51:11, 2.62s/it] {'loss': 0.5064, 'grad_norm': 4.326213404795697, 'learning_rate': 2.0568156803011897e-06, 'epoch': 0.57} 57%|█████▋ | 7019/12313 [5:15:08<3:51:11, 2.62s/it] 57%|█████▋ | 7020/12313 [5:15:11<3:58:06, 2.70s/it] 
{'loss': 0.4286, 'grad_norm': 6.344731487919164, 'learning_rate': 2.056168489159744e-06, 'epoch': 0.57} 57%|█████▋ | 7020/12313 [5:15:11<3:58:06, 2.70s/it] 57%|█████▋ | 7021/12313 [5:15:14<3:55:53, 2.67s/it] {'loss': 0.4281, 'grad_norm': 6.407501842611541, 'learning_rate': 2.0555213287290886e-06, 'epoch': 0.57} 57%|█████▋ | 7021/12313 [5:15:14<3:55:53, 2.67s/it] 57%|█████▋ | 7022/12313 [5:15:17<3:59:04, 2.71s/it] {'loss': 0.5229, 'grad_norm': 3.9016889868541873, 'learning_rate': 2.0548741990540057e-06, 'epoch': 0.57} 57%|█████▋ | 7022/12313 [5:15:17<3:59:04, 2.71s/it] 57%|█████▋ | 7023/12313 [5:15:19<3:52:03, 2.63s/it] {'loss': 0.4988, 'grad_norm': 4.485329708138932, 'learning_rate': 2.0542271001792726e-06, 'epoch': 0.57} 57%|█████▋ | 7023/12313 [5:15:19<3:52:03, 2.63s/it] 57%|█████▋ | 7024/12313 [5:15:22<3:51:44, 2.63s/it] {'loss': 0.4174, 'grad_norm': 7.011271727948778, 'learning_rate': 2.0535800321496645e-06, 'epoch': 0.57} 57%|█████▋ | 7024/12313 [5:15:22<3:51:44, 2.63s/it] 57%|█████▋ | 7025/12313 [5:15:24<3:50:15, 2.61s/it] {'loss': 0.566, 'grad_norm': 21.104012028716472, 'learning_rate': 2.0529329950099554e-06, 'epoch': 0.57} 57%|█████▋ | 7025/12313 [5:15:24<3:50:15, 2.61s/it] 57%|█████▋ | 7026/12313 [5:15:27<3:52:14, 2.64s/it] {'loss': 0.4475, 'grad_norm': 5.5743586700224, 'learning_rate': 2.052285988804918e-06, 'epoch': 0.57} 57%|█████▋ | 7026/12313 [5:15:27<3:52:14, 2.64s/it] 57%|█████▋ | 7027/12313 [5:15:30<3:54:37, 2.66s/it] {'loss': 0.3382, 'grad_norm': 4.416468163895723, 'learning_rate': 2.0516390135793192e-06, 'epoch': 0.57} 57%|█████▋ | 7027/12313 [5:15:30<3:54:37, 2.66s/it] 57%|█████▋ | 7028/12313 [5:15:33<3:55:56, 2.68s/it] {'loss': 0.3999, 'grad_norm': 4.986468294717369, 'learning_rate': 2.050992069377929e-06, 'epoch': 0.57} 57%|█████▋ | 7028/12313 [5:15:33<3:55:56, 2.68s/it] 57%|█████▋ | 7029/12313 [5:15:35<3:57:25, 2.70s/it] {'loss': 0.4653, 'grad_norm': 9.4114543685177, 'learning_rate': 2.050345156245511e-06, 'epoch': 0.57} 57%|█████▋ | 
7029/12313 [5:15:35<3:57:25, 2.70s/it] 57%|█████▋ | 7030/12313 [5:15:38<3:56:24, 2.68s/it] {'loss': 0.3624, 'grad_norm': 4.613503552061298, 'learning_rate': 2.0496982742268273e-06, 'epoch': 0.57} 57%|█████▋ | 7030/12313 [5:15:38<3:56:24, 2.68s/it] 57%|█████▋ | 7031/12313 [5:15:41<3:54:02, 2.66s/it] {'loss': 0.5138, 'grad_norm': 6.062105080879825, 'learning_rate': 2.0490514233666413e-06, 'epoch': 0.57} 57%|█████▋ | 7031/12313 [5:15:41<3:54:02, 2.66s/it] 57%|█████▋ | 7032/12313 [5:15:43<3:57:56, 2.70s/it] {'loss': 0.4635, 'grad_norm': 3.9841008788202132, 'learning_rate': 2.04840460370971e-06, 'epoch': 0.57} 57%|█████▋ | 7032/12313 [5:15:43<3:57:56, 2.70s/it] 57%|█████▋ | 7033/12313 [5:15:46<3:58:11, 2.71s/it] {'loss': 0.4516, 'grad_norm': 5.48691904012669, 'learning_rate': 2.0477578153007887e-06, 'epoch': 0.57} 57%|█████▋ | 7033/12313 [5:15:46<3:58:11, 2.71s/it] 57%|█████▋ | 7034/12313 [5:15:49<3:56:07, 2.68s/it] {'loss': 0.5242, 'grad_norm': 3.3836486599021636, 'learning_rate': 2.047111058184635e-06, 'epoch': 0.57} 57%|█████▋ | 7034/12313 [5:15:49<3:56:07, 2.68s/it] 57%|█████▋ | 7035/12313 [5:15:52<3:59:01, 2.72s/it] {'loss': 0.5653, 'grad_norm': 3.8672222024944856, 'learning_rate': 2.046464332405998e-06, 'epoch': 0.57} 57%|█████▋ | 7035/12313 [5:15:52<3:59:01, 2.72s/it] 57%|█████▋ | 7036/12313 [5:15:54<3:55:31, 2.68s/it] {'loss': 0.5402, 'grad_norm': 4.314382703074949, 'learning_rate': 2.045817638009629e-06, 'epoch': 0.57} 57%|█████▋ | 7036/12313 [5:15:54<3:55:31, 2.68s/it] 57%|█████▋ | 7037/12313 [5:15:57<3:55:06, 2.67s/it] {'loss': 0.4213, 'grad_norm': 3.567072195815995, 'learning_rate': 2.045170975040276e-06, 'epoch': 0.57} 57%|█████▋ | 7037/12313 [5:15:57<3:55:06, 2.67s/it] 57%|█████▋ | 7038/12313 [5:15:59<3:56:09, 2.69s/it] {'loss': 0.6564, 'grad_norm': 4.48375869912088, 'learning_rate': 2.0445243435426847e-06, 'epoch': 0.57} 57%|█████▋ | 7038/12313 [5:15:59<3:56:09, 2.69s/it] 57%|█████▋ | 7039/12313 [5:16:02<4:02:52, 2.76s/it] {'loss': 0.5135, 'grad_norm': 
3.552656399851541, 'learning_rate': 2.043877743561598e-06, 'epoch': 0.57} 57%|█████▋ | 7039/12313 [5:16:02<4:02:52, 2.76s/it] 57%|█████▋ | 7040/12313 [5:16:05<3:53:21, 2.66s/it] {'loss': 0.6527, 'grad_norm': 3.8377915597339873, 'learning_rate': 2.0432311751417568e-06, 'epoch': 0.57} 57%|█████▋ | 7040/12313 [5:16:05<3:53:21, 2.66s/it] 57%|█████▋ | 7041/12313 [5:16:08<3:55:21, 2.68s/it] {'loss': 0.3723, 'grad_norm': 6.426418479233554, 'learning_rate': 2.042584638327902e-06, 'epoch': 0.57} 57%|█████▋ | 7041/12313 [5:16:08<3:55:21, 2.68s/it] 57%|█████▋ | 7042/12313 [5:16:10<3:53:40, 2.66s/it] {'loss': 0.4957, 'grad_norm': 4.241032012671777, 'learning_rate': 2.0419381331647687e-06, 'epoch': 0.57} 57%|█████▋ | 7042/12313 [5:16:10<3:53:40, 2.66s/it] 57%|█████▋ | 7043/12313 [5:16:13<4:03:05, 2.77s/it] {'loss': 0.566, 'grad_norm': 3.735539966685467, 'learning_rate': 2.0412916596970918e-06, 'epoch': 0.57} 57%|█████▋ | 7043/12313 [5:16:13<4:03:05, 2.77s/it] 57%|█████▋ | 7044/12313 [5:16:16<3:58:01, 2.71s/it] {'loss': 0.4935, 'grad_norm': 3.8956911217710917, 'learning_rate': 2.040645217969606e-06, 'epoch': 0.57} 57%|█████▋ | 7044/12313 [5:16:16<3:58:01, 2.71s/it] 57%|█████▋ | 7045/12313 [5:16:18<3:53:34, 2.66s/it] {'loss': 0.4629, 'grad_norm': 20.493625970905985, 'learning_rate': 2.0399988080270384e-06, 'epoch': 0.57} 57%|█████▋ | 7045/12313 [5:16:18<3:53:34, 2.66s/it] 57%|█████▋ | 7046/12313 [5:16:21<3:54:54, 2.68s/it] {'loss': 0.4411, 'grad_norm': 29.937743604580216, 'learning_rate': 2.039352429914119e-06, 'epoch': 0.57} 57%|█████▋ | 7046/12313 [5:16:21<3:54:54, 2.68s/it] 57%|█████▋ | 7047/12313 [5:16:24<3:55:26, 2.68s/it] {'loss': 0.4187, 'grad_norm': 6.109372993647007, 'learning_rate': 2.038706083675574e-06, 'epoch': 0.57} 57%|█████▋ | 7047/12313 [5:16:24<3:55:26, 2.68s/it] 57%|█████▋ | 7048/12313 [5:16:26<3:53:23, 2.66s/it] {'loss': 0.5831, 'grad_norm': 4.3015816031863086, 'learning_rate': 2.038059769356127e-06, 'epoch': 0.57} 57%|█████▋ | 7048/12313 [5:16:26<3:53:23, 
2.66s/it] 57%|█████▋ | 7049/12313 [5:16:29<3:54:46, 2.68s/it] {'loss': 0.4969, 'grad_norm': 4.347753553845655, 'learning_rate': 2.037413487000498e-06, 'epoch': 0.57} 57%|█████▋ | 7049/12313 [5:16:29<3:54:46, 2.68s/it] 57%|█████▋ | 7050/12313 [5:16:32<4:00:58, 2.75s/it] {'loss': 0.4838, 'grad_norm': 5.01369388436399, 'learning_rate': 2.0367672366534087e-06, 'epoch': 0.57} 57%|█████▋ | 7050/12313 [5:16:32<4:00:58, 2.75s/it] 57%|█████▋ | 7051/12313 [5:16:35<4:06:01, 2.81s/it] {'loss': 0.4995, 'grad_norm': 6.956995413047726, 'learning_rate': 2.036121018359574e-06, 'epoch': 0.57} 57%|█████▋ | 7051/12313 [5:16:35<4:06:01, 2.81s/it] 57%|█████▋ | 7052/12313 [5:16:38<4:06:15, 2.81s/it] {'loss': 0.5013, 'grad_norm': 7.014322634335082, 'learning_rate': 2.03547483216371e-06, 'epoch': 0.57} 57%|█████▋ | 7052/12313 [5:16:38<4:06:15, 2.81s/it] 57%|█████▋ | 7053/12313 [5:16:40<4:03:15, 2.77s/it] {'loss': 0.4124, 'grad_norm': 6.622483650279607, 'learning_rate': 2.0348286781105302e-06, 'epoch': 0.57} 57%|█████▋ | 7053/12313 [5:16:40<4:03:15, 2.77s/it] 57%|█████▋ | 7054/12313 [5:16:43<4:00:28, 2.74s/it] {'loss': 0.6059, 'grad_norm': 4.159914158121862, 'learning_rate': 2.0341825562447427e-06, 'epoch': 0.57} 57%|█████▋ | 7054/12313 [5:16:43<4:00:28, 2.74s/it] 57%|█████▋ | 7055/12313 [5:16:46<3:53:33, 2.67s/it] {'loss': 0.4856, 'grad_norm': 5.011801450523911, 'learning_rate': 2.0335364666110572e-06, 'epoch': 0.57} 57%|█████▋ | 7055/12313 [5:16:46<3:53:33, 2.67s/it] 57%|█████▋ | 7056/12313 [5:16:48<3:53:15, 2.66s/it] {'loss': 0.4473, 'grad_norm': 4.952363770440026, 'learning_rate': 2.03289040925418e-06, 'epoch': 0.57} 57%|█████▋ | 7056/12313 [5:16:48<3:53:15, 2.66s/it] 57%|█████▋ | 7057/12313 [5:16:51<3:51:55, 2.65s/it] {'loss': 0.6729, 'grad_norm': 7.83403745870332, 'learning_rate': 2.032244384218815e-06, 'epoch': 0.57} 57%|█████▋ | 7057/12313 [5:16:51<3:51:55, 2.65s/it] 57%|█████▋ | 7058/12313 [5:16:53<3:49:57, 2.63s/it] {'loss': 0.5622, 'grad_norm': 6.993413026032686, 'learning_rate': 
2.031598391549662e-06, 'epoch': 0.57} 57%|█████▋ | 7058/12313 [5:16:53<3:49:57, 2.63s/it] 57%|█████▋ | 7059/12313 [5:16:56<3:51:46, 2.65s/it] {'loss': 0.5174, 'grad_norm': 6.260135249773573, 'learning_rate': 2.030952431291421e-06, 'epoch': 0.57} 57%|█████▋ | 7059/12313 [5:16:56<3:51:46, 2.65s/it] 57%|█████▋ | 7060/12313 [5:16:59<3:53:31, 2.67s/it] {'loss': 0.5199, 'grad_norm': 4.075720359030366, 'learning_rate': 2.0303065034887904e-06, 'epoch': 0.57} 57%|█████▋ | 7060/12313 [5:16:59<3:53:31, 2.67s/it] 57%|█████▋ | 7061/12313 [5:17:02<4:00:09, 2.74s/it] {'loss': 0.5269, 'grad_norm': 3.9202159123881475, 'learning_rate': 2.0296606081864634e-06, 'epoch': 0.57} 57%|█████▋ | 7061/12313 [5:17:02<4:00:09, 2.74s/it] 57%|█████▋ | 7062/12313 [5:17:05<4:05:35, 2.81s/it] {'loss': 0.4606, 'grad_norm': 4.8268240982250274, 'learning_rate': 2.0290147454291323e-06, 'epoch': 0.57} 57%|█████▋ | 7062/12313 [5:17:05<4:05:35, 2.81s/it] 57%|█████▋ | 7063/12313 [5:17:07<4:01:05, 2.76s/it] {'loss': 0.6205, 'grad_norm': 5.603064677109073, 'learning_rate': 2.0283689152614896e-06, 'epoch': 0.57} 57%|█████▋ | 7063/12313 [5:17:07<4:01:05, 2.76s/it] 57%|█████▋ | 7064/12313 [5:17:10<4:00:40, 2.75s/it] {'loss': 0.5806, 'grad_norm': 4.633814523131622, 'learning_rate': 2.0277231177282213e-06, 'epoch': 0.57} 57%|█████▋ | 7064/12313 [5:17:10<4:00:40, 2.75s/it] 57%|█████▋ | 7065/12313 [5:17:13<4:00:40, 2.75s/it] {'loss': 0.5424, 'grad_norm': 11.093950775524117, 'learning_rate': 2.0270773528740127e-06, 'epoch': 0.57} 57%|█████▋ | 7065/12313 [5:17:13<4:00:40, 2.75s/it] 57%|█████▋ | 7066/12313 [5:17:16<4:07:10, 2.83s/it] {'loss': 0.5403, 'grad_norm': 4.963416503789594, 'learning_rate': 2.02643162074355e-06, 'epoch': 0.57} 57%|█████▋ | 7066/12313 [5:17:16<4:07:10, 2.83s/it] 57%|█████▋ | 7067/12313 [5:17:18<4:01:55, 2.77s/it] {'loss': 0.649, 'grad_norm': 7.269597973886369, 'learning_rate': 2.0257859213815123e-06, 'epoch': 0.57} 57%|█████▋ | 7067/12313 [5:17:18<4:01:55, 2.77s/it] 57%|█████▋ | 7068/12313 
[5:17:21<4:01:10, 2.76s/it] {'loss': 0.6087, 'grad_norm': 4.685273966619034, 'learning_rate': 2.0251402548325783e-06, 'epoch': 0.57} 57%|█████▋ | 7068/12313 [5:17:21<4:01:10, 2.76s/it] 57%|█████▋ | 7069/12313 [5:17:24<3:57:32, 2.72s/it] {'loss': 0.5899, 'grad_norm': 4.846673111295444, 'learning_rate': 2.0244946211414267e-06, 'epoch': 0.57} 57%|█████▋ | 7069/12313 [5:17:24<3:57:32, 2.72s/it] 57%|█████▋ | 7070/12313 [5:17:27<3:57:12, 2.71s/it] {'loss': 0.5397, 'grad_norm': 3.9278515943561625, 'learning_rate': 2.0238490203527307e-06, 'epoch': 0.57} 57%|█████▋ | 7070/12313 [5:17:27<3:57:12, 2.71s/it] 57%|█████▋ | 7071/12313 [5:17:29<3:54:19, 2.68s/it] {'loss': 0.5535, 'grad_norm': 4.034403858762683, 'learning_rate': 2.0232034525111617e-06, 'epoch': 0.57} 57%|█████▋ | 7071/12313 [5:17:29<3:54:19, 2.68s/it] 57%|█████▋ | 7072/12313 [5:17:32<3:51:33, 2.65s/it] {'loss': 0.4494, 'grad_norm': 6.517927780743827, 'learning_rate': 2.0225579176613905e-06, 'epoch': 0.57} 57%|█████▋ | 7072/12313 [5:17:32<3:51:33, 2.65s/it] 57%|█████▋ | 7073/12313 [5:17:34<3:50:23, 2.64s/it] {'loss': 0.3643, 'grad_norm': 3.068515611615937, 'learning_rate': 2.0219124158480853e-06, 'epoch': 0.57} 57%|█████▋ | 7073/12313 [5:17:34<3:50:23, 2.64s/it] 57%|█████▋ | 7074/12313 [5:17:37<3:48:40, 2.62s/it] {'loss': 0.4402, 'grad_norm': 3.6556023426111084, 'learning_rate': 2.0212669471159098e-06, 'epoch': 0.57} 57%|█████▋ | 7074/12313 [5:17:37<3:48:40, 2.62s/it] 57%|█████▋ | 7075/12313 [5:17:39<3:45:27, 2.58s/it] {'loss': 0.5183, 'grad_norm': 3.202736868542401, 'learning_rate': 2.020621511509528e-06, 'epoch': 0.57} 57%|█████▋ | 7075/12313 [5:17:39<3:45:27, 2.58s/it] 57%|█████▋ | 7076/12313 [5:17:42<3:52:52, 2.67s/it] {'loss': 0.477, 'grad_norm': 5.842839385752603, 'learning_rate': 2.019976109073601e-06, 'epoch': 0.57} 57%|█████▋ | 7076/12313 [5:17:42<3:52:52, 2.67s/it] 57%|█████▋ | 7077/12313 [5:17:45<3:52:14, 2.66s/it] {'loss': 0.4842, 'grad_norm': 12.847781405058441, 'learning_rate': 2.0193307398527865e-06, 
'epoch': 0.57} 57%|█████▋ | 7077/12313 [5:17:45<3:52:14, 2.66s/it] 57%|█████▋ | 7078/12313 [5:17:48<3:51:08, 2.65s/it] {'loss': 0.4538, 'grad_norm': 21.13344792621275, 'learning_rate': 2.0186854038917405e-06, 'epoch': 0.57} 57%|█████▋ | 7078/12313 [5:17:48<3:51:08, 2.65s/it] 57%|█████▋ | 7079/12313 [5:17:50<3:54:27, 2.69s/it] {'loss': 0.4578, 'grad_norm': 10.50726009998217, 'learning_rate': 2.0180401012351182e-06, 'epoch': 0.57} 57%|█████▋ | 7079/12313 [5:17:50<3:54:27, 2.69s/it] 58%|█████▊ | 7080/12313 [5:17:53<3:48:47, 2.62s/it] {'loss': 0.5223, 'grad_norm': 5.987355841136969, 'learning_rate': 2.0173948319275696e-06, 'epoch': 0.58} 58%|█████▊ | 7080/12313 [5:17:53<3:48:47, 2.62s/it] 58%|█████▊ | 7081/12313 [5:17:55<3:49:40, 2.63s/it] {'loss': 0.4965, 'grad_norm': 5.722234729191682, 'learning_rate': 2.016749596013744e-06, 'epoch': 0.58} 58%|█████▊ | 7081/12313 [5:17:55<3:49:40, 2.63s/it] 58%|█████▊ | 7082/12313 [5:17:58<3:48:41, 2.62s/it] {'loss': 0.4677, 'grad_norm': 4.664393900667678, 'learning_rate': 2.0161043935382897e-06, 'epoch': 0.58} 58%|█████▊ | 7082/12313 [5:17:58<3:48:41, 2.62s/it] 58%|█████▊ | 7083/12313 [5:18:01<3:47:07, 2.61s/it] {'loss': 0.5619, 'grad_norm': 7.498350619883631, 'learning_rate': 2.0154592245458504e-06, 'epoch': 0.58} 58%|█████▊ | 7083/12313 [5:18:01<3:47:07, 2.61s/it] 58%|█████▊ | 7084/12313 [5:18:03<3:46:39, 2.60s/it] {'loss': 0.5719, 'grad_norm': 5.775102597864225, 'learning_rate': 2.014814089081067e-06, 'epoch': 0.58} 58%|█████▊ | 7084/12313 [5:18:03<3:46:39, 2.60s/it] 58%|█████▊ | 7085/12313 [5:18:06<3:46:56, 2.60s/it] {'loss': 0.7247, 'grad_norm': 5.1155695309255025, 'learning_rate': 2.014168987188582e-06, 'epoch': 0.58} 58%|█████▊ | 7085/12313 [5:18:06<3:46:56, 2.60s/it] 58%|█████▊ | 7086/12313 [5:18:09<3:52:01, 2.66s/it] {'loss': 0.6624, 'grad_norm': 4.2278332646738175, 'learning_rate': 2.0135239189130325e-06, 'epoch': 0.58} 58%|█████▊ | 7086/12313 [5:18:09<3:52:01, 2.66s/it] 58%|█████▊ | 7087/12313 [5:18:11<3:50:29, 2.65s/it] 
{'loss': 0.7632, 'grad_norm': 3.7678111182822334, 'learning_rate': 2.0128788842990516e-06, 'epoch': 0.58} 58%|█████▊ | 7087/12313 [5:18:11<3:50:29, 2.65s/it] 58%|█████▊ | 7088/12313 [5:18:14<3:52:41, 2.67s/it] {'loss': 0.4833, 'grad_norm': 11.009075470970467, 'learning_rate': 2.0122338833912743e-06, 'epoch': 0.58} 58%|█████▊ | 7088/12313 [5:18:14<3:52:41, 2.67s/it] 58%|█████▊ | 7089/12313 [5:18:16<3:46:16, 2.60s/it] {'loss': 0.4961, 'grad_norm': 4.5002280122486535, 'learning_rate': 2.0115889162343316e-06, 'epoch': 0.58} 58%|█████▊ | 7089/12313 [5:18:16<3:46:16, 2.60s/it] 58%|█████▊ | 7090/12313 [5:18:19<3:45:33, 2.59s/it] {'loss': 0.475, 'grad_norm': 4.760191730901538, 'learning_rate': 2.01094398287285e-06, 'epoch': 0.58} 58%|█████▊ | 7090/12313 [5:18:19<3:45:33, 2.59s/it] 58%|█████▊ | 7091/12313 [5:18:22<3:56:36, 2.72s/it] {'loss': 0.353, 'grad_norm': 3.118348188458208, 'learning_rate': 2.010299083351457e-06, 'epoch': 0.58} 58%|█████▊ | 7091/12313 [5:18:22<3:56:36, 2.72s/it] 58%|█████▊ | 7092/12313 [5:18:25<3:56:21, 2.72s/it] {'loss': 0.5105, 'grad_norm': 6.734823400251969, 'learning_rate': 2.009654217714776e-06, 'epoch': 0.58} 58%|█████▊ | 7092/12313 [5:18:25<3:56:21, 2.72s/it] 58%|█████▊ | 7093/12313 [5:18:27<3:57:11, 2.73s/it] {'loss': 0.5659, 'grad_norm': 5.003848469090975, 'learning_rate': 2.0090093860074273e-06, 'epoch': 0.58} 58%|█████▊ | 7093/12313 [5:18:27<3:57:11, 2.73s/it] 58%|█████▊ | 7094/12313 [5:18:30<3:59:34, 2.75s/it] {'loss': 0.5335, 'grad_norm': 3.690892599491099, 'learning_rate': 2.008364588274031e-06, 'epoch': 0.58} 58%|█████▊ | 7094/12313 [5:18:30<3:59:34, 2.75s/it] 58%|█████▊ | 7095/12313 [5:18:33<4:01:05, 2.77s/it] {'loss': 0.5175, 'grad_norm': 5.097697074216427, 'learning_rate': 2.0077198245592033e-06, 'epoch': 0.58} 58%|█████▊ | 7095/12313 [5:18:33<4:01:05, 2.77s/it] 58%|█████▊ | 7096/12313 [5:18:36<3:56:42, 2.72s/it] {'loss': 0.5674, 'grad_norm': 3.9972301907638133, 'learning_rate': 2.0070750949075584e-06, 'epoch': 0.58} 58%|█████▊ | 
7096/12313 [5:18:36<3:56:42, 2.72s/it] 58%|█████▊ | 7097/12313 [5:18:38<3:58:36, 2.74s/it] {'loss': 0.4085, 'grad_norm': 3.765506245804216, 'learning_rate': 2.0064303993637073e-06, 'epoch': 0.58} 58%|█████▊ | 7097/12313 [5:18:38<3:58:36, 2.74s/it] 58%|█████▊ | 7098/12313 [5:18:41<3:51:18, 2.66s/it] {'loss': 0.5775, 'grad_norm': 5.010605186017302, 'learning_rate': 2.005785737972262e-06, 'epoch': 0.58} 58%|█████▊ | 7098/12313 [5:18:41<3:51:18, 2.66s/it] 58%|█████▊ | 7099/12313 [5:18:44<3:53:08, 2.68s/it] {'loss': 0.5185, 'grad_norm': 3.786228217501147, 'learning_rate': 2.0051411107778273e-06, 'epoch': 0.58} 58%|█████▊ | 7099/12313 [5:18:44<3:53:08, 2.68s/it] 58%|█████▊ | 7100/12313 [5:18:46<3:53:25, 2.69s/it] {'loss': 0.4735, 'grad_norm': 9.0650925529505, 'learning_rate': 2.004496517825008e-06, 'epoch': 0.58} 58%|█████▊ | 7100/12313 [5:18:46<3:53:25, 2.69s/it] 58%|█████▊ | 7101/12313 [5:18:49<3:54:55, 2.70s/it] {'loss': 0.5995, 'grad_norm': 5.189793367646856, 'learning_rate': 2.0038519591584078e-06, 'epoch': 0.58} 58%|█████▊ | 7101/12313 [5:18:49<3:54:55, 2.70s/it] 58%|█████▊ | 7102/12313 [5:18:52<3:58:45, 2.75s/it] {'loss': 0.5761, 'grad_norm': 4.700410500709092, 'learning_rate': 2.0032074348226268e-06, 'epoch': 0.58} 58%|█████▊ | 7102/12313 [5:18:52<3:58:45, 2.75s/it] 58%|█████▊ | 7103/12313 [5:18:54<3:50:31, 2.65s/it] {'loss': 0.5744, 'grad_norm': 4.234749790705928, 'learning_rate': 2.002562944862261e-06, 'epoch': 0.58} 58%|█████▊ | 7103/12313 [5:18:54<3:50:31, 2.65s/it] 58%|█████▊ | 7104/12313 [5:18:57<3:45:43, 2.60s/it] {'loss': 0.5361, 'grad_norm': 3.629545684362975, 'learning_rate': 2.0019184893219076e-06, 'epoch': 0.58} 58%|█████▊ | 7104/12313 [5:18:57<3:45:43, 2.60s/it] 58%|█████▊ | 7105/12313 [5:18:59<3:45:31, 2.60s/it] {'loss': 0.424, 'grad_norm': 5.227066105381711, 'learning_rate': 2.0012740682461585e-06, 'epoch': 0.58} 58%|█████▊ | 7105/12313 [5:18:59<3:45:31, 2.60s/it] 58%|█████▊ | 7106/12313 [5:19:02<3:42:45, 2.57s/it] {'loss': 0.4179, 'grad_norm': 
8.03973120733085, 'learning_rate': 2.0006296816796037e-06, 'epoch': 0.58} 58%|█████▊ | 7106/12313 [5:19:02<3:42:45, 2.57s/it] 58%|█████▊ | 7107/12313 [5:19:05<3:49:04, 2.64s/it] {'loss': 0.657, 'grad_norm': 3.3013104736748575, 'learning_rate': 1.9999853296668326e-06, 'epoch': 0.58} 58%|█████▊ | 7107/12313 [5:19:05<3:49:04, 2.64s/it] 58%|█████▊ | 7108/12313 [5:19:07<3:51:55, 2.67s/it] {'loss': 0.4993, 'grad_norm': 4.319311268255759, 'learning_rate': 1.999341012252431e-06, 'epoch': 0.58} 58%|█████▊ | 7108/12313 [5:19:07<3:51:55, 2.67s/it] 58%|█████▊ | 7109/12313 [5:19:10<4:00:09, 2.77s/it] {'loss': 0.474, 'grad_norm': 3.6319991209865945, 'learning_rate': 1.9986967294809804e-06, 'epoch': 0.58} 58%|█████▊ | 7109/12313 [5:19:10<4:00:09, 2.77s/it] 58%|█████▊ | 7110/12313 [5:19:13<3:56:21, 2.73s/it] {'loss': 0.5666, 'grad_norm': 4.960374042694389, 'learning_rate': 1.9980524813970635e-06, 'epoch': 0.58} 58%|█████▊ | 7110/12313 [5:19:13<3:56:21, 2.73s/it] 58%|█████▊ | 7111/12313 [5:19:16<3:54:24, 2.70s/it] {'loss': 0.6228, 'grad_norm': 2.9962262052676314, 'learning_rate': 1.997408268045259e-06, 'epoch': 0.58} 58%|█████▊ | 7111/12313 [5:19:16<3:54:24, 2.70s/it] 58%|█████▊ | 7112/12313 [5:19:19<3:55:58, 2.72s/it] {'loss': 0.5147, 'grad_norm': 5.668495286840983, 'learning_rate': 1.9967640894701424e-06, 'epoch': 0.58} 58%|█████▊ | 7112/12313 [5:19:19<3:55:58, 2.72s/it] 58%|█████▊ | 7113/12313 [5:19:21<3:55:28, 2.72s/it] {'loss': 0.4194, 'grad_norm': 4.163547326343956, 'learning_rate': 1.9961199457162867e-06, 'epoch': 0.58} 58%|█████▊ | 7113/12313 [5:19:21<3:55:28, 2.72s/it] 58%|█████▊ | 7114/12313 [5:19:24<3:55:20, 2.72s/it] {'loss': 0.5275, 'grad_norm': 3.4932620061985555, 'learning_rate': 1.995475836828264e-06, 'epoch': 0.58} 58%|█████▊ | 7114/12313 [5:19:24<3:55:20, 2.72s/it] 58%|█████▊ | 7115/12313 [5:19:27<3:52:44, 2.69s/it] {'loss': 0.4277, 'grad_norm': 4.368032739037839, 'learning_rate': 1.9948317628506444e-06, 'epoch': 0.58} 58%|█████▊ | 7115/12313 [5:19:27<3:52:44, 
2.69s/it] 58%|█████▊ | 7116/12313 [5:19:29<3:50:09, 2.66s/it] {'loss': 0.4987, 'grad_norm': 3.876730464227753, 'learning_rate': 1.994187723827992e-06, 'epoch': 0.58} 58%|█████▊ | 7116/12313 [5:19:29<3:50:09, 2.66s/it] 58%|█████▊ | 7117/12313 [5:19:32<3:48:43, 2.64s/it] {'loss': 0.5308, 'grad_norm': 3.2062830745638506, 'learning_rate': 1.9935437198048722e-06, 'epoch': 0.58} 58%|█████▊ | 7117/12313 [5:19:32<3:48:43, 2.64s/it] 58%|█████▊ | 7118/12313 [5:19:34<3:50:14, 2.66s/it] {'loss': 0.464, 'grad_norm': 7.356711618544573, 'learning_rate': 1.9928997508258475e-06, 'epoch': 0.58} 58%|█████▊ | 7118/12313 [5:19:34<3:50:14, 2.66s/it] 58%|█████▊ | 7119/12313 [5:19:37<3:53:26, 2.70s/it] {'loss': 0.4484, 'grad_norm': 5.484693237704757, 'learning_rate': 1.9922558169354752e-06, 'epoch': 0.58} 58%|█████▊ | 7119/12313 [5:19:37<3:53:26, 2.70s/it] 58%|█████▊ | 7120/12313 [5:19:40<3:55:58, 2.73s/it] {'loss': 0.4322, 'grad_norm': 6.681361631875744, 'learning_rate': 1.9916119181783135e-06, 'epoch': 0.58} 58%|█████▊ | 7120/12313 [5:19:40<3:55:58, 2.73s/it] 58%|█████▊ | 7121/12313 [5:19:43<3:49:22, 2.65s/it] {'loss': 0.567, 'grad_norm': 4.438508741670482, 'learning_rate': 1.9909680545989175e-06, 'epoch': 0.58} 58%|█████▊ | 7121/12313 [5:19:43<3:49:22, 2.65s/it] 58%|█████▊ | 7122/12313 [5:19:45<3:47:47, 2.63s/it] {'loss': 0.5764, 'grad_norm': 5.678949037298267, 'learning_rate': 1.9903242262418366e-06, 'epoch': 0.58} 58%|█████▊ | 7122/12313 [5:19:45<3:47:47, 2.63s/it] 58%|█████▊ | 7123/12313 [5:19:47<3:39:11, 2.53s/it] {'loss': 0.7083, 'grad_norm': 4.194676378904089, 'learning_rate': 1.989680433151622e-06, 'epoch': 0.58} 58%|█████▊ | 7123/12313 [5:19:47<3:39:11, 2.53s/it] 58%|█████▊ | 7124/12313 [5:19:50<3:43:45, 2.59s/it] {'loss': 0.5588, 'grad_norm': 8.002991142166685, 'learning_rate': 1.989036675372822e-06, 'epoch': 0.58} 58%|█████▊ | 7124/12313 [5:19:50<3:43:45, 2.59s/it] 58%|█████▊ | 7125/12313 [5:19:53<3:45:07, 2.60s/it] {'loss': 0.5561, 'grad_norm': 3.764738047010275, 
'learning_rate': 1.988392952949978e-06, 'epoch': 0.58} 58%|█████▊ | 7125/12313 [5:19:53<3:45:07, 2.60s/it] 58%|█████▊ | 7126/12313 [5:19:55<3:48:41, 2.65s/it] {'loss': 0.473, 'grad_norm': 7.746991321514468, 'learning_rate': 1.9877492659276353e-06, 'epoch': 0.58} 58%|█████▊ | 7126/12313 [5:19:55<3:48:41, 2.65s/it] 58%|█████▊ | 7127/12313 [5:19:58<3:42:51, 2.58s/it] {'loss': 0.417, 'grad_norm': 5.541041864204196, 'learning_rate': 1.9871056143503322e-06, 'epoch': 0.58} 58%|█████▊ | 7127/12313 [5:19:58<3:42:51, 2.58s/it] 58%|█████▊ | 7128/12313 [5:20:00<3:42:42, 2.58s/it] {'loss': 0.743, 'grad_norm': 3.424180901190808, 'learning_rate': 1.9864619982626064e-06, 'epoch': 0.58} 58%|█████▊ | 7128/12313 [5:20:00<3:42:42, 2.58s/it] 58%|█████▊ | 7129/12313 [5:20:04<3:55:23, 2.72s/it] {'loss': 0.6825, 'grad_norm': 16.31331645341726, 'learning_rate': 1.9858184177089915e-06, 'epoch': 0.58} 58%|█████▊ | 7129/12313 [5:20:04<3:55:23, 2.72s/it] 58%|█████▊ | 7130/12313 [5:20:06<3:53:20, 2.70s/it] {'loss': 0.5404, 'grad_norm': 3.593693975763408, 'learning_rate': 1.9851748727340214e-06, 'epoch': 0.58} 58%|█████▊ | 7130/12313 [5:20:06<3:53:20, 2.70s/it] 58%|█████▊ | 7131/12313 [5:20:09<3:47:15, 2.63s/it] {'loss': 0.5591, 'grad_norm': 6.678160792934489, 'learning_rate': 1.9845313633822255e-06, 'epoch': 0.58} 58%|█████▊ | 7131/12313 [5:20:09<3:47:15, 2.63s/it] 58%|█████▊ | 7132/12313 [5:20:12<3:57:23, 2.75s/it] {'loss': 0.5188, 'grad_norm': 4.456993833787158, 'learning_rate': 1.9838878896981303e-06, 'epoch': 0.58} 58%|█████▊ | 7132/12313 [5:20:12<3:57:23, 2.75s/it] 58%|█████▊ | 7133/12313 [5:20:14<3:52:41, 2.70s/it] {'loss': 0.3748, 'grad_norm': 4.230178294928199, 'learning_rate': 1.9832444517262625e-06, 'epoch': 0.58} 58%|█████▊ | 7133/12313 [5:20:14<3:52:41, 2.70s/it] 58%|█████▊ | 7134/12313 [5:20:17<3:49:35, 2.66s/it] {'loss': 0.4318, 'grad_norm': 4.218687976196768, 'learning_rate': 1.982601049511144e-06, 'epoch': 0.58} 58%|█████▊ | 7134/12313 [5:20:17<3:49:35, 2.66s/it] 58%|█████▊ | 
7135/12313 [5:20:20<3:52:01, 2.69s/it] {'loss': 0.5105, 'grad_norm': 3.9051525132078386, 'learning_rate': 1.9819576830972938e-06, 'epoch': 0.58} 58%|█████▊ | 7135/12313 [5:20:20<3:52:01, 2.69s/it] 58%|█████▊ | 7136/12313 [5:20:22<3:51:55, 2.69s/it] {'loss': 0.5147, 'grad_norm': 5.953375128637914, 'learning_rate': 1.9813143525292304e-06, 'epoch': 0.58} 58%|█████▊ | 7136/12313 [5:20:22<3:51:55, 2.69s/it] 58%|█████▊ | 7137/12313 [5:20:25<3:52:37, 2.70s/it] {'loss': 0.5801, 'grad_norm': 5.299949856256123, 'learning_rate': 1.980671057851469e-06, 'epoch': 0.58} 58%|█████▊ | 7137/12313 [5:20:25<3:52:37, 2.70s/it] 58%|█████▊ | 7138/12313 [5:20:28<3:52:41, 2.70s/it] {'loss': 0.484, 'grad_norm': 5.575356667344601, 'learning_rate': 1.9800277991085217e-06, 'epoch': 0.58} 58%|█████▊ | 7138/12313 [5:20:28<3:52:41, 2.70s/it] 58%|█████▊ | 7139/12313 [5:20:30<3:52:49, 2.70s/it] {'loss': 0.3816, 'grad_norm': 4.75965558776345, 'learning_rate': 1.9793845763448987e-06, 'epoch': 0.58} 58%|█████▊ | 7139/12313 [5:20:30<3:52:49, 2.70s/it] 58%|█████▊ | 7140/12313 [5:20:33<3:48:48, 2.65s/it] {'loss': 0.4796, 'grad_norm': 3.335625778162441, 'learning_rate': 1.9787413896051084e-06, 'epoch': 0.58} 58%|█████▊ | 7140/12313 [5:20:33<3:48:48, 2.65s/it] 58%|█████▊ | 7141/12313 [5:20:36<3:51:51, 2.69s/it] {'loss': 0.4803, 'grad_norm': 4.108846663125449, 'learning_rate': 1.978098238933654e-06, 'epoch': 0.58} 58%|█████▊ | 7141/12313 [5:20:36<3:51:51, 2.69s/it] 58%|█████▊ | 7142/12313 [5:20:38<3:45:04, 2.61s/it] {'loss': 0.3848, 'grad_norm': 4.357715753076788, 'learning_rate': 1.9774551243750403e-06, 'epoch': 0.58} 58%|█████▊ | 7142/12313 [5:20:38<3:45:04, 2.61s/it] 58%|█████▊ | 7143/12313 [5:20:41<3:43:14, 2.59s/it] {'loss': 0.4834, 'grad_norm': 4.9420482058322435, 'learning_rate': 1.9768120459737663e-06, 'epoch': 0.58} 58%|█████▊ | 7143/12313 [5:20:41<3:43:14, 2.59s/it] 58%|█████▊ | 7144/12313 [5:20:43<3:46:31, 2.63s/it] {'loss': 0.4072, 'grad_norm': 3.610962949080674, 'learning_rate': 
1.9761690037743293e-06, 'epoch': 0.58} 58%|█████▊ | 7144/12313 [5:20:43<3:46:31, 2.63s/it] 58%|█████▊ | 7145/12313 [5:20:46<3:45:30, 2.62s/it] {'loss': 0.5826, 'grad_norm': 4.466463892907367, 'learning_rate': 1.9755259978212253e-06, 'epoch': 0.58} 58%|█████▊ | 7145/12313 [5:20:46<3:45:30, 2.62s/it] 58%|█████▊ | 7146/12313 [5:20:48<3:39:42, 2.55s/it] {'loss': 0.6036, 'grad_norm': 5.2380939327824345, 'learning_rate': 1.9748830281589464e-06, 'epoch': 0.58} 58%|█████▊ | 7146/12313 [5:20:48<3:39:42, 2.55s/it] 58%|█████▊ | 7147/12313 [5:20:51<3:40:52, 2.57s/it] {'loss': 0.6842, 'grad_norm': 3.3553525016104704, 'learning_rate': 1.9742400948319838e-06, 'epoch': 0.58} 58%|█████▊ | 7147/12313 [5:20:51<3:40:52, 2.57s/it] 58%|█████▊ | 7148/12313 [5:20:54<3:45:37, 2.62s/it] {'loss': 0.5493, 'grad_norm': 5.736174896340501, 'learning_rate': 1.9735971978848224e-06, 'epoch': 0.58} 58%|█████▊ | 7148/12313 [5:20:54<3:45:37, 2.62s/it] 58%|█████▊ | 7149/12313 [5:20:56<3:48:04, 2.65s/it] {'loss': 0.4979, 'grad_norm': 3.703679606497901, 'learning_rate': 1.9729543373619497e-06, 'epoch': 0.58} 58%|█████▊ | 7149/12313 [5:20:56<3:48:04, 2.65s/it] 58%|█████▊ | 7150/12313 [5:20:59<3:47:20, 2.64s/it] {'loss': 0.45, 'grad_norm': 9.728903722310404, 'learning_rate': 1.972311513307848e-06, 'epoch': 0.58} 58%|█████▊ | 7150/12313 [5:20:59<3:47:20, 2.64s/it] 58%|█████▊ | 7151/12313 [5:21:02<3:45:48, 2.62s/it] {'loss': 0.7753, 'grad_norm': 5.187888574328153, 'learning_rate': 1.971668725766996e-06, 'epoch': 0.58} 58%|█████▊ | 7151/12313 [5:21:02<3:45:48, 2.62s/it] 58%|█████▊ | 7152/12313 [5:21:04<3:46:48, 2.64s/it] {'loss': 0.675, 'grad_norm': 3.3139229056956396, 'learning_rate': 1.971025974783872e-06, 'epoch': 0.58} 58%|█████▊ | 7152/12313 [5:21:04<3:46:48, 2.64s/it] 58%|█████▊ | 7153/12313 [5:21:08<4:04:06, 2.84s/it] {'loss': 0.5264, 'grad_norm': 9.118516211139692, 'learning_rate': 1.9703832604029523e-06, 'epoch': 0.58} 58%|█████▊ | 7153/12313 [5:21:08<4:04:06, 2.84s/it] 58%|█████▊ | 7154/12313 
[5:21:10<3:53:49, 2.72s/it] {'loss': 0.5034, 'grad_norm': 4.415066688147641, 'learning_rate': 1.9697405826687063e-06, 'epoch': 0.58} 58%|█████▊ | 7154/12313 [5:21:10<3:53:49, 2.72s/it] 58%|█████▊ | 7155/12313 [5:21:13<3:58:29, 2.77s/it] {'loss': 0.5316, 'grad_norm': 5.533151171983695, 'learning_rate': 1.9690979416256062e-06, 'epoch': 0.58} 58%|█████▊ | 7155/12313 [5:21:13<3:58:29, 2.77s/it] 58%|█████▊ | 7156/12313 [5:21:16<3:56:48, 2.76s/it] {'loss': 0.4886, 'grad_norm': 3.4203284332256274, 'learning_rate': 1.9684553373181197e-06, 'epoch': 0.58} 58%|█████▊ | 7156/12313 [5:21:16<3:56:48, 2.76s/it] 58%|█████▊ | 7157/12313 [5:21:18<3:53:26, 2.72s/it] {'loss': 0.5435, 'grad_norm': 7.5906819086747435, 'learning_rate': 1.967812769790709e-06, 'epoch': 0.58} 58%|█████▊ | 7157/12313 [5:21:18<3:53:26, 2.72s/it] 58%|█████▊ | 7158/12313 [5:21:21<3:47:25, 2.65s/it] {'loss': 0.4718, 'grad_norm': 5.6092591367647815, 'learning_rate': 1.9671702390878396e-06, 'epoch': 0.58} 58%|█████▊ | 7158/12313 [5:21:21<3:47:25, 2.65s/it] 58%|█████▊ | 7159/12313 [5:21:23<3:46:01, 2.63s/it] {'loss': 0.5233, 'grad_norm': 4.25266168413679, 'learning_rate': 1.9665277452539696e-06, 'epoch': 0.58} 58%|█████▊ | 7159/12313 [5:21:23<3:46:01, 2.63s/it] 58%|█████▊ | 7160/12313 [5:21:26<3:45:54, 2.63s/it] {'loss': 0.4787, 'grad_norm': 3.843541994359124, 'learning_rate': 1.965885288333555e-06, 'epoch': 0.58} 58%|█████▊ | 7160/12313 [5:21:26<3:45:54, 2.63s/it] 58%|█████▊ | 7161/12313 [5:21:29<3:48:07, 2.66s/it] {'loss': 0.6732, 'grad_norm': 4.549147478757539, 'learning_rate': 1.965242868371053e-06, 'epoch': 0.58} 58%|█████▊ | 7161/12313 [5:21:29<3:48:07, 2.66s/it] 58%|█████▊ | 7162/12313 [5:21:31<3:47:54, 2.65s/it] {'loss': 0.3952, 'grad_norm': 7.3370587404827345, 'learning_rate': 1.9646004854109136e-06, 'epoch': 0.58} 58%|█████▊ | 7162/12313 [5:21:31<3:47:54, 2.65s/it] 58%|█████▊ | 7163/12313 [5:21:34<3:49:21, 2.67s/it] {'loss': 0.5612, 'grad_norm': 4.832489166222658, 'learning_rate': 1.963958139497588e-06, 
'epoch': 0.58} 58%|█████▊ | 7163/12313 [5:21:34<3:49:21, 2.67s/it] 58%|█████▊ | 7164/12313 [5:21:37<3:50:35, 2.69s/it] {'loss': 0.4501, 'grad_norm': 4.612979610999288, 'learning_rate': 1.9633158306755206e-06, 'epoch': 0.58} 58%|█████▊ | 7164/12313 [5:21:37<3:50:35, 2.69s/it] 58%|█████▊ | 7165/12313 [5:21:40<3:52:20, 2.71s/it] {'loss': 0.479, 'grad_norm': 3.842889295187542, 'learning_rate': 1.962673558989158e-06, 'epoch': 0.58} 58%|█████▊ | 7165/12313 [5:21:40<3:52:20, 2.71s/it] 58%|█████▊ | 7166/12313 [5:21:42<3:52:50, 2.71s/it] {'loss': 0.5962, 'grad_norm': 3.615074737738207, 'learning_rate': 1.9620313244829423e-06, 'epoch': 0.58} 58%|█████▊ | 7166/12313 [5:21:42<3:52:50, 2.71s/it] 58%|█████▊ | 7167/12313 [5:21:45<3:54:30, 2.73s/it] {'loss': 0.5366, 'grad_norm': 8.385548343766278, 'learning_rate': 1.961389127201311e-06, 'epoch': 0.58} 58%|█████▊ | 7167/12313 [5:21:45<3:54:30, 2.73s/it] 58%|█████▊ | 7168/12313 [5:21:48<3:55:16, 2.74s/it] {'loss': 0.3576, 'grad_norm': 4.0857765219914945, 'learning_rate': 1.9607469671887015e-06, 'epoch': 0.58} 58%|█████▊ | 7168/12313 [5:21:48<3:55:16, 2.74s/it] 58%|█████▊ | 7169/12313 [5:21:51<3:54:46, 2.74s/it] {'loss': 0.5, 'grad_norm': 4.7701105433833995, 'learning_rate': 1.960104844489548e-06, 'epoch': 0.58} 58%|█████▊ | 7169/12313 [5:21:51<3:54:46, 2.74s/it] 58%|█████▊ | 7170/12313 [5:21:53<3:53:04, 2.72s/it] {'loss': 0.6516, 'grad_norm': 4.079111419609443, 'learning_rate': 1.9594627591482817e-06, 'epoch': 0.58} 58%|█████▊ | 7170/12313 [5:21:53<3:53:04, 2.72s/it] 58%|█████▊ | 7171/12313 [5:21:56<3:56:02, 2.75s/it] {'loss': 0.5357, 'grad_norm': 5.075085483225634, 'learning_rate': 1.9588207112093324e-06, 'epoch': 0.58} 58%|█████▊ | 7171/12313 [5:21:56<3:56:02, 2.75s/it] 58%|█████▊ | 7172/12313 [5:21:59<3:58:14, 2.78s/it] {'loss': 0.4631, 'grad_norm': 3.9201086563815624, 'learning_rate': 1.958178700717125e-06, 'epoch': 0.58} 58%|█████▊ | 7172/12313 [5:21:59<3:58:14, 2.78s/it] 58%|█████▊ | 7173/12313 [5:22:01<3:48:58, 2.67s/it] 
{'loss': 0.4648, 'grad_norm': 4.27616202832829, 'learning_rate': 1.957536727716084e-06, 'epoch': 0.58} 58%|█████▊ | 7173/12313 [5:22:01<3:48:58, 2.67s/it] 58%|█████▊ | 7174/12313 [5:22:04<3:48:23, 2.67s/it] {'loss': 0.5446, 'grad_norm': 4.882524062302524, 'learning_rate': 1.956894792250631e-06, 'epoch': 0.58} 58%|█████▊ | 7174/12313 [5:22:04<3:48:23, 2.67s/it] 58%|█████▊ | 7175/12313 [5:22:07<3:48:59, 2.67s/it] {'loss': 0.54, 'grad_norm': 3.8190132907051435, 'learning_rate': 1.9562528943651837e-06, 'epoch': 0.58} 58%|█████▊ | 7175/12313 [5:22:07<3:48:59, 2.67s/it] 58%|█████▊ | 7176/12313 [5:22:09<3:49:33, 2.68s/it] {'loss': 0.4492, 'grad_norm': 6.140657105349779, 'learning_rate': 1.955611034104158e-06, 'epoch': 0.58} 58%|█████▊ | 7176/12313 [5:22:09<3:49:33, 2.68s/it] 58%|█████▊ | 7177/12313 [5:22:12<3:56:10, 2.76s/it] {'loss': 0.5207, 'grad_norm': 4.341900527053686, 'learning_rate': 1.9549692115119685e-06, 'epoch': 0.58} 58%|█████▊ | 7177/12313 [5:22:12<3:56:10, 2.76s/it] 58%|█████▊ | 7178/12313 [5:22:15<3:51:12, 2.70s/it] {'loss': 0.4006, 'grad_norm': 3.7483344786953685, 'learning_rate': 1.9543274266330244e-06, 'epoch': 0.58} 58%|█████▊ | 7178/12313 [5:22:15<3:51:12, 2.70s/it] 58%|█████▊ | 7179/12313 [5:22:18<3:54:32, 2.74s/it] {'loss': 0.5293, 'grad_norm': 3.3504281004368854, 'learning_rate': 1.9536856795117344e-06, 'epoch': 0.58} 58%|█████▊ | 7179/12313 [5:22:18<3:54:32, 2.74s/it] 58%|█████▊ | 7180/12313 [5:22:20<3:53:03, 2.72s/it] {'loss': 0.6898, 'grad_norm': 6.428753807471014, 'learning_rate': 1.9530439701925046e-06, 'epoch': 0.58} 58%|█████▊ | 7180/12313 [5:22:20<3:53:03, 2.72s/it] 58%|█████▊ | 7181/12313 [5:22:23<3:48:55, 2.68s/it] {'loss': 0.3292, 'grad_norm': 9.364262509671228, 'learning_rate': 1.952402298719737e-06, 'epoch': 0.58} 58%|█████▊ | 7181/12313 [5:22:23<3:48:55, 2.68s/it] 58%|█████▊ | 7182/12313 [5:22:26<3:56:50, 2.77s/it] {'loss': 0.4671, 'grad_norm': 3.942276858330493, 'learning_rate': 1.951760665137832e-06, 'epoch': 0.58} 58%|█████▊ | 
7182/12313 [5:22:26<3:56:50, 2.77s/it] 58%|█████▊ | 7183/12313 [5:22:29<3:56:12, 2.76s/it] {'loss': 0.4731, 'grad_norm': 6.118096625229616, 'learning_rate': 1.9511190694911875e-06, 'epoch': 0.58} 58%|█████▊ | 7183/12313 [5:22:29<3:56:12, 2.76s/it] 58%|█████▊ | 7184/12313 [5:22:31<3:49:53, 2.69s/it] {'loss': 0.4638, 'grad_norm': 4.555831582312394, 'learning_rate': 1.9504775118241987e-06, 'epoch': 0.58} 58%|█████▊ | 7184/12313 [5:22:31<3:49:53, 2.69s/it] 58%|█████▊ | 7185/12313 [5:22:34<3:48:00, 2.67s/it] {'loss': 0.6663, 'grad_norm': 3.4499740673047214, 'learning_rate': 1.9498359921812583e-06, 'epoch': 0.58} 58%|█████▊ | 7185/12313 [5:22:34<3:48:00, 2.67s/it] 58%|█████▊ | 7186/12313 [5:22:37<3:51:36, 2.71s/it] {'loss': 0.4266, 'grad_norm': 6.8178554531301705, 'learning_rate': 1.9491945106067544e-06, 'epoch': 0.58} 58%|█████▊ | 7186/12313 [5:22:37<3:51:36, 2.71s/it] 58%|█████▊ | 7187/12313 [5:22:39<3:50:11, 2.69s/it] {'loss': 0.5886, 'grad_norm': 4.578346378430701, 'learning_rate': 1.948553067145076e-06, 'epoch': 0.58} 58%|█████▊ | 7187/12313 [5:22:39<3:50:11, 2.69s/it] 58%|█████▊ | 7188/12313 [5:22:42<3:51:26, 2.71s/it] {'loss': 0.3962, 'grad_norm': 3.0536821725267718, 'learning_rate': 1.947911661840607e-06, 'epoch': 0.58} 58%|█████▊ | 7188/12313 [5:22:42<3:51:26, 2.71s/it] 58%|█████▊ | 7189/12313 [5:22:45<3:50:24, 2.70s/it] {'loss': 0.3495, 'grad_norm': 10.782933310564555, 'learning_rate': 1.947270294737728e-06, 'epoch': 0.58} 58%|█████▊ | 7189/12313 [5:22:45<3:50:24, 2.70s/it] 58%|█████▊ | 7190/12313 [5:22:48<3:59:55, 2.81s/it] {'loss': 0.4919, 'grad_norm': 4.129900561498041, 'learning_rate': 1.9466289658808207e-06, 'epoch': 0.58} 58%|█████▊ | 7190/12313 [5:22:48<3:59:55, 2.81s/it] 58%|█████▊ | 7191/12313 [5:22:51<3:56:16, 2.77s/it] {'loss': 0.541, 'grad_norm': 6.249402586479176, 'learning_rate': 1.9459876753142593e-06, 'epoch': 0.58} 58%|█████▊ | 7191/12313 [5:22:51<3:56:16, 2.77s/it] 58%|█████▊ | 7192/12313 [5:22:53<3:57:02, 2.78s/it] {'loss': 0.4936, 
'grad_norm': 2.926784134616971, 'learning_rate': 1.9453464230824186e-06, 'epoch': 0.58} 58%|█████▊ | 7192/12313 [5:22:53<3:57:02, 2.78s/it] 58%|█████▊ | 7193/12313 [5:22:56<3:55:44, 2.76s/it] {'loss': 0.5192, 'grad_norm': 3.6484600396561433, 'learning_rate': 1.9447052092296712e-06, 'epoch': 0.58} 58%|█████▊ | 7193/12313 [5:22:56<3:55:44, 2.76s/it] 58%|█████▊ | 7194/12313 [5:22:59<4:03:37, 2.86s/it] {'loss': 0.4092, 'grad_norm': 3.898125434059147, 'learning_rate': 1.9440640338003835e-06, 'epoch': 0.58} 58%|█████▊ | 7194/12313 [5:22:59<4:03:37, 2.86s/it] 58%|█████▊ | 7195/12313 [5:23:02<3:55:48, 2.76s/it] {'loss': 0.5507, 'grad_norm': 3.683502965273959, 'learning_rate': 1.943422896838922e-06, 'epoch': 0.58} 58%|█████▊ | 7195/12313 [5:23:02<3:55:48, 2.76s/it] 58%|█████▊ | 7196/12313 [5:23:05<3:57:45, 2.79s/it] {'loss': 0.4068, 'grad_norm': 5.288308403487764, 'learning_rate': 1.9427817983896518e-06, 'epoch': 0.58} 58%|█████▊ | 7196/12313 [5:23:05<3:57:45, 2.79s/it] 58%|█████▊ | 7197/12313 [5:23:08<4:06:28, 2.89s/it] {'loss': 0.7837, 'grad_norm': 5.445064431182992, 'learning_rate': 1.942140738496931e-06, 'epoch': 0.58} 58%|█████▊ | 7197/12313 [5:23:08<4:06:28, 2.89s/it] 58%|█████▊ | 7198/12313 [5:23:10<4:00:36, 2.82s/it] {'loss': 0.3739, 'grad_norm': 6.058709142095907, 'learning_rate': 1.9414997172051184e-06, 'epoch': 0.58} 58%|█████▊ | 7198/12313 [5:23:10<4:00:36, 2.82s/it] 58%|█████▊ | 7199/12313 [5:23:13<3:58:41, 2.80s/it] {'loss': 0.5194, 'grad_norm': 3.970322164486651, 'learning_rate': 1.9408587345585707e-06, 'epoch': 0.58} 58%|█████▊ | 7199/12313 [5:23:13<3:58:41, 2.80s/it] 58%|█████▊ | 7200/12313 [5:23:15<3:49:27, 2.69s/it] {'loss': 0.4401, 'grad_norm': 5.3016872496159495, 'learning_rate': 1.9402177906016395e-06, 'epoch': 0.58} 58%|█████▊ | 7200/12313 [5:23:15<3:49:27, 2.69s/it] 58%|█████▊ | 7201/12313 [5:23:18<3:49:32, 2.69s/it] {'loss': 0.398, 'grad_norm': 4.775951466036485, 'learning_rate': 1.939576885378674e-06, 'epoch': 0.58} 58%|█████▊ | 7201/12313 
[5:23:18<3:49:32, 2.69s/it] 58%|█████▊ | 7202/12313 [5:23:21<3:45:22, 2.65s/it] {'loss': 0.5067, 'grad_norm': 6.899462009323301, 'learning_rate': 1.9389360189340213e-06, 'epoch': 0.58} 58%|█████▊ | 7202/12313 [5:23:21<3:45:22, 2.65s/it] 58%|█████▊ | 7203/12313 [5:23:24<3:51:49, 2.72s/it] {'loss': 0.5829, 'grad_norm': 3.2063799922509464, 'learning_rate': 1.9382951913120276e-06, 'epoch': 0.58} 58%|█████▊ | 7203/12313 [5:23:24<3:51:49, 2.72s/it] 59%|█████▊ | 7204/12313 [5:23:26<3:49:10, 2.69s/it] {'loss': 0.4999, 'grad_norm': 9.321053496169446, 'learning_rate': 1.937654402557034e-06, 'epoch': 0.59} 59%|█████▊ | 7204/12313 [5:23:26<3:49:10, 2.69s/it] 59%|█████▊ | 7205/12313 [5:23:29<3:48:11, 2.68s/it] {'loss': 0.2643, 'grad_norm': 5.677431258954087, 'learning_rate': 1.937013652713378e-06, 'epoch': 0.59} 59%|█████▊ | 7205/12313 [5:23:29<3:48:11, 2.68s/it] 59%|█████▊ | 7206/12313 [5:23:32<3:46:33, 2.66s/it] {'loss': 0.7468, 'grad_norm': 3.3598751265364446, 'learning_rate': 1.9363729418253995e-06, 'epoch': 0.59} 59%|█████▊ | 7206/12313 [5:23:32<3:46:33, 2.66s/it] 59%|█████▊ | 7207/12313 [5:23:34<3:50:50, 2.71s/it] {'loss': 0.5862, 'grad_norm': 4.156902235894881, 'learning_rate': 1.93573226993743e-06, 'epoch': 0.59} 59%|█████▊ | 7207/12313 [5:23:34<3:50:50, 2.71s/it] 59%|█████▊ | 7208/12313 [5:23:37<3:48:00, 2.68s/it] {'loss': 0.6588, 'grad_norm': 4.5056414028942084, 'learning_rate': 1.9350916370938004e-06, 'epoch': 0.59} 59%|█████▊ | 7208/12313 [5:23:37<3:48:00, 2.68s/it] 59%|█████▊ | 7209/12313 [5:23:40<3:46:54, 2.67s/it] {'loss': 0.8105, 'grad_norm': 5.759552128771596, 'learning_rate': 1.9344510433388405e-06, 'epoch': 0.59} 59%|█████▊ | 7209/12313 [5:23:40<3:46:54, 2.67s/it] 59%|█████▊ | 7210/12313 [5:23:42<3:48:49, 2.69s/it] {'loss': 0.5087, 'grad_norm': 4.3224511149017255, 'learning_rate': 1.9338104887168753e-06, 'epoch': 0.59} 59%|█████▊ | 7210/12313 [5:23:42<3:48:49, 2.69s/it] 59%|█████▊ | 7211/12313 [5:23:46<4:09:26, 2.93s/it] {'loss': 0.4611, 'grad_norm': 
4.7674728901987455, 'learning_rate': 1.933169973272227e-06, 'epoch': 0.59} 59%|█████▊ | 7211/12313 [5:23:46<4:09:26, 2.93s/it] 59%|█████▊ | 7212/12313 [5:23:48<4:01:19, 2.84s/it] {'loss': 0.5603, 'grad_norm': 5.687185959055942, 'learning_rate': 1.932529497049217e-06, 'epoch': 0.59} 59%|█████▊ | 7212/12313 [5:23:48<4:01:19, 2.84s/it] 59%|█████▊ | 7213/12313 [5:23:51<3:58:25, 2.81s/it] {'loss': 0.6074, 'grad_norm': 6.09715437191108, 'learning_rate': 1.9318890600921638e-06, 'epoch': 0.59} 59%|█████▊ | 7213/12313 [5:23:51<3:58:25, 2.81s/it] 59%|█████▊ | 7214/12313 [5:23:54<3:56:11, 2.78s/it] {'loss': 0.6328, 'grad_norm': 5.020060984695505, 'learning_rate': 1.9312486624453783e-06, 'epoch': 0.59} 59%|█████▊ | 7214/12313 [5:23:54<3:56:11, 2.78s/it] 59%|█████▊ | 7215/12313 [5:23:57<3:53:07, 2.74s/it] {'loss': 0.5313, 'grad_norm': 4.477953926588951, 'learning_rate': 1.9306083041531773e-06, 'epoch': 0.59} 59%|█████▊ | 7215/12313 [5:23:57<3:53:07, 2.74s/it] 59%|█████▊ | 7216/12313 [5:23:59<3:50:12, 2.71s/it] {'loss': 0.6649, 'grad_norm': 3.519455685248619, 'learning_rate': 1.9299679852598684e-06, 'epoch': 0.59} 59%|█████▊ | 7216/12313 [5:23:59<3:50:12, 2.71s/it] 59%|█████▊ | 7217/12313 [5:24:02<3:46:55, 2.67s/it] {'loss': 0.4758, 'grad_norm': 3.6704238824484325, 'learning_rate': 1.929327705809757e-06, 'epoch': 0.59} 59%|█████▊ | 7217/12313 [5:24:02<3:46:55, 2.67s/it] 59%|█████▊ | 7218/12313 [5:24:04<3:44:14, 2.64s/it] {'loss': 0.4217, 'grad_norm': 6.283905565847572, 'learning_rate': 1.928687465847148e-06, 'epoch': 0.59} 59%|█████▊ | 7218/12313 [5:24:04<3:44:14, 2.64s/it] 59%|█████▊ | 7219/12313 [5:24:07<3:41:39, 2.61s/it] {'loss': 0.4945, 'grad_norm': 6.945906876806998, 'learning_rate': 1.9280472654163436e-06, 'epoch': 0.59} 59%|█████▊ | 7219/12313 [5:24:07<3:41:39, 2.61s/it] 59%|█████▊ | 7220/12313 [5:24:09<3:41:06, 2.60s/it] {'loss': 0.4238, 'grad_norm': 8.370884932771967, 'learning_rate': 1.927407104561641e-06, 'epoch': 0.59} 59%|█████▊ | 7220/12313 [5:24:09<3:41:06, 
2.60s/it] 59%|█████▊ | 7221/12313 [5:24:12<3:42:14, 2.62s/it] {'loss': 0.5634, 'grad_norm': 4.383870501450878, 'learning_rate': 1.926766983327336e-06, 'epoch': 0.59} 59%|█████▊ | 7221/12313 [5:24:12<3:42:14, 2.62s/it] 59%|█████▊ | 7222/12313 [5:24:15<3:45:15, 2.65s/it] {'loss': 0.634, 'grad_norm': 9.241632141950406, 'learning_rate': 1.9261269017577228e-06, 'epoch': 0.59} 59%|█████▊ | 7222/12313 [5:24:15<3:45:15, 2.65s/it] 59%|█████▊ | 7223/12313 [5:24:17<3:44:10, 2.64s/it] {'loss': 0.5618, 'grad_norm': 3.7586621345897853, 'learning_rate': 1.9254868598970904e-06, 'epoch': 0.59} 59%|█████▊ | 7223/12313 [5:24:17<3:44:10, 2.64s/it] 59%|█████▊ | 7224/12313 [5:24:20<3:45:34, 2.66s/it] {'loss': 0.6119, 'grad_norm': 4.209074416276335, 'learning_rate': 1.924846857789726e-06, 'epoch': 0.59} 59%|█████▊ | 7224/12313 [5:24:20<3:45:34, 2.66s/it] 59%|█████▊ | 7225/12313 [5:24:23<3:47:27, 2.68s/it] {'loss': 0.4853, 'grad_norm': 4.546889702288765, 'learning_rate': 1.924206895479916e-06, 'epoch': 0.59} 59%|█████▊ | 7225/12313 [5:24:23<3:47:27, 2.68s/it] 59%|█████▊ | 7226/12313 [5:24:26<3:48:18, 2.69s/it] {'loss': 0.3448, 'grad_norm': 3.7823061514718423, 'learning_rate': 1.9235669730119415e-06, 'epoch': 0.59} 59%|█████▊ | 7226/12313 [5:24:26<3:48:18, 2.69s/it] 59%|█████▊ | 7227/12313 [5:24:28<3:52:34, 2.74s/it] {'loss': 0.5403, 'grad_norm': 4.256499430588995, 'learning_rate': 1.922927090430081e-06, 'epoch': 0.59} 59%|█████▊ | 7227/12313 [5:24:28<3:52:34, 2.74s/it] 59%|█████▊ | 7228/12313 [5:24:31<3:52:39, 2.75s/it] {'loss': 0.4421, 'grad_norm': 44.81613726842311, 'learning_rate': 1.9222872477786124e-06, 'epoch': 0.59} 59%|█████▊ | 7228/12313 [5:24:31<3:52:39, 2.75s/it] 59%|█████▊ | 7229/12313 [5:24:34<3:50:11, 2.72s/it] {'loss': 0.4627, 'grad_norm': 4.765715400723174, 'learning_rate': 1.921647445101809e-06, 'epoch': 0.59} 59%|█████▊ | 7229/12313 [5:24:34<3:50:11, 2.72s/it] 59%|█████▊ | 7230/12313 [5:24:36<3:46:32, 2.67s/it] {'loss': 0.4778, 'grad_norm': 20.740588315180986, 
'learning_rate': 1.921007682443941e-06, 'epoch': 0.59} 59%|█████▊ | 7230/12313 [5:24:36<3:46:32, 2.67s/it] 59%|█████▊ | 7231/12313 [5:24:39<3:53:22, 2.76s/it] {'loss': 0.4699, 'grad_norm': 5.271002543096211, 'learning_rate': 1.920367959849277e-06, 'epoch': 0.59} 59%|█████▊ | 7231/12313 [5:24:39<3:53:22, 2.76s/it] 59%|█████▊ | 7232/12313 [5:24:42<3:48:31, 2.70s/it] {'loss': 0.6411, 'grad_norm': 4.633497702253883, 'learning_rate': 1.919728277362083e-06, 'epoch': 0.59} 59%|█████▊ | 7232/12313 [5:24:42<3:48:31, 2.70s/it] 59%|█████▊ | 7233/12313 [5:24:45<3:53:52, 2.76s/it] {'loss': 0.3667, 'grad_norm': 2.739394598878589, 'learning_rate': 1.91908863502662e-06, 'epoch': 0.59} 59%|█████▊ | 7233/12313 [5:24:45<3:53:52, 2.76s/it] 59%|█████▉ | 7234/12313 [5:24:47<3:49:32, 2.71s/it] {'loss': 0.5021, 'grad_norm': 5.224714666393266, 'learning_rate': 1.9184490328871502e-06, 'epoch': 0.59} 59%|█████▉ | 7234/12313 [5:24:47<3:49:32, 2.71s/it] 59%|█████▉ | 7235/12313 [5:24:50<3:49:13, 2.71s/it] {'loss': 0.4818, 'grad_norm': 7.019642199719357, 'learning_rate': 1.9178094709879296e-06, 'epoch': 0.59} 59%|█████▉ | 7235/12313 [5:24:50<3:49:13, 2.71s/it] 59%|█████▉ | 7236/12313 [5:24:53<3:46:05, 2.67s/it] {'loss': 0.455, 'grad_norm': 5.128939865748304, 'learning_rate': 1.9171699493732122e-06, 'epoch': 0.59} 59%|█████▉ | 7236/12313 [5:24:53<3:46:05, 2.67s/it] 59%|█████▉ | 7237/12313 [5:24:56<3:50:02, 2.72s/it] {'loss': 0.4862, 'grad_norm': 7.227653257937603, 'learning_rate': 1.916530468087249e-06, 'epoch': 0.59} 59%|█████▉ | 7237/12313 [5:24:56<3:50:02, 2.72s/it] 59%|█████▉ | 7238/12313 [5:24:58<3:45:29, 2.67s/it] {'loss': 0.5265, 'grad_norm': 5.527188153857498, 'learning_rate': 1.9158910271742905e-06, 'epoch': 0.59} 59%|█████▉ | 7238/12313 [5:24:58<3:45:29, 2.67s/it] 59%|█████▉ | 7239/12313 [5:25:01<3:45:58, 2.67s/it] {'loss': 0.5964, 'grad_norm': 2.6827965014779593, 'learning_rate': 1.9152516266785807e-06, 'epoch': 0.59} 59%|█████▉ | 7239/12313 [5:25:01<3:45:58, 2.67s/it] 59%|█████▉ | 
7240/12313 [5:25:03<3:44:52, 2.66s/it] {'loss': 0.5542, 'grad_norm': 5.211112335596169, 'learning_rate': 1.9146122666443635e-06, 'epoch': 0.59} 59%|█████▉ | 7240/12313 [5:25:03<3:44:52, 2.66s/it] 59%|█████▉ | 7241/12313 [5:25:06<3:51:37, 2.74s/it] {'loss': 0.4605, 'grad_norm': 4.03076289953267, 'learning_rate': 1.91397294711588e-06, 'epoch': 0.59} 59%|█████▉ | 7241/12313 [5:25:06<3:51:37, 2.74s/it] 59%|█████▉ | 7242/12313 [5:25:09<3:44:46, 2.66s/it] {'loss': 0.5431, 'grad_norm': 5.996591886378709, 'learning_rate': 1.9133336681373673e-06, 'epoch': 0.59} 59%|█████▉ | 7242/12313 [5:25:09<3:44:46, 2.66s/it] 59%|█████▉ | 7243/12313 [5:25:12<3:44:53, 2.66s/it] {'loss': 0.4664, 'grad_norm': 4.7750369105364845, 'learning_rate': 1.912694429753059e-06, 'epoch': 0.59} 59%|█████▉ | 7243/12313 [5:25:12<3:44:53, 2.66s/it] 59%|█████▉ | 7244/12313 [5:25:14<3:40:48, 2.61s/it] {'loss': 0.4708, 'grad_norm': 4.118454697868038, 'learning_rate': 1.912055232007188e-06, 'epoch': 0.59} 59%|█████▉ | 7244/12313 [5:25:14<3:40:48, 2.61s/it] 59%|█████▉ | 7245/12313 [5:25:17<3:42:05, 2.63s/it] {'loss': 0.4099, 'grad_norm': 5.287731670886094, 'learning_rate': 1.911416074943984e-06, 'epoch': 0.59} 59%|█████▉ | 7245/12313 [5:25:17<3:42:05, 2.63s/it] 59%|█████▉ | 7246/12313 [5:25:19<3:43:41, 2.65s/it] {'loss': 0.4555, 'grad_norm': 3.463709326164506, 'learning_rate': 1.9107769586076716e-06, 'epoch': 0.59} 59%|█████▉ | 7246/12313 [5:25:19<3:43:41, 2.65s/it] 59%|█████▉ | 7247/12313 [5:25:22<3:49:52, 2.72s/it] {'loss': 0.5739, 'grad_norm': 3.9158367444108855, 'learning_rate': 1.9101378830424758e-06, 'epoch': 0.59} 59%|█████▉ | 7247/12313 [5:25:22<3:49:52, 2.72s/it] 59%|█████▉ | 7248/12313 [5:25:25<3:48:20, 2.70s/it] {'loss': 0.561, 'grad_norm': 4.558330070983507, 'learning_rate': 1.909498848292617e-06, 'epoch': 0.59} 59%|█████▉ | 7248/12313 [5:25:25<3:48:20, 2.70s/it] 59%|█████▉ | 7249/12313 [5:25:28<3:45:27, 2.67s/it] {'loss': 0.3776, 'grad_norm': 5.495600846510651, 'learning_rate': 
1.9088598544023118e-06, 'epoch': 0.59} 59%|█████▉ | 7249/12313 [5:25:28<3:45:27, 2.67s/it] 59%|█████▉ | 7250/12313 [5:25:30<3:46:43, 2.69s/it] {'loss': 0.4483, 'grad_norm': 4.632815620841959, 'learning_rate': 1.908220901415777e-06, 'epoch': 0.59} 59%|█████▉ | 7250/12313 [5:25:30<3:46:43, 2.69s/it] 59%|█████▉ | 7251/12313 [5:25:33<3:41:56, 2.63s/it] {'loss': 0.5724, 'grad_norm': 4.978619608664077, 'learning_rate': 1.907581989377224e-06, 'epoch': 0.59} 59%|█████▉ | 7251/12313 [5:25:33<3:41:56, 2.63s/it] 59%|█████▉ | 7252/12313 [5:25:36<3:49:26, 2.72s/it] {'loss': 0.5294, 'grad_norm': 5.346540069308749, 'learning_rate': 1.9069431183308615e-06, 'epoch': 0.59} 59%|█████▉ | 7252/12313 [5:25:36<3:49:26, 2.72s/it] 59%|█████▉ | 7253/12313 [5:25:38<3:45:26, 2.67s/it] {'loss': 0.6616, 'grad_norm': 4.59400323772315, 'learning_rate': 1.906304288320896e-06, 'epoch': 0.59} 59%|█████▉ | 7253/12313 [5:25:38<3:45:26, 2.67s/it] 59%|█████▉ | 7254/12313 [5:25:41<3:47:36, 2.70s/it] {'loss': 0.6185, 'grad_norm': 6.3719239881505665, 'learning_rate': 1.9056654993915326e-06, 'epoch': 0.59} 59%|█████▉ | 7254/12313 [5:25:41<3:47:36, 2.70s/it] 59%|█████▉ | 7255/12313 [5:25:44<3:45:00, 2.67s/it] {'loss': 0.4781, 'grad_norm': 6.682687810031172, 'learning_rate': 1.9050267515869709e-06, 'epoch': 0.59} 59%|█████▉ | 7255/12313 [5:25:44<3:45:00, 2.67s/it] 59%|█████▉ | 7256/12313 [5:25:46<3:44:24, 2.66s/it] {'loss': 0.8261, 'grad_norm': 3.7501427973660397, 'learning_rate': 1.9043880449514085e-06, 'epoch': 0.59} 59%|█████▉ | 7256/12313 [5:25:46<3:44:24, 2.66s/it] 59%|█████▉ | 7257/12313 [5:25:49<3:43:45, 2.66s/it] {'loss': 0.4972, 'grad_norm': 5.558288691226038, 'learning_rate': 1.9037493795290421e-06, 'epoch': 0.59} 59%|█████▉ | 7257/12313 [5:25:49<3:43:45, 2.66s/it] 59%|█████▉ | 7258/12313 [5:25:52<3:44:57, 2.67s/it] {'loss': 0.4625, 'grad_norm': 5.421642176266866, 'learning_rate': 1.9031107553640632e-06, 'epoch': 0.59} 59%|█████▉ | 7258/12313 [5:25:52<3:44:57, 2.67s/it] 59%|█████▉ | 7259/12313 
[5:25:54<3:46:03, 2.68s/it] {'loss': 0.3844, 'grad_norm': 5.221069697025148, 'learning_rate': 1.9024721725006598e-06, 'epoch': 0.59} 59%|█████▉ | 7259/12313 [5:25:54<3:46:03, 2.68s/it] 59%|█████▉ | 7260/12313 [5:25:57<3:55:41, 2.80s/it] {'loss': 0.3608, 'grad_norm': 7.976877536523762, 'learning_rate': 1.9018336309830202e-06, 'epoch': 0.59} 59%|█████▉ | 7260/12313 [5:25:57<3:55:41, 2.80s/it] 59%|█████▉ | 7261/12313 [5:26:00<3:50:58, 2.74s/it] {'loss': 0.5597, 'grad_norm': 9.70876511343524, 'learning_rate': 1.9011951308553284e-06, 'epoch': 0.59} 59%|█████▉ | 7261/12313 [5:26:00<3:50:58, 2.74s/it] 59%|█████▉ | 7262/12313 [5:26:03<3:46:49, 2.69s/it] {'loss': 0.5158, 'grad_norm': 4.2405529559362405, 'learning_rate': 1.900556672161763e-06, 'epoch': 0.59} 59%|█████▉ | 7262/12313 [5:26:03<3:46:49, 2.69s/it] 59%|█████▉ | 7263/12313 [5:26:05<3:46:27, 2.69s/it] {'loss': 0.5368, 'grad_norm': 6.1774490371345845, 'learning_rate': 1.899918254946504e-06, 'epoch': 0.59} 59%|█████▉ | 7263/12313 [5:26:05<3:46:27, 2.69s/it] 59%|█████▉ | 7264/12313 [5:26:08<3:43:35, 2.66s/it] {'loss': 0.6, 'grad_norm': 5.486539007392727, 'learning_rate': 1.8992798792537265e-06, 'epoch': 0.59} 59%|█████▉ | 7264/12313 [5:26:08<3:43:35, 2.66s/it] 59%|█████▉ | 7265/12313 [5:26:10<3:42:07, 2.64s/it] {'loss': 0.3974, 'grad_norm': 4.8768276862477995, 'learning_rate': 1.898641545127601e-06, 'epoch': 0.59} 59%|█████▉ | 7265/12313 [5:26:10<3:42:07, 2.64s/it] 59%|█████▉ | 7266/12313 [5:26:13<3:39:52, 2.61s/it] {'loss': 0.5479, 'grad_norm': 5.691046376635863, 'learning_rate': 1.8980032526122985e-06, 'epoch': 0.59} 59%|█████▉ | 7266/12313 [5:26:13<3:39:52, 2.61s/it] 59%|█████▉ | 7267/12313 [5:26:15<3:37:24, 2.59s/it] {'loss': 0.4471, 'grad_norm': 5.4416215817890325, 'learning_rate': 1.8973650017519855e-06, 'epoch': 0.59} 59%|█████▉ | 7267/12313 [5:26:16<3:37:24, 2.59s/it] 59%|█████▉ | 7268/12313 [5:26:19<3:49:04, 2.72s/it] {'loss': 0.5173, 'grad_norm': 3.7432268661179324, 'learning_rate': 1.8967267925908237e-06, 
'epoch': 0.59} 59%|█████▉ | 7268/12313 [5:26:19<3:49:04, 2.72s/it] 59%|█████▉ | 7269/12313 [5:26:21<3:44:45, 2.67s/it] {'loss': 0.4184, 'grad_norm': 8.040626967946537, 'learning_rate': 1.8960886251729756e-06, 'epoch': 0.59} 59%|█████▉ | 7269/12313 [5:26:21<3:44:45, 2.67s/it] 59%|█████▉ | 7270/12313 [5:26:24<3:43:19, 2.66s/it] {'loss': 0.4146, 'grad_norm': 6.080428319987492, 'learning_rate': 1.8954504995425994e-06, 'epoch': 0.59} 59%|█████▉ | 7270/12313 [5:26:24<3:43:19, 2.66s/it] 59%|█████▉ | 7271/12313 [5:26:26<3:44:06, 2.67s/it] {'loss': 0.4314, 'grad_norm': 4.115370064107381, 'learning_rate': 1.8948124157438485e-06, 'epoch': 0.59} 59%|█████▉ | 7271/12313 [5:26:26<3:44:06, 2.67s/it] 59%|█████▉ | 7272/12313 [5:26:29<3:40:29, 2.62s/it] {'loss': 0.4831, 'grad_norm': 4.654941886199049, 'learning_rate': 1.8941743738208752e-06, 'epoch': 0.59} 59%|█████▉ | 7272/12313 [5:26:29<3:40:29, 2.62s/it] 59%|█████▉ | 7273/12313 [5:26:32<3:42:39, 2.65s/it] {'loss': 0.5219, 'grad_norm': 6.074230738214155, 'learning_rate': 1.8935363738178288e-06, 'epoch': 0.59} 59%|█████▉ | 7273/12313 [5:26:32<3:42:39, 2.65s/it] 59%|█████▉ | 7274/12313 [5:26:34<3:42:55, 2.65s/it] {'loss': 0.6185, 'grad_norm': 4.209033114032256, 'learning_rate': 1.8928984157788565e-06, 'epoch': 0.59} 59%|█████▉ | 7274/12313 [5:26:34<3:42:55, 2.65s/it] 59%|█████▉ | 7275/12313 [5:26:37<3:47:43, 2.71s/it] {'loss': 0.5069, 'grad_norm': 6.854702739452237, 'learning_rate': 1.8922604997480998e-06, 'epoch': 0.59} 59%|█████▉ | 7275/12313 [5:26:37<3:47:43, 2.71s/it] 59%|█████▉ | 7276/12313 [5:26:41<4:05:31, 2.92s/it] {'loss': 0.4144, 'grad_norm': 5.0281604588869975, 'learning_rate': 1.8916226257697004e-06, 'epoch': 0.59} 59%|█████▉ | 7276/12313 [5:26:41<4:05:31, 2.92s/it] 59%|█████▉ | 7277/12313 [5:26:43<4:04:16, 2.91s/it] {'loss': 0.697, 'grad_norm': 3.904693540082544, 'learning_rate': 1.8909847938877962e-06, 'epoch': 0.59} 59%|█████▉ | 7277/12313 [5:26:43<4:04:16, 2.91s/it] 59%|█████▉ | 7278/12313 [5:26:46<4:00:27, 2.87s/it] 
{'loss': 0.3997, 'grad_norm': 4.638587360552655, 'learning_rate': 1.89034700414652e-06, 'epoch': 0.59} 59%|█████▉ | 7278/12313 [5:26:46<4:00:27, 2.87s/it] 59%|█████▉ | 7279/12313 [5:26:49<3:54:24, 2.79s/it] {'loss': 0.5474, 'grad_norm': 4.955343175267275, 'learning_rate': 1.8897092565900048e-06, 'epoch': 0.59} 59%|█████▉ | 7279/12313 [5:26:49<3:54:24, 2.79s/it] 59%|█████▉ | 7280/12313 [5:26:51<3:46:00, 2.69s/it] {'loss': 0.5885, 'grad_norm': 4.586670514862452, 'learning_rate': 1.8890715512623802e-06, 'epoch': 0.59} 59%|█████▉ | 7280/12313 [5:26:51<3:46:00, 2.69s/it] 59%|█████▉ | 7281/12313 [5:26:54<3:42:14, 2.65s/it] {'loss': 0.4193, 'grad_norm': 6.135861467307459, 'learning_rate': 1.8884338882077697e-06, 'epoch': 0.59} 59%|█████▉ | 7281/12313 [5:26:54<3:42:14, 2.65s/it] 59%|█████▉ | 7282/12313 [5:26:57<3:46:24, 2.70s/it] {'loss': 0.4749, 'grad_norm': 8.91363405049214, 'learning_rate': 1.8877962674702977e-06, 'epoch': 0.59} 59%|█████▉ | 7282/12313 [5:26:57<3:46:24, 2.70s/it] 59%|█████▉ | 7283/12313 [5:26:59<3:45:15, 2.69s/it] {'loss': 0.4059, 'grad_norm': 3.5396738186174317, 'learning_rate': 1.8871586890940847e-06, 'epoch': 0.59} 59%|█████▉ | 7283/12313 [5:26:59<3:45:15, 2.69s/it] 59%|█████▉ | 7284/12313 [5:27:02<3:49:51, 2.74s/it] {'loss': 0.4456, 'grad_norm': 4.623629413772576, 'learning_rate': 1.886521153123246e-06, 'epoch': 0.59} 59%|█████▉ | 7284/12313 [5:27:02<3:49:51, 2.74s/it] 59%|█████▉ | 7285/12313 [5:27:05<3:49:14, 2.74s/it] {'loss': 0.4957, 'grad_norm': 4.952672810545965, 'learning_rate': 1.8858836596018973e-06, 'epoch': 0.59} 59%|█████▉ | 7285/12313 [5:27:05<3:49:14, 2.74s/it] 59%|█████▉ | 7286/12313 [5:27:08<3:46:32, 2.70s/it] {'loss': 0.5374, 'grad_norm': 4.741145039545014, 'learning_rate': 1.8852462085741497e-06, 'epoch': 0.59} 59%|█████▉ | 7286/12313 [5:27:08<3:46:32, 2.70s/it] 59%|█████▉ | 7287/12313 [5:27:10<3:51:19, 2.76s/it] {'loss': 0.558, 'grad_norm': 8.418571766778543, 'learning_rate': 1.8846088000841096e-06, 'epoch': 0.59} 59%|█████▉ | 
7287/12313 [5:27:10<3:51:19, 2.76s/it] 59%|█████▉ | 7288/12313 [5:27:13<3:50:17, 2.75s/it] {'loss': 0.5848, 'grad_norm': 3.602706463619327, 'learning_rate': 1.8839714341758847e-06, 'epoch': 0.59} 59%|█████▉ | 7288/12313 [5:27:13<3:50:17, 2.75s/it] 59%|█████▉ | 7289/12313 [5:27:16<3:49:59, 2.75s/it] {'loss': 0.5071, 'grad_norm': 10.651397817035003, 'learning_rate': 1.883334110893576e-06, 'epoch': 0.59} 59%|█████▉ | 7289/12313 [5:27:16<3:49:59, 2.75s/it] 59%|█████▉ | 7290/12313 [5:27:18<3:43:06, 2.67s/it] {'loss': 0.4555, 'grad_norm': 4.361973439215471, 'learning_rate': 1.8826968302812837e-06, 'epoch': 0.59} 59%|█████▉ | 7290/12313 [5:27:18<3:43:06, 2.67s/it] 59%|█████▉ | 7291/12313 [5:27:21<3:47:02, 2.71s/it] {'loss': 0.6062, 'grad_norm': 6.18398302417084, 'learning_rate': 1.8820595923831025e-06, 'epoch': 0.59} 59%|█████▉ | 7291/12313 [5:27:21<3:47:02, 2.71s/it] 59%|█████▉ | 7292/12313 [5:27:24<3:48:49, 2.73s/it] {'loss': 0.5944, 'grad_norm': 8.248248546443998, 'learning_rate': 1.8814223972431276e-06, 'epoch': 0.59} 59%|█████▉ | 7292/12313 [5:27:24<3:48:49, 2.73s/it] 59%|█████▉ | 7293/12313 [5:27:27<3:57:12, 2.84s/it] {'loss': 0.5043, 'grad_norm': 7.741858864310114, 'learning_rate': 1.8807852449054497e-06, 'epoch': 0.59} 59%|█████▉ | 7293/12313 [5:27:27<3:57:12, 2.84s/it] 59%|█████▉ | 7294/12313 [5:27:30<3:52:55, 2.78s/it] {'loss': 0.5163, 'grad_norm': 3.9704756953268565, 'learning_rate': 1.8801481354141547e-06, 'epoch': 0.59} 59%|█████▉ | 7294/12313 [5:27:30<3:52:55, 2.78s/it] 59%|█████▉ | 7295/12313 [5:27:33<3:53:10, 2.79s/it] {'loss': 0.4582, 'grad_norm': 4.432871358786943, 'learning_rate': 1.8795110688133283e-06, 'epoch': 0.59} 59%|█████▉ | 7295/12313 [5:27:33<3:53:10, 2.79s/it] 59%|█████▉ | 7296/12313 [5:27:35<3:44:40, 2.69s/it] {'loss': 0.4393, 'grad_norm': 4.035829611958314, 'learning_rate': 1.878874045147053e-06, 'epoch': 0.59} 59%|█████▉ | 7296/12313 [5:27:35<3:44:40, 2.69s/it] 59%|█████▉ | 7297/12313 [5:27:38<3:45:00, 2.69s/it] {'loss': 0.62, 'grad_norm': 
3.2780411870082284, 'learning_rate': 1.8782370644594055e-06, 'epoch': 0.59} 59%|█████▉ | 7297/12313 [5:27:38<3:45:00, 2.69s/it] 59%|█████▉ | 7298/12313 [5:27:40<3:47:10, 2.72s/it] {'loss': 0.5785, 'grad_norm': 5.278700539156712, 'learning_rate': 1.8776001267944628e-06, 'epoch': 0.59} 59%|█████▉ | 7298/12313 [5:27:40<3:47:10, 2.72s/it] 59%|█████▉ | 7299/12313 [5:27:43<3:51:59, 2.78s/it] {'loss': 0.57, 'grad_norm': 3.827532458925898, 'learning_rate': 1.876963232196298e-06, 'epoch': 0.59} 59%|█████▉ | 7299/12313 [5:27:43<3:51:59, 2.78s/it] 59%|█████▉ | 7300/12313 [5:27:46<3:51:04, 2.77s/it] {'loss': 0.609, 'grad_norm': 4.0256389154851595, 'learning_rate': 1.876326380708979e-06, 'epoch': 0.59} 59%|█████▉ | 7300/12313 [5:27:46<3:51:04, 2.77s/it] 59%|█████▉ | 7301/12313 [5:27:49<3:44:12, 2.68s/it] {'loss': 0.5095, 'grad_norm': 8.391342343121048, 'learning_rate': 1.8756895723765747e-06, 'epoch': 0.59} 59%|█████▉ | 7301/12313 [5:27:49<3:44:12, 2.68s/it] 59%|█████▉ | 7302/12313 [5:27:51<3:42:21, 2.66s/it] {'loss': 0.5275, 'grad_norm': 3.698568278989539, 'learning_rate': 1.8750528072431477e-06, 'epoch': 0.59} 59%|█████▉ | 7302/12313 [5:27:51<3:42:21, 2.66s/it] 59%|█████▉ | 7303/12313 [5:27:54<3:41:41, 2.66s/it] {'loss': 0.6133, 'grad_norm': 7.336981984750687, 'learning_rate': 1.8744160853527579e-06, 'epoch': 0.59} 59%|█████▉ | 7303/12313 [5:27:54<3:41:41, 2.66s/it] 59%|█████▉ | 7304/12313 [5:27:56<3:40:15, 2.64s/it] {'loss': 0.4626, 'grad_norm': 4.166581863576458, 'learning_rate': 1.8737794067494656e-06, 'epoch': 0.59} 59%|█████▉ | 7304/12313 [5:27:56<3:40:15, 2.64s/it] 59%|█████▉ | 7305/12313 [5:27:59<3:45:39, 2.70s/it] {'loss': 0.5315, 'grad_norm': 4.274646132254657, 'learning_rate': 1.8731427714773233e-06, 'epoch': 0.59} 59%|█████▉ | 7305/12313 [5:27:59<3:45:39, 2.70s/it] 59%|█████▉ | 7306/12313 [5:28:02<3:48:33, 2.74s/it] {'loss': 0.4508, 'grad_norm': 3.911879665279761, 'learning_rate': 1.8725061795803846e-06, 'epoch': 0.59} 59%|█████▉ | 7306/12313 [5:28:02<3:48:33, 
2.74s/it] 59%|█████▉ | 7307/12313 [5:28:05<3:44:00, 2.68s/it] {'loss': 0.5147, 'grad_norm': 2.642462457610074, 'learning_rate': 1.8718696311026956e-06, 'epoch': 0.59} 59%|█████▉ | 7307/12313 [5:28:05<3:44:00, 2.68s/it] 59%|█████▉ | 7308/12313 [5:28:07<3:43:50, 2.68s/it] {'loss': 0.3801, 'grad_norm': 4.675230745750343, 'learning_rate': 1.871233126088305e-06, 'epoch': 0.59} 59%|█████▉ | 7308/12313 [5:28:07<3:43:50, 2.68s/it] 59%|█████▉ | 7309/12313 [5:28:10<3:45:25, 2.70s/it] {'loss': 0.5329, 'grad_norm': 10.255898660318655, 'learning_rate': 1.8705966645812544e-06, 'epoch': 0.59} 59%|█████▉ | 7309/12313 [5:28:10<3:45:25, 2.70s/it] 59%|█████▉ | 7310/12313 [5:28:13<3:43:06, 2.68s/it] {'loss': 0.4131, 'grad_norm': 4.954517136307957, 'learning_rate': 1.8699602466255828e-06, 'epoch': 0.59} 59%|█████▉ | 7310/12313 [5:28:13<3:43:06, 2.68s/it] 59%|█████▉ | 7311/12313 [5:28:16<3:45:58, 2.71s/it] {'loss': 0.4266, 'grad_norm': 4.751776132607896, 'learning_rate': 1.8693238722653278e-06, 'epoch': 0.59} 59%|█████▉ | 7311/12313 [5:28:16<3:45:58, 2.71s/it] 59%|█████▉ | 7312/12313 [5:28:19<3:56:50, 2.84s/it] {'loss': 0.4889, 'grad_norm': 3.047587348683506, 'learning_rate': 1.8686875415445238e-06, 'epoch': 0.59} 59%|█████▉ | 7312/12313 [5:28:19<3:56:50, 2.84s/it] 59%|█████▉ | 7313/12313 [5:28:21<3:48:28, 2.74s/it] {'loss': 0.4731, 'grad_norm': 12.990954504278774, 'learning_rate': 1.8680512545071999e-06, 'epoch': 0.59} 59%|█████▉ | 7313/12313 [5:28:21<3:48:28, 2.74s/it] 59%|█████▉ | 7314/12313 [5:28:24<3:44:44, 2.70s/it] {'loss': 0.6631, 'grad_norm': 4.070689985920183, 'learning_rate': 1.8674150111973854e-06, 'epoch': 0.59} 59%|█████▉ | 7314/12313 [5:28:24<3:44:44, 2.70s/it] 59%|█████▉ | 7315/12313 [5:28:26<3:42:47, 2.67s/it] {'loss': 0.8279, 'grad_norm': 2.864000413306198, 'learning_rate': 1.866778811659104e-06, 'epoch': 0.59} 59%|█████▉ | 7315/12313 [5:28:26<3:42:47, 2.67s/it] 59%|█████▉ | 7316/12313 [5:28:29<3:43:06, 2.68s/it] {'loss': 0.6624, 'grad_norm': 3.763015690605599, 
'learning_rate': 1.8661426559363768e-06, 'epoch': 0.59} 59%|█████▉ | 7316/12313 [5:28:29<3:43:06, 2.68s/it] 59%|█████▉ | 7317/12313 [5:28:32<3:44:18, 2.69s/it] {'loss': 0.5768, 'grad_norm': 6.349440373441788, 'learning_rate': 1.8655065440732243e-06, 'epoch': 0.59} 59%|█████▉ | 7317/12313 [5:28:32<3:44:18, 2.69s/it] 59%|█████▉ | 7318/12313 [5:28:35<3:58:01, 2.86s/it] {'loss': 0.54, 'grad_norm': 3.739887750298035, 'learning_rate': 1.8648704761136604e-06, 'epoch': 0.59} 59%|█████▉ | 7318/12313 [5:28:35<3:58:01, 2.86s/it] 59%|█████▉ | 7319/12313 [5:28:38<4:00:22, 2.89s/it] {'loss': 0.4265, 'grad_norm': 3.8465470522434977, 'learning_rate': 1.8642344521016974e-06, 'epoch': 0.59} 59%|█████▉ | 7319/12313 [5:28:38<4:00:22, 2.89s/it] 59%|█████▉ | 7320/12313 [5:28:40<3:47:10, 2.73s/it] {'loss': 0.5427, 'grad_norm': 5.442222857316663, 'learning_rate': 1.8635984720813471e-06, 'epoch': 0.59} 59%|█████▉ | 7320/12313 [5:28:40<3:47:10, 2.73s/it] 59%|█████▉ | 7321/12313 [5:28:43<3:41:49, 2.67s/it] {'loss': 0.4966, 'grad_norm': 5.036275208595901, 'learning_rate': 1.8629625360966137e-06, 'epoch': 0.59} 59%|█████▉ | 7321/12313 [5:28:43<3:41:49, 2.67s/it] 59%|█████▉ | 7322/12313 [5:28:45<3:38:56, 2.63s/it] {'loss': 0.4192, 'grad_norm': 4.325139913021066, 'learning_rate': 1.8623266441915006e-06, 'epoch': 0.59} 59%|█████▉ | 7322/12313 [5:28:45<3:38:56, 2.63s/it] 59%|█████▉ | 7323/12313 [5:28:48<3:36:14, 2.60s/it] {'loss': 0.4411, 'grad_norm': 3.7229174258197437, 'learning_rate': 1.86169079641001e-06, 'epoch': 0.59} 59%|█████▉ | 7323/12313 [5:28:48<3:36:14, 2.60s/it] 59%|█████▉ | 7324/12313 [5:28:51<3:39:06, 2.64s/it] {'loss': 0.5971, 'grad_norm': 4.325040819420997, 'learning_rate': 1.861054992796138e-06, 'epoch': 0.59} 59%|█████▉ | 7324/12313 [5:28:51<3:39:06, 2.64s/it] 59%|█████▉ | 7325/12313 [5:28:53<3:43:09, 2.68s/it] {'loss': 0.5484, 'grad_norm': 5.429278035591401, 'learning_rate': 1.860419233393879e-06, 'epoch': 0.59} 59%|█████▉ | 7325/12313 [5:28:53<3:43:09, 2.68s/it] 59%|█████▉ | 
7326/12313 [5:28:56<3:42:48, 2.68s/it] {'loss': 0.5676, 'grad_norm': 4.384346726547922, 'learning_rate': 1.859783518247223e-06, 'epoch': 0.59} 59%|█████▉ | 7326/12313 [5:28:56<3:42:48, 2.68s/it] 60%|█████▉ | 7327/12313 [5:28:59<3:44:38, 2.70s/it] {'loss': 0.4677, 'grad_norm': 7.391929682556335, 'learning_rate': 1.8591478474001601e-06, 'epoch': 0.6} 60%|█████▉ | 7327/12313 [5:28:59<3:44:38, 2.70s/it] 60%|█████▉ | 7328/12313 [5:29:02<3:43:07, 2.69s/it] {'loss': 0.562, 'grad_norm': 5.680231306618248, 'learning_rate': 1.858512220896675e-06, 'epoch': 0.6} 60%|█████▉ | 7328/12313 [5:29:02<3:43:07, 2.69s/it] 60%|█████▉ | 7329/12313 [5:29:04<3:41:37, 2.67s/it] {'loss': 0.4176, 'grad_norm': 4.0720446605312395, 'learning_rate': 1.857876638780748e-06, 'epoch': 0.6} 60%|█████▉ | 7329/12313 [5:29:04<3:41:37, 2.67s/it] 60%|█████▉ | 7330/12313 [5:29:07<3:42:06, 2.67s/it] {'loss': 0.5309, 'grad_norm': 4.5916594217289575, 'learning_rate': 1.85724110109636e-06, 'epoch': 0.6} 60%|█████▉ | 7330/12313 [5:29:07<3:42:06, 2.67s/it] 60%|█████▉ | 7331/12313 [5:29:10<3:46:08, 2.72s/it] {'loss': 0.4923, 'grad_norm': 4.6780280469096365, 'learning_rate': 1.8566056078874858e-06, 'epoch': 0.6} 60%|█████▉ | 7331/12313 [5:29:10<3:46:08, 2.72s/it] 60%|█████▉ | 7332/12313 [5:29:12<3:44:29, 2.70s/it] {'loss': 0.5152, 'grad_norm': 3.6599980641941734, 'learning_rate': 1.8559701591980977e-06, 'epoch': 0.6} 60%|█████▉ | 7332/12313 [5:29:12<3:44:29, 2.70s/it] 60%|█████▉ | 7333/12313 [5:29:15<3:51:56, 2.79s/it] {'loss': 0.4539, 'grad_norm': 2.978800119941686, 'learning_rate': 1.8553347550721672e-06, 'epoch': 0.6} 60%|█████▉ | 7333/12313 [5:29:15<3:51:56, 2.79s/it] 60%|█████▉ | 7334/12313 [5:29:18<3:52:41, 2.80s/it] {'loss': 0.64, 'grad_norm': 3.411250854476356, 'learning_rate': 1.8546993955536597e-06, 'epoch': 0.6} 60%|█████▉ | 7334/12313 [5:29:18<3:52:41, 2.80s/it] 60%|█████▉ | 7335/12313 [5:29:21<3:50:05, 2.77s/it] {'loss': 0.5401, 'grad_norm': 4.069278334737661, 'learning_rate': 1.8540640806865379e-06, 
'epoch': 0.6} 60%|█████▉ | 7335/12313 [5:29:21<3:50:05, 2.77s/it] 60%|█████▉ | 7336/12313 [5:29:24<3:47:51, 2.75s/it] {'loss': 0.4868, 'grad_norm': 7.8976077553265505, 'learning_rate': 1.8534288105147644e-06, 'epoch': 0.6} 60%|█████▉ | 7336/12313 [5:29:24<3:47:51, 2.75s/it] 60%|█████▉ | 7337/12313 [5:29:26<3:49:02, 2.76s/it] {'loss': 0.4693, 'grad_norm': 3.08341050427021, 'learning_rate': 1.8527935850822947e-06, 'epoch': 0.6} 60%|█████▉ | 7337/12313 [5:29:26<3:49:02, 2.76s/it] 60%|█████▉ | 7338/12313 [5:29:29<3:56:58, 2.86s/it] {'loss': 0.5314, 'grad_norm': 4.0472231931258955, 'learning_rate': 1.8521584044330832e-06, 'epoch': 0.6} 60%|█████▉ | 7338/12313 [5:29:29<3:56:58, 2.86s/it] 60%|█████▉ | 7339/12313 [5:29:32<3:51:37, 2.79s/it] {'loss': 0.6015, 'grad_norm': 7.289070610294687, 'learning_rate': 1.851523268611082e-06, 'epoch': 0.6} 60%|█████▉ | 7339/12313 [5:29:32<3:51:37, 2.79s/it] 60%|█████▉ | 7340/12313 [5:29:35<3:49:32, 2.77s/it] {'loss': 0.4459, 'grad_norm': 5.48979102747638, 'learning_rate': 1.8508881776602386e-06, 'epoch': 0.6} 60%|█████▉ | 7340/12313 [5:29:35<3:49:32, 2.77s/it] 60%|█████▉ | 7341/12313 [5:29:38<4:00:35, 2.90s/it] {'loss': 0.5274, 'grad_norm': 15.0231623389192, 'learning_rate': 1.850253131624497e-06, 'epoch': 0.6} 60%|█████▉ | 7341/12313 [5:29:38<4:00:35, 2.90s/it] 60%|█████▉ | 7342/12313 [5:29:41<4:00:52, 2.91s/it] {'loss': 0.7543, 'grad_norm': 4.569897243481455, 'learning_rate': 1.8496181305478014e-06, 'epoch': 0.6} 60%|█████▉ | 7342/12313 [5:29:41<4:00:52, 2.91s/it] 60%|█████▉ | 7343/12313 [5:29:44<3:52:57, 2.81s/it] {'loss': 0.4809, 'grad_norm': 5.818717995667628, 'learning_rate': 1.8489831744740887e-06, 'epoch': 0.6} 60%|█████▉ | 7343/12313 [5:29:44<3:52:57, 2.81s/it] 60%|█████▉ | 7344/12313 [5:29:46<3:51:45, 2.80s/it] {'loss': 0.4216, 'grad_norm': 7.4916301022433505, 'learning_rate': 1.8483482634472948e-06, 'epoch': 0.6} 60%|█████▉ | 7344/12313 [5:29:46<3:51:45, 2.80s/it] 60%|█████▉ | 7345/12313 [5:29:49<3:48:54, 2.76s/it] {'loss': 
0.4636, 'grad_norm': 2.952985348960789, 'learning_rate': 1.8477133975113516e-06, 'epoch': 0.6} 60%|█████▉ | 7345/12313 [5:29:49<3:48:54, 2.76s/it] 60%|█████▉ | 7346/12313 [5:29:52<3:48:07, 2.76s/it] {'loss': 0.5137, 'grad_norm': 5.3497885917720875, 'learning_rate': 1.8470785767101898e-06, 'epoch': 0.6} 60%|█████▉ | 7346/12313 [5:29:52<3:48:07, 2.76s/it] 60%|█████▉ | 7347/12313 [5:29:54<3:47:51, 2.75s/it] {'loss': 0.6904, 'grad_norm': 4.056347500018793, 'learning_rate': 1.8464438010877348e-06, 'epoch': 0.6} 60%|█████▉ | 7347/12313 [5:29:54<3:47:51, 2.75s/it] 60%|█████▉ | 7348/12313 [5:29:57<3:52:08, 2.81s/it] {'loss': 0.5172, 'grad_norm': 3.334261926906368, 'learning_rate': 1.845809070687909e-06, 'epoch': 0.6} 60%|█████▉ | 7348/12313 [5:29:57<3:52:08, 2.81s/it] 60%|█████▉ | 7349/12313 [5:30:00<3:52:20, 2.81s/it] {'loss': 0.434, 'grad_norm': 8.367386875652794, 'learning_rate': 1.8451743855546345e-06, 'epoch': 0.6} 60%|█████▉ | 7349/12313 [5:30:00<3:52:20, 2.81s/it] 60%|█████▉ | 7350/12313 [5:30:03<3:55:01, 2.84s/it] {'loss': 0.5824, 'grad_norm': 6.559459547573149, 'learning_rate': 1.8445397457318265e-06, 'epoch': 0.6} 60%|█████▉ | 7350/12313 [5:30:03<3:55:01, 2.84s/it] 60%|█████▉ | 7351/12313 [5:30:06<3:51:36, 2.80s/it] {'loss': 0.5153, 'grad_norm': 6.094611931889219, 'learning_rate': 1.8439051512633984e-06, 'epoch': 0.6} 60%|█████▉ | 7351/12313 [5:30:06<3:51:36, 2.80s/it] 60%|█████▉ | 7352/12313 [5:30:09<3:55:29, 2.85s/it] {'loss': 0.6593, 'grad_norm': 25.80816640442915, 'learning_rate': 1.8432706021932627e-06, 'epoch': 0.6} 60%|█████▉ | 7352/12313 [5:30:09<3:55:29, 2.85s/it] 60%|█████▉ | 7353/12313 [5:30:11<3:48:00, 2.76s/it] {'loss': 0.6459, 'grad_norm': 4.977546000712229, 'learning_rate': 1.8426360985653248e-06, 'epoch': 0.6} 60%|█████▉ | 7353/12313 [5:30:11<3:48:00, 2.76s/it] 60%|█████▉ | 7354/12313 [5:30:14<3:47:36, 2.75s/it] {'loss': 0.5861, 'grad_norm': 5.821123685581579, 'learning_rate': 1.8420016404234897e-06, 'epoch': 0.6} 60%|█████▉ | 7354/12313 
[5:30:14<3:47:36, 2.75s/it] 60%|█████▉ | 7355/12313 [5:30:17<3:43:34, 2.71s/it] {'loss': 0.4389, 'grad_norm': 6.3086144237425525, 'learning_rate': 1.8413672278116595e-06, 'epoch': 0.6} 60%|█████▉ | 7355/12313 [5:30:17<3:43:34, 2.71s/it] 60%|█████▉ | 7356/12313 [5:30:19<3:38:35, 2.65s/it] {'loss': 0.4021, 'grad_norm': 5.044008936737754, 'learning_rate': 1.840732860773731e-06, 'epoch': 0.6} 60%|█████▉ | 7356/12313 [5:30:19<3:38:35, 2.65s/it] 60%|█████▉ | 7357/12313 [5:30:22<3:33:52, 2.59s/it] {'loss': 0.4824, 'grad_norm': 4.6734719943119005, 'learning_rate': 1.8400985393535986e-06, 'epoch': 0.6} 60%|█████▉ | 7357/12313 [5:30:22<3:33:52, 2.59s/it] 60%|█████▉ | 7358/12313 [5:30:24<3:36:49, 2.63s/it] {'loss': 0.5207, 'grad_norm': 3.603622687404737, 'learning_rate': 1.8394642635951563e-06, 'epoch': 0.6} 60%|█████▉ | 7358/12313 [5:30:24<3:36:49, 2.63s/it] 60%|█████▉ | 7359/12313 [5:30:27<3:32:48, 2.58s/it] {'loss': 0.6815, 'grad_norm': 3.7989831364132, 'learning_rate': 1.838830033542291e-06, 'epoch': 0.6} 60%|█████▉ | 7359/12313 [5:30:27<3:32:48, 2.58s/it] 60%|█████▉ | 7360/12313 [5:30:29<3:30:29, 2.55s/it] {'loss': 0.4749, 'grad_norm': 2.3766163305666153, 'learning_rate': 1.8381958492388873e-06, 'epoch': 0.6} 60%|█████▉ | 7360/12313 [5:30:29<3:30:29, 2.55s/it] 60%|█████▉ | 7361/12313 [5:30:32<3:34:14, 2.60s/it] {'loss': 0.5948, 'grad_norm': 3.0966906185261993, 'learning_rate': 1.837561710728828e-06, 'epoch': 0.6} 60%|█████▉ | 7361/12313 [5:30:32<3:34:14, 2.60s/it] 60%|█████▉ | 7362/12313 [5:30:35<3:36:38, 2.63s/it] {'loss': 0.3566, 'grad_norm': 8.468628755593905, 'learning_rate': 1.8369276180559933e-06, 'epoch': 0.6} 60%|█████▉ | 7362/12313 [5:30:35<3:36:38, 2.63s/it] 60%|█████▉ | 7363/12313 [5:30:37<3:38:49, 2.65s/it] {'loss': 0.4737, 'grad_norm': 9.11735062017391, 'learning_rate': 1.836293571264258e-06, 'epoch': 0.6} 60%|█████▉ | 7363/12313 [5:30:37<3:38:49, 2.65s/it] 60%|█████▉ | 7364/12313 [5:30:40<3:44:23, 2.72s/it] {'loss': 0.4476, 'grad_norm': 5.041492074022004, 
'learning_rate': 1.835659570397494e-06, 'epoch': 0.6} 60%|█████▉ | 7364/12313 [5:30:40<3:44:23, 2.72s/it] 60%|█████▉ | 7365/12313 [5:30:43<3:43:20, 2.71s/it] {'loss': 0.6365, 'grad_norm': 3.99736329645453, 'learning_rate': 1.8350256154995733e-06, 'epoch': 0.6} 60%|█████▉ | 7365/12313 [5:30:43<3:43:20, 2.71s/it] 60%|█████▉ | 7366/12313 [5:30:46<3:40:33, 2.68s/it] {'loss': 0.4232, 'grad_norm': 8.810624359575181, 'learning_rate': 1.8343917066143597e-06, 'epoch': 0.6} 60%|█████▉ | 7366/12313 [5:30:46<3:40:33, 2.68s/it] 60%|█████▉ | 7367/12313 [5:30:48<3:38:25, 2.65s/it] {'loss': 0.4499, 'grad_norm': 4.925256449200388, 'learning_rate': 1.8337578437857169e-06, 'epoch': 0.6} 60%|█████▉ | 7367/12313 [5:30:48<3:38:25, 2.65s/it] 60%|█████▉ | 7368/12313 [5:30:51<3:51:58, 2.81s/it] {'loss': 0.6362, 'grad_norm': 3.5049195280397565, 'learning_rate': 1.8331240270575062e-06, 'epoch': 0.6} 60%|█████▉ | 7368/12313 [5:30:51<3:51:58, 2.81s/it] 60%|█████▉ | 7369/12313 [5:30:54<3:48:57, 2.78s/it] {'loss': 0.4814, 'grad_norm': 8.10123639728082, 'learning_rate': 1.8324902564735834e-06, 'epoch': 0.6} 60%|█████▉ | 7369/12313 [5:30:54<3:48:57, 2.78s/it] 60%|█████▉ | 7370/12313 [5:30:57<3:47:14, 2.76s/it] {'loss': 0.6061, 'grad_norm': 3.656092093713114, 'learning_rate': 1.831856532077801e-06, 'epoch': 0.6} 60%|█████▉ | 7370/12313 [5:30:57<3:47:14, 2.76s/it] 60%|█████▉ | 7371/12313 [5:31:00<3:46:37, 2.75s/it] {'loss': 0.5401, 'grad_norm': 5.372488168989093, 'learning_rate': 1.831222853914012e-06, 'epoch': 0.6} 60%|█████▉ | 7371/12313 [5:31:00<3:46:37, 2.75s/it] 60%|█████▉ | 7372/12313 [5:31:02<3:40:08, 2.67s/it] {'loss': 0.4336, 'grad_norm': 5.279583180258407, 'learning_rate': 1.830589222026062e-06, 'epoch': 0.6} 60%|█████▉ | 7372/12313 [5:31:02<3:40:08, 2.67s/it] 60%|█████▉ | 7373/12313 [5:31:05<3:40:41, 2.68s/it] {'loss': 0.6477, 'grad_norm': 8.322203353642944, 'learning_rate': 1.8299556364577936e-06, 'epoch': 0.6} 60%|█████▉ | 7373/12313 [5:31:05<3:40:41, 2.68s/it] 60%|█████▉ | 7374/12313 
[5:31:07<3:40:10, 2.67s/it] {'loss': 0.5287, 'grad_norm': 4.359529690940866, 'learning_rate': 1.8293220972530498e-06, 'epoch': 0.6} 60%|█████▉ | 7374/12313 [5:31:07<3:40:10, 2.67s/it] 60%|█████▉ | 7375/12313 [5:31:10<3:40:32, 2.68s/it] {'loss': 0.4167, 'grad_norm': 4.599497389802597, 'learning_rate': 1.8286886044556678e-06, 'epoch': 0.6} 60%|█████▉ | 7375/12313 [5:31:10<3:40:32, 2.68s/it] 60%|█████▉ | 7376/12313 [5:31:12<3:32:58, 2.59s/it] {'loss': 0.4743, 'grad_norm': 4.944453273477391, 'learning_rate': 1.8280551581094808e-06, 'epoch': 0.6} 60%|█████▉ | 7376/12313 [5:31:12<3:32:58, 2.59s/it] 60%|█████▉ | 7377/12313 [5:31:15<3:31:06, 2.57s/it] {'loss': 0.6592, 'grad_norm': 6.818339130298646, 'learning_rate': 1.8274217582583207e-06, 'epoch': 0.6} 60%|█████▉ | 7377/12313 [5:31:15<3:31:06, 2.57s/it] 60%|█████▉ | 7378/12313 [5:31:18<3:34:29, 2.61s/it] {'loss': 0.5305, 'grad_norm': 7.668165030066059, 'learning_rate': 1.826788404946016e-06, 'epoch': 0.6} 60%|█████▉ | 7378/12313 [5:31:18<3:34:29, 2.61s/it] 60%|█████▉ | 7379/12313 [5:31:20<3:34:13, 2.61s/it] {'loss': 0.5133, 'grad_norm': 4.13273227160322, 'learning_rate': 1.8261550982163904e-06, 'epoch': 0.6} 60%|█████▉ | 7379/12313 [5:31:20<3:34:13, 2.61s/it] 60%|█████▉ | 7380/12313 [5:31:23<3:36:28, 2.63s/it] {'loss': 0.5664, 'grad_norm': 2.7993562519365196, 'learning_rate': 1.825521838113265e-06, 'epoch': 0.6} 60%|█████▉ | 7380/12313 [5:31:23<3:36:28, 2.63s/it] 60%|█████▉ | 7381/12313 [5:31:26<3:38:41, 2.66s/it] {'loss': 0.4671, 'grad_norm': 9.626646869624484, 'learning_rate': 1.8248886246804598e-06, 'epoch': 0.6} 60%|█████▉ | 7381/12313 [5:31:26<3:38:41, 2.66s/it] 60%|█████▉ | 7382/12313 [5:31:28<3:33:50, 2.60s/it] {'loss': 0.5715, 'grad_norm': 7.444176905061055, 'learning_rate': 1.8242554579617883e-06, 'epoch': 0.6} 60%|█████▉ | 7382/12313 [5:31:28<3:33:50, 2.60s/it] 60%|█████▉ | 7383/12313 [5:31:31<3:34:06, 2.61s/it] {'loss': 0.4073, 'grad_norm': 9.588196848439587, 'learning_rate': 1.8236223380010625e-06, 'epoch': 
0.6} 60%|█████▉ | 7383/12313 [5:31:31<3:34:06, 2.61s/it] 60%|█████▉ | 7384/12313 [5:31:33<3:32:44, 2.59s/it] {'loss': 0.6084, 'grad_norm': 6.866343301362604, 'learning_rate': 1.8229892648420922e-06, 'epoch': 0.6} 60%|█████▉ | 7384/12313 [5:31:33<3:32:44, 2.59s/it] 60%|█████▉ | 7385/12313 [5:31:36<3:33:40, 2.60s/it] {'loss': 0.4774, 'grad_norm': 5.01419995093504, 'learning_rate': 1.8223562385286809e-06, 'epoch': 0.6} 60%|█████▉ | 7385/12313 [5:31:36<3:33:40, 2.60s/it] 60%|█████▉ | 7386/12313 [5:31:39<3:35:21, 2.62s/it] {'loss': 0.5421, 'grad_norm': 5.956266278463153, 'learning_rate': 1.8217232591046313e-06, 'epoch': 0.6} 60%|█████▉ | 7386/12313 [5:31:39<3:35:21, 2.62s/it] 60%|█████▉ | 7387/12313 [5:31:41<3:36:25, 2.64s/it] {'loss': 0.7168, 'grad_norm': 5.610744689095254, 'learning_rate': 1.8210903266137434e-06, 'epoch': 0.6} 60%|█████▉ | 7387/12313 [5:31:41<3:36:25, 2.64s/it] 60%|██████ | 7388/12313 [5:31:44<3:31:25, 2.58s/it] {'loss': 0.5227, 'grad_norm': 4.657987631786978, 'learning_rate': 1.8204574410998119e-06, 'epoch': 0.6} 60%|██████ | 7388/12313 [5:31:44<3:31:25, 2.58s/it] 60%|██████ | 7389/12313 [5:31:46<3:34:25, 2.61s/it] {'loss': 0.4822, 'grad_norm': 4.434514515163585, 'learning_rate': 1.8198246026066279e-06, 'epoch': 0.6} 60%|██████ | 7389/12313 [5:31:46<3:34:25, 2.61s/it] 60%|██████ | 7390/12313 [5:31:49<3:45:07, 2.74s/it] {'loss': 0.579, 'grad_norm': 5.183467175096997, 'learning_rate': 1.819191811177982e-06, 'epoch': 0.6} 60%|██████ | 7390/12313 [5:31:49<3:45:07, 2.74s/it] 60%|██████ | 7391/12313 [5:31:52<3:42:08, 2.71s/it] {'loss': 0.6555, 'grad_norm': 4.579031661588377, 'learning_rate': 1.8185590668576602e-06, 'epoch': 0.6} 60%|██████ | 7391/12313 [5:31:52<3:42:08, 2.71s/it] 60%|██████ | 7392/12313 [5:31:55<3:47:33, 2.77s/it] {'loss': 0.541, 'grad_norm': 5.1765990102478066, 'learning_rate': 1.817926369689444e-06, 'epoch': 0.6} 60%|██████ | 7392/12313 [5:31:55<3:47:33, 2.77s/it] 60%|██████ | 7393/12313 [5:31:58<3:42:58, 2.72s/it] {'loss': 0.5579, 
'grad_norm': 3.4939919762553977, 'learning_rate': 1.817293719717113e-06, 'epoch': 0.6} 60%|██████ | 7393/12313 [5:31:58<3:42:58, 2.72s/it] 60%|██████ | 7394/12313 [5:32:00<3:36:39, 2.64s/it] {'loss': 0.3529, 'grad_norm': 4.962538720225719, 'learning_rate': 1.8166611169844444e-06, 'epoch': 0.6} 60%|██████ | 7394/12313 [5:32:00<3:36:39, 2.64s/it] 60%|██████ | 7395/12313 [5:32:03<3:37:19, 2.65s/it] {'loss': 0.4768, 'grad_norm': 5.042600514532005, 'learning_rate': 1.8160285615352092e-06, 'epoch': 0.6} 60%|██████ | 7395/12313 [5:32:03<3:37:19, 2.65s/it] 60%|██████ | 7396/12313 [5:32:06<3:49:38, 2.80s/it] {'loss': 0.6365, 'grad_norm': 3.8637771148189097, 'learning_rate': 1.8153960534131774e-06, 'epoch': 0.6} 60%|██████ | 7396/12313 [5:32:06<3:49:38, 2.80s/it] 60%|██████ | 7397/12313 [5:32:08<3:42:24, 2.71s/it] {'loss': 0.5797, 'grad_norm': 4.22885301921691, 'learning_rate': 1.8147635926621162e-06, 'epoch': 0.6} 60%|██████ | 7397/12313 [5:32:08<3:42:24, 2.71s/it] 60%|██████ | 7398/12313 [5:32:11<3:38:55, 2.67s/it] {'loss': 0.3554, 'grad_norm': 3.6962740741884095, 'learning_rate': 1.8141311793257876e-06, 'epoch': 0.6} 60%|██████ | 7398/12313 [5:32:11<3:38:55, 2.67s/it] 60%|██████ | 7399/12313 [5:32:14<3:46:54, 2.77s/it] {'loss': 0.596, 'grad_norm': 3.758699623360753, 'learning_rate': 1.813498813447951e-06, 'epoch': 0.6} 60%|██████ | 7399/12313 [5:32:14<3:46:54, 2.77s/it] 60%|██████ | 7400/12313 [5:32:17<3:58:41, 2.91s/it] {'loss': 0.5367, 'grad_norm': 3.201807519185477, 'learning_rate': 1.812866495072364e-06, 'epoch': 0.6} 60%|██████ | 7400/12313 [5:32:17<3:58:41, 2.91s/it] 60%|██████ | 7401/12313 [5:32:20<3:54:09, 2.86s/it] {'loss': 0.4886, 'grad_norm': 4.447710834904164, 'learning_rate': 1.812234224242779e-06, 'epoch': 0.6} 60%|██████ | 7401/12313 [5:32:20<3:54:09, 2.86s/it] 60%|██████ | 7402/12313 [5:32:23<3:50:10, 2.81s/it] {'loss': 0.5678, 'grad_norm': 4.642639006126207, 'learning_rate': 1.8116020010029448e-06, 'epoch': 0.6} 60%|██████ | 7402/12313 [5:32:23<3:50:10, 
2.81s/it] 60%|██████ | 7403/12313 [5:32:26<3:54:30, 2.87s/it] {'loss': 0.4742, 'grad_norm': 3.848966652372118, 'learning_rate': 1.8109698253966092e-06, 'epoch': 0.6} 60%|██████ | 7403/12313 [5:32:26<3:54:30, 2.87s/it] 60%|██████ | 7404/12313 [5:32:28<3:50:35, 2.82s/it] {'loss': 0.4872, 'grad_norm': 4.125486033292355, 'learning_rate': 1.8103376974675157e-06, 'epoch': 0.6} 60%|██████ | 7404/12313 [5:32:28<3:50:35, 2.82s/it] 60%|██████ | 7405/12313 [5:32:31<3:48:54, 2.80s/it] {'loss': 0.4748, 'grad_norm': 5.747061637911401, 'learning_rate': 1.8097056172594023e-06, 'epoch': 0.6} 60%|██████ | 7405/12313 [5:32:31<3:48:54, 2.80s/it] 60%|██████ | 7406/12313 [5:32:34<3:45:17, 2.75s/it] {'loss': 0.3921, 'grad_norm': 4.074457822433304, 'learning_rate': 1.8090735848160079e-06, 'epoch': 0.6} 60%|██████ | 7406/12313 [5:32:34<3:45:17, 2.75s/it] 60%|██████ | 7407/12313 [5:32:36<3:44:31, 2.75s/it] {'loss': 0.5085, 'grad_norm': 9.807247879545486, 'learning_rate': 1.808441600181065e-06, 'epoch': 0.6} 60%|██████ | 7407/12313 [5:32:36<3:44:31, 2.75s/it] 60%|██████ | 7408/12313 [5:32:40<3:59:07, 2.93s/it] {'loss': 0.431, 'grad_norm': 5.7128146988760475, 'learning_rate': 1.8078096633983023e-06, 'epoch': 0.6} 60%|██████ | 7408/12313 [5:32:40<3:59:07, 2.93s/it] 60%|██████ | 7409/12313 [5:32:43<3:53:59, 2.86s/it] {'loss': 0.5174, 'grad_norm': 21.8207651189527, 'learning_rate': 1.8071777745114477e-06, 'epoch': 0.6} 60%|██████ | 7409/12313 [5:32:43<3:53:59, 2.86s/it] 60%|██████ | 7410/12313 [5:32:45<3:52:45, 2.85s/it] {'loss': 0.4276, 'grad_norm': 9.425825620792596, 'learning_rate': 1.8065459335642254e-06, 'epoch': 0.6} 60%|██████ | 7410/12313 [5:32:45<3:52:45, 2.85s/it] 60%|██████ | 7411/12313 [5:32:48<3:42:20, 2.72s/it] {'loss': 0.4419, 'grad_norm': 4.023515489113687, 'learning_rate': 1.8059141406003532e-06, 'epoch': 0.6} 60%|██████ | 7411/12313 [5:32:48<3:42:20, 2.72s/it] 60%|██████ | 7412/12313 [5:32:51<3:47:09, 2.78s/it] {'loss': 0.5893, 'grad_norm': 3.615004223601, 'learning_rate': 
1.8052823956635496e-06, 'epoch': 0.6} 60%|██████ | 7412/12313 [5:32:51<3:47:09, 2.78s/it] 60%|██████ | 7413/12313 [5:32:53<3:44:21, 2.75s/it] {'loss': 0.429, 'grad_norm': 5.159465901388711, 'learning_rate': 1.8046506987975278e-06, 'epoch': 0.6} 60%|██████ | 7413/12313 [5:32:53<3:44:21, 2.75s/it] 60%|██████ | 7414/12313 [5:32:56<3:41:54, 2.72s/it] {'loss': 0.6653, 'grad_norm': 3.103825361755657, 'learning_rate': 1.804019050045998e-06, 'epoch': 0.6} 60%|██████ | 7414/12313 [5:32:56<3:41:54, 2.72s/it] 60%|██████ | 7415/12313 [5:32:59<3:44:41, 2.75s/it] {'loss': 0.5391, 'grad_norm': 5.198427303718621, 'learning_rate': 1.8033874494526646e-06, 'epoch': 0.6} 60%|██████ | 7415/12313 [5:32:59<3:44:41, 2.75s/it] 60%|██████ | 7416/12313 [5:33:01<3:36:27, 2.65s/it] {'loss': 0.7089, 'grad_norm': 7.073819537725478, 'learning_rate': 1.8027558970612347e-06, 'epoch': 0.6} 60%|██████ | 7416/12313 [5:33:01<3:36:27, 2.65s/it] 60%|██████ | 7417/12313 [5:33:04<3:38:44, 2.68s/it] {'loss': 0.3996, 'grad_norm': 3.801629807198999, 'learning_rate': 1.8021243929154063e-06, 'epoch': 0.6} 60%|██████ | 7417/12313 [5:33:04<3:38:44, 2.68s/it] 60%|██████ | 7418/12313 [5:33:07<3:40:18, 2.70s/it] {'loss': 0.5828, 'grad_norm': 24.873498159563677, 'learning_rate': 1.8014929370588757e-06, 'epoch': 0.6} 60%|██████ | 7418/12313 [5:33:07<3:40:18, 2.70s/it] 60%|██████ | 7419/12313 [5:33:10<3:45:23, 2.76s/it] {'loss': 0.5204, 'grad_norm': 5.485907374005129, 'learning_rate': 1.8008615295353376e-06, 'epoch': 0.6} 60%|██████ | 7419/12313 [5:33:10<3:45:23, 2.76s/it] 60%|██████ | 7420/12313 [5:33:12<3:41:12, 2.71s/it] {'loss': 0.4032, 'grad_norm': 3.9473453351114705, 'learning_rate': 1.8002301703884816e-06, 'epoch': 0.6} 60%|██████ | 7420/12313 [5:33:12<3:41:12, 2.71s/it] 60%|██████ | 7421/12313 [5:33:15<3:40:37, 2.71s/it] {'loss': 0.4807, 'grad_norm': 4.470083941751573, 'learning_rate': 1.799598859661994e-06, 'epoch': 0.6} 60%|██████ | 7421/12313 [5:33:15<3:40:37, 2.71s/it] 60%|██████ | 7422/12313 
[5:33:18<3:39:23, 2.69s/it] {'loss': 0.4221, 'grad_norm': 5.031697404411579, 'learning_rate': 1.7989675973995585e-06, 'epoch': 0.6} 60%|██████ | 7422/12313 [5:33:18<3:39:23, 2.69s/it] 60%|██████ | 7423/12313 [5:33:20<3:43:36, 2.74s/it] {'loss': 0.3792, 'grad_norm': 6.979484374974144, 'learning_rate': 1.7983363836448559e-06, 'epoch': 0.6} 60%|██████ | 7423/12313 [5:33:20<3:43:36, 2.74s/it] 60%|██████ | 7424/12313 [5:33:24<3:52:59, 2.86s/it] {'loss': 0.3426, 'grad_norm': 5.457913677824293, 'learning_rate': 1.7977052184415606e-06, 'epoch': 0.6} 60%|██████ | 7424/12313 [5:33:24<3:52:59, 2.86s/it] 60%|██████ | 7425/12313 [5:33:26<3:47:20, 2.79s/it] {'loss': 0.5527, 'grad_norm': 4.4056991242013455, 'learning_rate': 1.7970741018333482e-06, 'epoch': 0.6} 60%|██████ | 7425/12313 [5:33:26<3:47:20, 2.79s/it] 60%|██████ | 7426/12313 [5:33:29<3:42:36, 2.73s/it] {'loss': 0.392, 'grad_norm': 9.887330375392434, 'learning_rate': 1.7964430338638883e-06, 'epoch': 0.6} 60%|██████ | 7426/12313 [5:33:29<3:42:36, 2.73s/it] 60%|██████ | 7427/12313 [5:33:32<3:48:12, 2.80s/it] {'loss': 0.4971, 'grad_norm': 4.068954581333052, 'learning_rate': 1.7958120145768457e-06, 'epoch': 0.6} 60%|██████ | 7427/12313 [5:33:32<3:48:12, 2.80s/it] 60%|██████ | 7428/12313 [5:33:35<3:46:40, 2.78s/it] {'loss': 0.503, 'grad_norm': 4.434746948784985, 'learning_rate': 1.7951810440158853e-06, 'epoch': 0.6} 60%|██████ | 7428/12313 [5:33:35<3:46:40, 2.78s/it] 60%|██████ | 7429/12313 [5:33:37<3:44:36, 2.76s/it] {'loss': 0.4411, 'grad_norm': 4.082130196766388, 'learning_rate': 1.7945501222246673e-06, 'epoch': 0.6} 60%|██████ | 7429/12313 [5:33:37<3:44:36, 2.76s/it] 60%|██████ | 7430/12313 [5:33:40<3:35:43, 2.65s/it] {'loss': 0.4922, 'grad_norm': 10.979428713512291, 'learning_rate': 1.793919249246846e-06, 'epoch': 0.6} 60%|██████ | 7430/12313 [5:33:40<3:35:43, 2.65s/it] 60%|██████ | 7431/12313 [5:33:42<3:35:08, 2.64s/it] {'loss': 0.5367, 'grad_norm': 4.413393240856426, 'learning_rate': 1.7932884251260767e-06, 'epoch': 
0.6} 60%|██████ | 7431/12313 [5:33:42<3:35:08, 2.64s/it] 60%|██████ | 7432/12313 [5:33:45<3:37:08, 2.67s/it] {'loss': 0.5533, 'grad_norm': 3.815015398191507, 'learning_rate': 1.7926576499060078e-06, 'epoch': 0.6} 60%|██████ | 7432/12313 [5:33:45<3:37:08, 2.67s/it] 60%|██████ | 7433/12313 [5:33:48<3:43:52, 2.75s/it] {'loss': 0.4204, 'grad_norm': 4.993930990302795, 'learning_rate': 1.7920269236302868e-06, 'epoch': 0.6} 60%|██████ | 7433/12313 [5:33:48<3:43:52, 2.75s/it] 60%|██████ | 7434/12313 [5:33:51<3:50:00, 2.83s/it] {'loss': 0.5547, 'grad_norm': 4.907619542220186, 'learning_rate': 1.7913962463425544e-06, 'epoch': 0.6} 60%|██████ | 7434/12313 [5:33:51<3:50:00, 2.83s/it] 60%|██████ | 7435/12313 [5:33:54<3:51:39, 2.85s/it] {'loss': 0.6044, 'grad_norm': 4.135750795376068, 'learning_rate': 1.7907656180864519e-06, 'epoch': 0.6} 60%|██████ | 7435/12313 [5:33:54<3:51:39, 2.85s/it] 60%|██████ | 7436/12313 [5:33:56<3:41:42, 2.73s/it] {'loss': 0.5229, 'grad_norm': 15.259932160511177, 'learning_rate': 1.790135038905616e-06, 'epoch': 0.6} 60%|██████ | 7436/12313 [5:33:56<3:41:42, 2.73s/it] 60%|██████ | 7437/12313 [5:33:59<3:40:10, 2.71s/it] {'loss': 0.4935, 'grad_norm': 5.126091226363883, 'learning_rate': 1.7895045088436772e-06, 'epoch': 0.6} 60%|██████ | 7437/12313 [5:33:59<3:40:10, 2.71s/it] 60%|██████ | 7438/12313 [5:34:02<3:48:49, 2.82s/it] {'loss': 0.4883, 'grad_norm': 4.347192454488315, 'learning_rate': 1.7888740279442669e-06, 'epoch': 0.6} 60%|██████ | 7438/12313 [5:34:02<3:48:49, 2.82s/it] 60%|██████ | 7439/12313 [5:34:05<3:45:32, 2.78s/it] {'loss': 0.6231, 'grad_norm': 8.459728931856809, 'learning_rate': 1.7882435962510102e-06, 'epoch': 0.6} 60%|██████ | 7439/12313 [5:34:05<3:45:32, 2.78s/it] 60%|██████ | 7440/12313 [5:34:07<3:42:56, 2.74s/it] {'loss': 0.6246, 'grad_norm': 4.357956562088014, 'learning_rate': 1.7876132138075292e-06, 'epoch': 0.6} 60%|██████ | 7440/12313 [5:34:07<3:42:56, 2.74s/it] 60%|██████ | 7441/12313 [5:34:10<3:48:19, 2.81s/it] {'loss': 0.3546, 
'grad_norm': 5.116492859940842, 'learning_rate': 1.786982880657444e-06, 'epoch': 0.6} 60%|██████ | 7441/12313 [5:34:10<3:48:19, 2.81s/it] 60%|██████ | 7442/12313 [5:34:13<3:40:34, 2.72s/it] {'loss': 0.3998, 'grad_norm': 4.281674880545698, 'learning_rate': 1.7863525968443705e-06, 'epoch': 0.6} 60%|██████ | 7442/12313 [5:34:13<3:40:34, 2.72s/it] 60%|██████ | 7443/12313 [5:34:16<3:39:38, 2.71s/it] {'loss': 0.434, 'grad_norm': 4.231714096569343, 'learning_rate': 1.785722362411919e-06, 'epoch': 0.6} 60%|██████ | 7443/12313 [5:34:16<3:39:38, 2.71s/it] 60%|██████ | 7444/12313 [5:34:18<3:36:30, 2.67s/it] {'loss': 0.4972, 'grad_norm': 4.3628558475147265, 'learning_rate': 1.7850921774037012e-06, 'epoch': 0.6} 60%|██████ | 7444/12313 [5:34:18<3:36:30, 2.67s/it] 60%|██████ | 7445/12313 [5:34:21<3:39:37, 2.71s/it] {'loss': 0.5716, 'grad_norm': 3.8871791003489866, 'learning_rate': 1.7844620418633202e-06, 'epoch': 0.6} 60%|██████ | 7445/12313 [5:34:21<3:39:37, 2.71s/it] 60%|██████ | 7446/12313 [5:34:24<3:44:28, 2.77s/it] {'loss': 0.5441, 'grad_norm': 5.078543576155098, 'learning_rate': 1.7838319558343786e-06, 'epoch': 0.6} 60%|██████ | 7446/12313 [5:34:24<3:44:28, 2.77s/it] 60%|██████ | 7447/12313 [5:34:27<3:50:01, 2.84s/it] {'loss': 0.5189, 'grad_norm': 11.062361952259796, 'learning_rate': 1.7832019193604767e-06, 'epoch': 0.6} 60%|██████ | 7447/12313 [5:34:27<3:50:01, 2.84s/it] 60%|██████ | 7448/12313 [5:34:29<3:40:45, 2.72s/it] {'loss': 0.5409, 'grad_norm': 5.15437180400868, 'learning_rate': 1.7825719324852075e-06, 'epoch': 0.6} 60%|██████ | 7448/12313 [5:34:29<3:40:45, 2.72s/it] 60%|██████ | 7449/12313 [5:34:32<3:32:20, 2.62s/it] {'loss': 0.4818, 'grad_norm': 6.299357609539531, 'learning_rate': 1.7819419952521645e-06, 'epoch': 0.6} 60%|██████ | 7449/12313 [5:34:32<3:32:20, 2.62s/it] 61%|██████ | 7450/12313 [5:34:35<3:41:26, 2.73s/it] {'loss': 0.5361, 'grad_norm': 3.001544405666737, 'learning_rate': 1.7813121077049336e-06, 'epoch': 0.61} 61%|██████ | 7450/12313 
[5:34:35<3:41:26, 2.73s/it] 61%|██████ | 7451/12313 [5:34:37<3:33:56, 2.64s/it] {'loss': 0.6223, 'grad_norm': 4.256801862178172, 'learning_rate': 1.7806822698871022e-06, 'epoch': 0.61} 61%|██████ | 7451/12313 [5:34:37<3:33:56, 2.64s/it] 61%|██████ | 7452/12313 [5:34:40<3:36:52, 2.68s/it] {'loss': 0.4201, 'grad_norm': 3.963672219119554, 'learning_rate': 1.780052481842251e-06, 'epoch': 0.61} 61%|██████ | 7452/12313 [5:34:40<3:36:52, 2.68s/it] 61%|██████ | 7453/12313 [5:34:43<3:43:09, 2.76s/it] {'loss': 0.5345, 'grad_norm': 3.995116455512054, 'learning_rate': 1.7794227436139569e-06, 'epoch': 0.61} 61%|██████ | 7453/12313 [5:34:43<3:43:09, 2.76s/it] 61%|██████ | 7454/12313 [5:34:46<3:43:01, 2.75s/it] {'loss': 0.5772, 'grad_norm': 6.799955515987135, 'learning_rate': 1.778793055245796e-06, 'epoch': 0.61} 61%|██████ | 7454/12313 [5:34:46<3:43:01, 2.75s/it] 61%|██████ | 7455/12313 [5:34:48<3:38:44, 2.70s/it] {'loss': 0.6087, 'grad_norm': 5.467231768671727, 'learning_rate': 1.7781634167813388e-06, 'epoch': 0.61} 61%|██████ | 7455/12313 [5:34:48<3:38:44, 2.70s/it] 61%|██████ | 7456/12313 [5:34:51<3:32:48, 2.63s/it] {'loss': 0.4178, 'grad_norm': 5.970835497003089, 'learning_rate': 1.7775338282641525e-06, 'epoch': 0.61} 61%|██████ | 7456/12313 [5:34:51<3:32:48, 2.63s/it] 61%|██████ | 7457/12313 [5:34:53<3:35:53, 2.67s/it] {'loss': 0.5252, 'grad_norm': 4.908729638196646, 'learning_rate': 1.776904289737802e-06, 'epoch': 0.61} 61%|██████ | 7457/12313 [5:34:53<3:35:53, 2.67s/it] 61%|██████ | 7458/12313 [5:34:56<3:33:53, 2.64s/it] {'loss': 0.527, 'grad_norm': 8.614648749798091, 'learning_rate': 1.7762748012458481e-06, 'epoch': 0.61} 61%|██████ | 7458/12313 [5:34:56<3:33:53, 2.64s/it] 61%|██████ | 7459/12313 [5:34:58<3:29:57, 2.60s/it] {'loss': 0.4565, 'grad_norm': 5.933581667639831, 'learning_rate': 1.7756453628318465e-06, 'epoch': 0.61} 61%|██████ | 7459/12313 [5:34:58<3:29:57, 2.60s/it] 61%|██████ | 7460/12313 [5:35:01<3:39:13, 2.71s/it] {'loss': 0.4551, 'grad_norm': 
3.8154357222510034, 'learning_rate': 1.7750159745393536e-06, 'epoch': 0.61} 61%|██████ | 7460/12313 [5:35:01<3:39:13, 2.71s/it] 61%|██████ | 7461/12313 [5:35:04<3:38:10, 2.70s/it] {'loss': 0.473, 'grad_norm': 7.8464108786714055, 'learning_rate': 1.7743866364119175e-06, 'epoch': 0.61} 61%|██████ | 7461/12313 [5:35:04<3:38:10, 2.70s/it] 61%|██████ | 7462/12313 [5:35:07<3:37:29, 2.69s/it] {'loss': 0.6192, 'grad_norm': 4.514141406557562, 'learning_rate': 1.7737573484930853e-06, 'epoch': 0.61} 61%|██████ | 7462/12313 [5:35:07<3:37:29, 2.69s/it] 61%|██████ | 7463/12313 [5:35:09<3:35:16, 2.66s/it] {'loss': 0.5783, 'grad_norm': 4.9804191071501585, 'learning_rate': 1.7731281108264025e-06, 'epoch': 0.61} 61%|██████ | 7463/12313 [5:35:09<3:35:16, 2.66s/it] 61%|██████ | 7464/12313 [5:35:12<3:40:03, 2.72s/it] {'loss': 0.4875, 'grad_norm': 3.8086232760432837, 'learning_rate': 1.7724989234554068e-06, 'epoch': 0.61} 61%|██████ | 7464/12313 [5:35:12<3:40:03, 2.72s/it] 61%|██████ | 7465/12313 [5:35:15<3:51:47, 2.87s/it] {'loss': 0.4404, 'grad_norm': 4.453392068019001, 'learning_rate': 1.7718697864236344e-06, 'epoch': 0.61} 61%|██████ | 7465/12313 [5:35:15<3:51:47, 2.87s/it] 61%|██████ | 7466/12313 [5:35:18<3:44:31, 2.78s/it] {'loss': 0.3888, 'grad_norm': 6.631970307210538, 'learning_rate': 1.771240699774621e-06, 'epoch': 0.61} 61%|██████ | 7466/12313 [5:35:18<3:44:31, 2.78s/it] 61%|██████ | 7467/12313 [5:35:20<3:35:54, 2.67s/it] {'loss': 0.5312, 'grad_norm': 3.3473485905433815, 'learning_rate': 1.7706116635518933e-06, 'epoch': 0.61} 61%|██████ | 7467/12313 [5:35:20<3:35:54, 2.67s/it] 61%|██████ | 7468/12313 [5:35:23<3:34:01, 2.65s/it] {'loss': 0.4054, 'grad_norm': 8.74164772322533, 'learning_rate': 1.7699826777989788e-06, 'epoch': 0.61} 61%|██████ | 7468/12313 [5:35:23<3:34:01, 2.65s/it] 61%|██████ | 7469/12313 [5:35:26<3:33:54, 2.65s/it] {'loss': 0.5257, 'grad_norm': 4.144159190764816, 'learning_rate': 1.7693537425593984e-06, 'epoch': 0.61} 61%|██████ | 7469/12313 [5:35:26<3:33:54, 
2.65s/it] 61%|██████ | 7470/12313 [5:35:28<3:39:13, 2.72s/it] {'loss': 0.3604, 'grad_norm': 4.098313315005926, 'learning_rate': 1.7687248578766727e-06, 'epoch': 0.61} 61%|██████ | 7470/12313 [5:35:28<3:39:13, 2.72s/it] 61%|██████ | 7471/12313 [5:35:31<3:35:22, 2.67s/it] {'loss': 0.6578, 'grad_norm': 6.610586143194924, 'learning_rate': 1.7680960237943174e-06, 'epoch': 0.61} 61%|██████ | 7471/12313 [5:35:31<3:35:22, 2.67s/it] 61%|██████ | 7472/12313 [5:35:34<3:40:05, 2.73s/it] {'loss': 0.4375, 'grad_norm': 4.968803137717695, 'learning_rate': 1.7674672403558421e-06, 'epoch': 0.61} 61%|██████ | 7472/12313 [5:35:34<3:40:05, 2.73s/it] 61%|██████ | 7473/12313 [5:35:37<3:42:19, 2.76s/it] {'loss': 0.4943, 'grad_norm': 7.6507424354276, 'learning_rate': 1.7668385076047584e-06, 'epoch': 0.61} 61%|██████ | 7473/12313 [5:35:37<3:42:19, 2.76s/it] 61%|██████ | 7474/12313 [5:35:39<3:39:23, 2.72s/it] {'loss': 0.6319, 'grad_norm': 5.452374844312675, 'learning_rate': 1.7662098255845689e-06, 'epoch': 0.61} 61%|██████ | 7474/12313 [5:35:39<3:39:23, 2.72s/it] 61%|██████ | 7475/12313 [5:35:42<3:39:05, 2.72s/it] {'loss': 0.4869, 'grad_norm': 10.145687214568056, 'learning_rate': 1.7655811943387758e-06, 'epoch': 0.61} 61%|██████ | 7475/12313 [5:35:42<3:39:05, 2.72s/it] 61%|██████ | 7476/12313 [5:35:45<3:36:04, 2.68s/it] {'loss': 0.5788, 'grad_norm': 4.614412678031245, 'learning_rate': 1.764952613910878e-06, 'epoch': 0.61} 61%|██████ | 7476/12313 [5:35:45<3:36:04, 2.68s/it] 61%|██████ | 7477/12313 [5:35:47<3:33:46, 2.65s/it] {'loss': 0.505, 'grad_norm': 7.111983931763767, 'learning_rate': 1.7643240843443686e-06, 'epoch': 0.61} 61%|██████ | 7477/12313 [5:35:47<3:33:46, 2.65s/it] 61%|██████ | 7478/12313 [5:35:50<3:33:58, 2.66s/it] {'loss': 0.6297, 'grad_norm': 8.127180595467415, 'learning_rate': 1.7636956056827384e-06, 'epoch': 0.61} 61%|██████ | 7478/12313 [5:35:50<3:33:58, 2.66s/it] 61%|██████ | 7479/12313 [5:35:53<3:33:09, 2.65s/it] {'loss': 0.5801, 'grad_norm': 6.123717134281918, 
'learning_rate': 1.7630671779694768e-06, 'epoch': 0.61} 61%|██████ | 7479/12313 [5:35:53<3:33:09, 2.65s/it] 61%|██████ | 7480/12313 [5:35:55<3:30:54, 2.62s/it] {'loss': 0.4418, 'grad_norm': 5.652896769901682, 'learning_rate': 1.7624388012480656e-06, 'epoch': 0.61} 61%|██████ | 7480/12313 [5:35:55<3:30:54, 2.62s/it] 61%|██████ | 7481/12313 [5:35:58<3:36:08, 2.68s/it] {'loss': 0.4421, 'grad_norm': 5.923671663679729, 'learning_rate': 1.7618104755619852e-06, 'epoch': 0.61} 61%|██████ | 7481/12313 [5:35:58<3:36:08, 2.68s/it] 61%|██████ | 7482/12313 [5:36:01<3:34:35, 2.67s/it] {'loss': 0.7917, 'grad_norm': 5.6081581939098495, 'learning_rate': 1.7611822009547143e-06, 'epoch': 0.61} 61%|██████ | 7482/12313 [5:36:01<3:34:35, 2.67s/it] 61%|██████ | 7483/12313 [5:36:03<3:38:53, 2.72s/it] {'loss': 0.567, 'grad_norm': 5.284990419872968, 'learning_rate': 1.7605539774697244e-06, 'epoch': 0.61} 61%|██████ | 7483/12313 [5:36:03<3:38:53, 2.72s/it] 61%|██████ | 7484/12313 [5:36:06<3:38:53, 2.72s/it] {'loss': 0.5853, 'grad_norm': 7.660780757932287, 'learning_rate': 1.7599258051504856e-06, 'epoch': 0.61} 61%|██████ | 7484/12313 [5:36:06<3:38:53, 2.72s/it] 61%|██████ | 7485/12313 [5:36:09<3:34:19, 2.66s/it] {'loss': 0.4899, 'grad_norm': 4.687240171351279, 'learning_rate': 1.7592976840404652e-06, 'epoch': 0.61} 61%|██████ | 7485/12313 [5:36:09<3:34:19, 2.66s/it] 61%|██████ | 7486/12313 [5:36:11<3:27:39, 2.58s/it] {'loss': 0.4768, 'grad_norm': 4.470994352643192, 'learning_rate': 1.7586696141831242e-06, 'epoch': 0.61} 61%|██████ | 7486/12313 [5:36:11<3:27:39, 2.58s/it] 61%|██████ | 7487/12313 [5:36:14<3:26:53, 2.57s/it] {'loss': 0.3794, 'grad_norm': 5.24599057514584, 'learning_rate': 1.7580415956219229e-06, 'epoch': 0.61} 61%|██████ | 7487/12313 [5:36:14<3:26:53, 2.57s/it] 61%|██████ | 7488/12313 [5:36:17<3:35:36, 2.68s/it] {'loss': 0.5302, 'grad_norm': 4.900299331302958, 'learning_rate': 1.7574136284003158e-06, 'epoch': 0.61} 61%|██████ | 7488/12313 [5:36:17<3:35:36, 2.68s/it] 61%|██████ 
| 7489/12313 [5:36:19<3:38:45, 2.72s/it] {'loss': 0.4212, 'grad_norm': 4.555248134756306, 'learning_rate': 1.756785712561756e-06, 'epoch': 0.61} 61%|██████ | 7489/12313 [5:36:19<3:38:45, 2.72s/it] 61%|██████ | 7490/12313 [5:36:22<3:32:35, 2.64s/it] {'loss': 0.4278, 'grad_norm': 7.786367658307978, 'learning_rate': 1.7561578481496917e-06, 'epoch': 0.61} 61%|██████ | 7490/12313 [5:36:22<3:32:35, 2.64s/it] 61%|██████ | 7491/12313 [5:36:24<3:32:52, 2.65s/it] {'loss': 0.6145, 'grad_norm': 4.5786486381903515, 'learning_rate': 1.7555300352075662e-06, 'epoch': 0.61} 61%|██████ | 7491/12313 [5:36:24<3:32:52, 2.65s/it] 61%|██████ | 7492/12313 [5:36:27<3:32:19, 2.64s/it] {'loss': 0.4126, 'grad_norm': 7.245725994286198, 'learning_rate': 1.7549022737788241e-06, 'epoch': 0.61} 61%|██████ | 7492/12313 [5:36:27<3:32:19, 2.64s/it] 61%|██████ | 7493/12313 [5:36:30<3:40:40, 2.75s/it] {'loss': 0.5058, 'grad_norm': 3.67634983056109, 'learning_rate': 1.7542745639069004e-06, 'epoch': 0.61} 61%|██████ | 7493/12313 [5:36:30<3:40:40, 2.75s/it] 61%|██████ | 7494/12313 [5:36:33<3:46:29, 2.82s/it] {'loss': 0.3767, 'grad_norm': 9.370780365887464, 'learning_rate': 1.7536469056352296e-06, 'epoch': 0.61} 61%|██████ | 7494/12313 [5:36:33<3:46:29, 2.82s/it] 61%|██████ | 7495/12313 [5:36:36<3:42:03, 2.77s/it] {'loss': 0.5582, 'grad_norm': 2.966365312799223, 'learning_rate': 1.7530192990072436e-06, 'epoch': 0.61} 61%|██████ | 7495/12313 [5:36:36<3:42:03, 2.77s/it] 61%|██████ | 7496/12313 [5:36:38<3:40:54, 2.75s/it] {'loss': 0.409, 'grad_norm': 5.219714565139399, 'learning_rate': 1.7523917440663687e-06, 'epoch': 0.61} 61%|██████ | 7496/12313 [5:36:38<3:40:54, 2.75s/it] 61%|██████ | 7497/12313 [5:36:41<3:39:11, 2.73s/it] {'loss': 0.516, 'grad_norm': 7.275963784070534, 'learning_rate': 1.7517642408560278e-06, 'epoch': 0.61} 61%|██████ | 7497/12313 [5:36:41<3:39:11, 2.73s/it] 61%|██████ | 7498/12313 [5:36:44<3:38:51, 2.73s/it] {'loss': 0.5271, 'grad_norm': 5.53725602534592, 'learning_rate': 
1.7511367894196426e-06, 'epoch': 0.61} 61%|██████ | 7498/12313 [5:36:44<3:38:51, 2.73s/it] 61%|██████ | 7499/12313 [5:36:47<3:39:23, 2.73s/it] {'loss': 0.4234, 'grad_norm': 6.420027579961015, 'learning_rate': 1.7505093898006275e-06, 'epoch': 0.61} 61%|██████ | 7499/12313 [5:36:47<3:39:23, 2.73s/it] 61%|██████ | 7500/12313 [5:36:49<3:42:03, 2.77s/it] {'loss': 0.5215, 'grad_norm': 5.134669425660005, 'learning_rate': 1.749882042042396e-06, 'epoch': 0.61} 61%|██████ | 7500/12313 [5:36:49<3:42:03, 2.77s/it] 61%|██████ | 7501/12313 [5:36:52<3:44:19, 2.80s/it] {'loss': 0.4116, 'grad_norm': 3.969108933592243, 'learning_rate': 1.749254746188358e-06, 'epoch': 0.61} 61%|██████ | 7501/12313 [5:36:52<3:44:19, 2.80s/it] 61%|██████ | 7502/12313 [5:36:55<3:35:00, 2.68s/it] {'loss': 0.4249, 'grad_norm': 9.697561858550591, 'learning_rate': 1.7486275022819183e-06, 'epoch': 0.61} 61%|██████ | 7502/12313 [5:36:55<3:35:00, 2.68s/it] 61%|██████ | 7503/12313 [5:36:57<3:36:13, 2.70s/it] {'loss': 0.5176, 'grad_norm': 6.498690544614313, 'learning_rate': 1.748000310366478e-06, 'epoch': 0.61} 61%|██████ | 7503/12313 [5:36:57<3:36:13, 2.70s/it] 61%|██████ | 7504/12313 [5:37:00<3:30:01, 2.62s/it] {'loss': 0.566, 'grad_norm': 3.9324281115521202, 'learning_rate': 1.7473731704854363e-06, 'epoch': 0.61} 61%|██████ | 7504/12313 [5:37:00<3:30:01, 2.62s/it] 61%|██████ | 7505/12313 [5:37:03<3:35:24, 2.69s/it] {'loss': 0.4352, 'grad_norm': 8.158895435967443, 'learning_rate': 1.7467460826821885e-06, 'epoch': 0.61} 61%|██████ | 7505/12313 [5:37:03<3:35:24, 2.69s/it] 61%|██████ | 7506/12313 [5:37:05<3:34:04, 2.67s/it] {'loss': 0.4992, 'grad_norm': 6.125377617707914, 'learning_rate': 1.7461190470001252e-06, 'epoch': 0.61} 61%|██████ | 7506/12313 [5:37:05<3:34:04, 2.67s/it] 61%|██████ | 7507/12313 [5:37:08<3:35:45, 2.69s/it] {'loss': 0.4183, 'grad_norm': 4.208156614795222, 'learning_rate': 1.7454920634826334e-06, 'epoch': 0.61} 61%|██████ | 7507/12313 [5:37:08<3:35:45, 2.69s/it] 61%|██████ | 7508/12313 
[5:37:11<3:37:56, 2.72s/it] {'loss': 0.4898, 'grad_norm': 11.55413920993596, 'learning_rate': 1.7448651321730985e-06, 'epoch': 0.61} 61%|██████ | 7508/12313 [5:37:11<3:37:56, 2.72s/it] 61%|██████ | 7509/12313 [5:37:14<3:36:02, 2.70s/it] {'loss': 0.4855, 'grad_norm': 4.906701536886049, 'learning_rate': 1.7442382531148993e-06, 'epoch': 0.61} 61%|██████ | 7509/12313 [5:37:14<3:36:02, 2.70s/it] 61%|██████ | 7510/12313 [5:37:16<3:36:51, 2.71s/it] {'loss': 0.4437, 'grad_norm': 4.012360366132999, 'learning_rate': 1.743611426351413e-06, 'epoch': 0.61} 61%|██████ | 7510/12313 [5:37:16<3:36:51, 2.71s/it] 61%|██████ | 7511/12313 [5:37:19<3:36:57, 2.71s/it] {'loss': 0.4269, 'grad_norm': 4.507122998130638, 'learning_rate': 1.7429846519260139e-06, 'epoch': 0.61} 61%|██████ | 7511/12313 [5:37:19<3:36:57, 2.71s/it] 61%|██████ | 7512/12313 [5:37:22<3:36:20, 2.70s/it] {'loss': 0.4297, 'grad_norm': 7.688714435664871, 'learning_rate': 1.7423579298820698e-06, 'epoch': 0.61} 61%|██████ | 7512/12313 [5:37:22<3:36:20, 2.70s/it] 61%|██████ | 7513/12313 [5:37:24<3:35:52, 2.70s/it] {'loss': 0.4187, 'grad_norm': 4.5702359534664145, 'learning_rate': 1.7417312602629466e-06, 'epoch': 0.61} 61%|██████ | 7513/12313 [5:37:24<3:35:52, 2.70s/it] 61%|██████ | 7514/12313 [5:37:27<3:33:44, 2.67s/it] {'loss': 0.4983, 'grad_norm': 7.677278210562309, 'learning_rate': 1.7411046431120082e-06, 'epoch': 0.61} 61%|██████ | 7514/12313 [5:37:27<3:33:44, 2.67s/it] 61%|██████ | 7515/12313 [5:37:29<3:29:49, 2.62s/it] {'loss': 0.6269, 'grad_norm': 6.1185795533491545, 'learning_rate': 1.7404780784726113e-06, 'epoch': 0.61} 61%|██████ | 7515/12313 [5:37:29<3:29:49, 2.62s/it] 61%|██████ | 7516/12313 [5:37:32<3:28:39, 2.61s/it] {'loss': 0.595, 'grad_norm': 14.544008771708137, 'learning_rate': 1.7398515663881117e-06, 'epoch': 0.61} 61%|██████ | 7516/12313 [5:37:32<3:28:39, 2.61s/it] 61%|██████ | 7517/12313 [5:37:35<3:26:57, 2.59s/it] {'loss': 0.6077, 'grad_norm': 10.989546459503224, 'learning_rate': 
1.7392251069018612e-06, 'epoch': 0.61} 61%|██████ | 7517/12313 [5:37:35<3:26:57, 2.59s/it] 61%|██████ | 7518/12313 [5:37:37<3:30:09, 2.63s/it] {'loss': 0.558, 'grad_norm': 6.706531652553325, 'learning_rate': 1.7385987000572072e-06, 'epoch': 0.61} 61%|██████ | 7518/12313 [5:37:37<3:30:09, 2.63s/it] 61%|██████ | 7519/12313 [5:37:40<3:31:25, 2.65s/it] {'loss': 0.4935, 'grad_norm': 3.482810028814553, 'learning_rate': 1.7379723458974923e-06, 'epoch': 0.61} 61%|██████ | 7519/12313 [5:37:40<3:31:25, 2.65s/it] 61%|██████ | 7520/12313 [5:37:43<3:39:52, 2.75s/it] {'loss': 0.5477, 'grad_norm': 4.216977832082727, 'learning_rate': 1.737346044466059e-06, 'epoch': 0.61} 61%|██████ | 7520/12313 [5:37:43<3:39:52, 2.75s/it] 61%|██████ | 7521/12313 [5:37:46<3:37:50, 2.73s/it] {'loss': 0.561, 'grad_norm': 5.407693817919376, 'learning_rate': 1.7367197958062432e-06, 'epoch': 0.61} 61%|██████ | 7521/12313 [5:37:46<3:37:50, 2.73s/it] 61%|██████ | 7522/12313 [5:37:48<3:31:26, 2.65s/it] {'loss': 0.7106, 'grad_norm': 7.6500405315944215, 'learning_rate': 1.7360935999613777e-06, 'epoch': 0.61} 61%|██████ | 7522/12313 [5:37:48<3:31:26, 2.65s/it] 61%|██████ | 7523/12313 [5:37:51<3:31:04, 2.64s/it] {'loss': 0.4518, 'grad_norm': 8.727840725862155, 'learning_rate': 1.7354674569747914e-06, 'epoch': 0.61} 61%|██████ | 7523/12313 [5:37:51<3:31:04, 2.64s/it] 61%|██████ | 7524/12313 [5:37:54<3:33:16, 2.67s/it] {'loss': 0.4906, 'grad_norm': 6.081313481209265, 'learning_rate': 1.7348413668898124e-06, 'epoch': 0.61} 61%|██████ | 7524/12313 [5:37:54<3:33:16, 2.67s/it] 61%|██████ | 7525/12313 [5:37:56<3:34:49, 2.69s/it] {'loss': 0.5392, 'grad_norm': 4.575030134738263, 'learning_rate': 1.73421532974976e-06, 'epoch': 0.61} 61%|██████ | 7525/12313 [5:37:56<3:34:49, 2.69s/it] 61%|██████ | 7526/12313 [5:37:59<3:36:55, 2.72s/it] {'loss': 0.5111, 'grad_norm': 3.827031756918253, 'learning_rate': 1.7335893455979538e-06, 'epoch': 0.61} 61%|██████ | 7526/12313 [5:37:59<3:36:55, 2.72s/it] 61%|██████ | 7527/12313 
[5:38:01<3:30:22, 2.64s/it] {'loss': 0.7858, 'grad_norm': 4.399973290074878, 'learning_rate': 1.7329634144777097e-06, 'epoch': 0.61} 61%|██████ | 7527/12313 [5:38:01<3:30:22, 2.64s/it] 61%|██████ | 7528/12313 [5:38:04<3:30:02, 2.63s/it] {'loss': 0.4, 'grad_norm': 6.275043877297012, 'learning_rate': 1.7323375364323374e-06, 'epoch': 0.61} 61%|██████ | 7528/12313 [5:38:04<3:30:02, 2.63s/it] 61%|██████ | 7529/12313 [5:38:07<3:26:56, 2.60s/it] {'loss': 0.5784, 'grad_norm': 3.7917580182798964, 'learning_rate': 1.731711711505144e-06, 'epoch': 0.61} 61%|██████ | 7529/12313 [5:38:07<3:26:56, 2.60s/it] 61%|██████ | 7530/12313 [5:38:09<3:29:36, 2.63s/it] {'loss': 0.4097, 'grad_norm': 5.449393065833403, 'learning_rate': 1.7310859397394356e-06, 'epoch': 0.61} 61%|██████ | 7530/12313 [5:38:09<3:29:36, 2.63s/it] 61%|██████ | 7531/12313 [5:38:12<3:29:57, 2.63s/it] {'loss': 0.4269, 'grad_norm': 9.087763572815122, 'learning_rate': 1.7304602211785105e-06, 'epoch': 0.61} 61%|██████ | 7531/12313 [5:38:12<3:29:57, 2.63s/it] 61%|██████ | 7532/12313 [5:38:15<3:33:52, 2.68s/it] {'loss': 0.4189, 'grad_norm': 3.9239408883458196, 'learning_rate': 1.7298345558656643e-06, 'epoch': 0.61} 61%|██████ | 7532/12313 [5:38:15<3:33:52, 2.68s/it] 61%|██████ | 7533/12313 [5:38:18<3:39:44, 2.76s/it] {'loss': 0.5171, 'grad_norm': 3.9512250145217327, 'learning_rate': 1.7292089438441912e-06, 'epoch': 0.61} 61%|██████ | 7533/12313 [5:38:18<3:39:44, 2.76s/it] 61%|██████ | 7534/12313 [5:38:20<3:39:02, 2.75s/it] {'loss': 0.5148, 'grad_norm': 5.225768936804722, 'learning_rate': 1.7285833851573802e-06, 'epoch': 0.61} 61%|██████ | 7534/12313 [5:38:20<3:39:02, 2.75s/it] 61%|██████ | 7535/12313 [5:38:23<3:37:48, 2.74s/it] {'loss': 0.6574, 'grad_norm': 8.877862953830741, 'learning_rate': 1.727957879848516e-06, 'epoch': 0.61} 61%|██████ | 7535/12313 [5:38:23<3:37:48, 2.74s/it] 61%|██████ | 7536/12313 [5:38:26<3:41:43, 2.78s/it] {'loss': 0.4649, 'grad_norm': 3.7513803990699457, 'learning_rate': 1.72733242796088e-06, 
'epoch': 0.61} 61%|██████ | 7536/12313 [5:38:26<3:41:43, 2.78s/it] 61%|██████ | 7537/12313 [5:38:29<3:41:24, 2.78s/it] {'loss': 0.5629, 'grad_norm': 6.526118406530345, 'learning_rate': 1.7267070295377519e-06, 'epoch': 0.61} 61%|██████ | 7537/12313 [5:38:29<3:41:24, 2.78s/it] 61%|██████ | 7538/12313 [5:38:32<3:40:13, 2.77s/it] {'loss': 0.483, 'grad_norm': 8.83748922556791, 'learning_rate': 1.726081684622404e-06, 'epoch': 0.61} 61%|██████ | 7538/12313 [5:38:32<3:40:13, 2.77s/it] 61%|██████ | 7539/12313 [5:38:34<3:32:02, 2.67s/it] {'loss': 0.6093, 'grad_norm': 4.032292821062164, 'learning_rate': 1.7254563932581072e-06, 'epoch': 0.61} 61%|██████ | 7539/12313 [5:38:34<3:32:02, 2.67s/it] 61%|██████ | 7540/12313 [5:38:37<3:40:06, 2.77s/it] {'loss': 0.3226, 'grad_norm': 19.261864290456085, 'learning_rate': 1.7248311554881297e-06, 'epoch': 0.61} 61%|██████ | 7540/12313 [5:38:37<3:40:06, 2.77s/it] 61%|██████ | 7541/12313 [5:38:40<3:35:45, 2.71s/it] {'loss': 0.4116, 'grad_norm': 6.178514406690231, 'learning_rate': 1.7242059713557336e-06, 'epoch': 0.61} 61%|██████ | 7541/12313 [5:38:40<3:35:45, 2.71s/it] 61%|██████▏ | 7542/12313 [5:38:42<3:32:19, 2.67s/it] {'loss': 0.514, 'grad_norm': 6.272920000009786, 'learning_rate': 1.7235808409041775e-06, 'epoch': 0.61} 61%|██████▏ | 7542/12313 [5:38:42<3:32:19, 2.67s/it] 61%|██████▏ | 7543/12313 [5:38:45<3:34:35, 2.70s/it] {'loss': 0.3956, 'grad_norm': 3.700775958812784, 'learning_rate': 1.7229557641767191e-06, 'epoch': 0.61} 61%|██████▏ | 7543/12313 [5:38:45<3:34:35, 2.70s/it] 61%|██████▏ | 7544/12313 [5:38:48<3:32:40, 2.68s/it] {'loss': 0.5573, 'grad_norm': 5.184180318926046, 'learning_rate': 1.7223307412166097e-06, 'epoch': 0.61} 61%|██████▏ | 7544/12313 [5:38:48<3:32:40, 2.68s/it] 61%|██████▏ | 7545/12313 [5:38:50<3:34:39, 2.70s/it] {'loss': 0.4188, 'grad_norm': 7.17746460216716, 'learning_rate': 1.7217057720670955e-06, 'epoch': 0.61} 61%|██████▏ | 7545/12313 [5:38:50<3:34:39, 2.70s/it] 61%|██████▏ | 7546/12313 [5:38:53<3:32:27, 
2.67s/it] {'loss': 0.6008, 'grad_norm': 5.19044747938764, 'learning_rate': 1.7210808567714244e-06, 'epoch': 0.61} 61%|██████▏ | 7546/12313 [5:38:53<3:32:27, 2.67s/it] 61%|██████▏ | 7547/12313 [5:38:56<3:37:28, 2.74s/it] {'loss': 0.4163, 'grad_norm': 6.637207987486905, 'learning_rate': 1.7204559953728355e-06, 'epoch': 0.61} 61%|██████▏ | 7547/12313 [5:38:56<3:37:28, 2.74s/it] 61%|██████▏ | 7548/12313 [5:38:59<3:38:14, 2.75s/it] {'loss': 0.4583, 'grad_norm': 4.123118258455451, 'learning_rate': 1.7198311879145652e-06, 'epoch': 0.61} 61%|██████▏ | 7548/12313 [5:38:59<3:38:14, 2.75s/it] 61%|██████▏ | 7549/12313 [5:39:01<3:37:49, 2.74s/it] {'loss': 0.6119, 'grad_norm': 4.634690421136184, 'learning_rate': 1.719206434439848e-06, 'epoch': 0.61} 61%|██████▏ | 7549/12313 [5:39:01<3:37:49, 2.74s/it] 61%|██████▏ | 7550/12313 [5:39:04<3:37:32, 2.74s/it] {'loss': 0.679, 'grad_norm': 5.330581154155666, 'learning_rate': 1.7185817349919137e-06, 'epoch': 0.61} 61%|██████▏ | 7550/12313 [5:39:04<3:37:32, 2.74s/it] 61%|██████▏ | 7551/12313 [5:39:07<3:31:40, 2.67s/it] {'loss': 0.5463, 'grad_norm': 4.733672202157837, 'learning_rate': 1.7179570896139869e-06, 'epoch': 0.61} 61%|██████▏ | 7551/12313 [5:39:07<3:31:40, 2.67s/it] 61%|██████▏ | 7552/12313 [5:39:09<3:31:37, 2.67s/it] {'loss': 0.5625, 'grad_norm': 3.417972262827162, 'learning_rate': 1.7173324983492912e-06, 'epoch': 0.61} 61%|██████▏ | 7552/12313 [5:39:09<3:31:37, 2.67s/it] 61%|██████▏ | 7553/12313 [5:39:12<3:34:30, 2.70s/it] {'loss': 0.541, 'grad_norm': 5.5606216442015155, 'learning_rate': 1.7167079612410448e-06, 'epoch': 0.61} 61%|██████▏ | 7553/12313 [5:39:12<3:34:30, 2.70s/it] 61%|██████▏ | 7554/12313 [5:39:15<3:32:20, 2.68s/it] {'loss': 0.5007, 'grad_norm': 7.366074745822286, 'learning_rate': 1.7160834783324608e-06, 'epoch': 0.61} 61%|██████▏ | 7554/12313 [5:39:15<3:32:20, 2.68s/it] 61%|██████▏ | 7555/12313 [5:39:17<3:28:15, 2.63s/it] {'loss': 0.4106, 'grad_norm': 4.4511321208599925, 'learning_rate': 1.7154590496667523e-06, 
'epoch': 0.61} 61%|██████▏ | 7555/12313 [5:39:17<3:28:15, 2.63s/it] 61%|██████▏ | 7556/12313 [5:39:20<3:26:16, 2.60s/it] {'loss': 0.5297, 'grad_norm': 4.380414416940069, 'learning_rate': 1.7148346752871253e-06, 'epoch': 0.61} 61%|██████▏ | 7556/12313 [5:39:20<3:26:16, 2.60s/it] 61%|██████▏ | 7557/12313 [5:39:23<3:34:10, 2.70s/it] {'loss': 0.4956, 'grad_norm': 5.789276347920089, 'learning_rate': 1.7142103552367834e-06, 'epoch': 0.61} 61%|██████▏ | 7557/12313 [5:39:23<3:34:10, 2.70s/it] 61%|██████▏ | 7558/12313 [5:39:25<3:28:57, 2.64s/it] {'loss': 0.5016, 'grad_norm': 5.435165522246494, 'learning_rate': 1.713586089558925e-06, 'epoch': 0.61} 61%|██████▏ | 7558/12313 [5:39:25<3:28:57, 2.64s/it] 61%|██████▏ | 7559/12313 [5:39:28<3:30:44, 2.66s/it] {'loss': 0.4796, 'grad_norm': 3.8853370254417796, 'learning_rate': 1.7129618782967488e-06, 'epoch': 0.61} 61%|██████▏ | 7559/12313 [5:39:28<3:30:44, 2.66s/it] 61%|██████▏ | 7560/12313 [5:39:30<3:31:44, 2.67s/it] {'loss': 0.3512, 'grad_norm': 6.1117435269535765, 'learning_rate': 1.712337721493445e-06, 'epoch': 0.61} 61%|██████▏ | 7560/12313 [5:39:30<3:31:44, 2.67s/it] 61%|██████▏ | 7561/12313 [5:39:33<3:31:58, 2.68s/it] {'loss': 0.383, 'grad_norm': 5.140667780868619, 'learning_rate': 1.7117136191922013e-06, 'epoch': 0.61} 61%|██████▏ | 7561/12313 [5:39:33<3:31:58, 2.68s/it] 61%|██████▏ | 7562/12313 [5:39:36<3:32:45, 2.69s/it] {'loss': 0.4811, 'grad_norm': 5.239177233517926, 'learning_rate': 1.7110895714362035e-06, 'epoch': 0.61} 61%|██████▏ | 7562/12313 [5:39:36<3:32:45, 2.69s/it] 61%|██████▏ | 7563/12313 [5:39:38<3:30:16, 2.66s/it] {'loss': 0.5776, 'grad_norm': 5.786445992631991, 'learning_rate': 1.710465578268633e-06, 'epoch': 0.61} 61%|██████▏ | 7563/12313 [5:39:38<3:30:16, 2.66s/it] 61%|██████▏ | 7564/12313 [5:39:41<3:38:18, 2.76s/it] {'loss': 0.548, 'grad_norm': 4.264725420672594, 'learning_rate': 1.7098416397326647e-06, 'epoch': 0.61} 61%|██████▏ | 7564/12313 [5:39:41<3:38:18, 2.76s/it] 61%|██████▏ | 7565/12313 
[5:39:44<3:37:15, 2.75s/it] {'loss': 0.4055, 'grad_norm': 5.098547500018314, 'learning_rate': 1.7092177558714735e-06, 'epoch': 0.61} 61%|██████▏ | 7565/12313 [5:39:44<3:37:15, 2.75s/it] 61%|██████▏ | 7566/12313 [5:39:47<3:33:25, 2.70s/it] {'loss': 0.5024, 'grad_norm': 3.972712496253238, 'learning_rate': 1.7085939267282292e-06, 'epoch': 0.61} 61%|██████▏ | 7566/12313 [5:39:47<3:33:25, 2.70s/it] 61%|██████▏ | 7567/12313 [5:39:49<3:28:37, 2.64s/it] {'loss': 0.3837, 'grad_norm': 6.4566948580335835, 'learning_rate': 1.7079701523460957e-06, 'epoch': 0.61} 61%|██████▏ | 7567/12313 [5:39:49<3:28:37, 2.64s/it] 61%|██████▏ | 7568/12313 [5:39:52<3:26:53, 2.62s/it] {'loss': 0.5221, 'grad_norm': 4.440870142659438, 'learning_rate': 1.707346432768236e-06, 'epoch': 0.61} 61%|██████▏ | 7568/12313 [5:39:52<3:26:53, 2.62s/it] 61%|██████▏ | 7569/12313 [5:39:55<3:31:51, 2.68s/it] {'loss': 0.5073, 'grad_norm': 4.191872882758366, 'learning_rate': 1.706722768037809e-06, 'epoch': 0.61} 61%|██████▏ | 7569/12313 [5:39:55<3:31:51, 2.68s/it] 61%|██████▏ | 7570/12313 [5:39:57<3:33:36, 2.70s/it] {'loss': 0.4276, 'grad_norm': 5.091945928993117, 'learning_rate': 1.7060991581979668e-06, 'epoch': 0.61} 61%|██████▏ | 7570/12313 [5:39:57<3:33:36, 2.70s/it] 61%|██████▏ | 7571/12313 [5:40:00<3:33:06, 2.70s/it] {'loss': 0.5327, 'grad_norm': 4.128925636949817, 'learning_rate': 1.7054756032918619e-06, 'epoch': 0.61} 61%|██████▏ | 7571/12313 [5:40:00<3:33:06, 2.70s/it] 61%|██████▏ | 7572/12313 [5:40:03<3:32:15, 2.69s/it] {'loss': 0.4233, 'grad_norm': 4.6497677030599265, 'learning_rate': 1.7048521033626406e-06, 'epoch': 0.61} 61%|██████▏ | 7572/12313 [5:40:03<3:32:15, 2.69s/it] 62%|██████▏ | 7573/12313 [5:40:05<3:32:06, 2.68s/it] {'loss': 0.5063, 'grad_norm': 5.259086945381799, 'learning_rate': 1.7042286584534446e-06, 'epoch': 0.62} 62%|██████▏ | 7573/12313 [5:40:05<3:32:06, 2.68s/it] 62%|██████▏ | 7574/12313 [5:40:08<3:30:04, 2.66s/it] {'loss': 0.4642, 'grad_norm': 9.344094684965398, 'learning_rate': 
1.703605268607415e-06, 'epoch': 0.62} 62%|██████▏ | 7574/12313 [5:40:08<3:30:04, 2.66s/it] 62%|██████▏ | 7575/12313 [5:40:11<3:34:21, 2.71s/it] {'loss': 0.5159, 'grad_norm': 6.059064695851907, 'learning_rate': 1.7029819338676851e-06, 'epoch': 0.62} 62%|██████▏ | 7575/12313 [5:40:11<3:34:21, 2.71s/it] 62%|██████▏ | 7576/12313 [5:40:14<3:37:14, 2.75s/it] {'loss': 0.4775, 'grad_norm': 7.950438579454252, 'learning_rate': 1.702358654277388e-06, 'epoch': 0.62} 62%|██████▏ | 7576/12313 [5:40:14<3:37:14, 2.75s/it] 62%|██████▏ | 7577/12313 [5:40:16<3:36:26, 2.74s/it] {'loss': 0.4521, 'grad_norm': 5.6577857752289695, 'learning_rate': 1.7017354298796495e-06, 'epoch': 0.62} 62%|██████▏ | 7577/12313 [5:40:16<3:36:26, 2.74s/it] 62%|██████▏ | 7578/12313 [5:40:19<3:33:57, 2.71s/it] {'loss': 0.5675, 'grad_norm': 6.694027039647233, 'learning_rate': 1.701112260717595e-06, 'epoch': 0.62} 62%|██████▏ | 7578/12313 [5:40:19<3:33:57, 2.71s/it] 62%|██████▏ | 7579/12313 [5:40:22<3:31:17, 2.68s/it] {'loss': 0.4767, 'grad_norm': 7.207274495499145, 'learning_rate': 1.7004891468343445e-06, 'epoch': 0.62} 62%|██████▏ | 7579/12313 [5:40:22<3:31:17, 2.68s/it] 62%|██████▏ | 7580/12313 [5:40:24<3:25:14, 2.60s/it] {'loss': 0.4575, 'grad_norm': 10.710533616510515, 'learning_rate': 1.6998660882730127e-06, 'epoch': 0.62} 62%|██████▏ | 7580/12313 [5:40:24<3:25:14, 2.60s/it] 62%|██████▏ | 7581/12313 [5:40:27<3:22:47, 2.57s/it] {'loss': 0.4646, 'grad_norm': 9.051943184654498, 'learning_rate': 1.6992430850767133e-06, 'epoch': 0.62} 62%|██████▏ | 7581/12313 [5:40:27<3:22:47, 2.57s/it] 62%|██████▏ | 7582/12313 [5:40:29<3:24:23, 2.59s/it] {'loss': 0.5112, 'grad_norm': 6.318629144815555, 'learning_rate': 1.6986201372885551e-06, 'epoch': 0.62} 62%|██████▏ | 7582/12313 [5:40:29<3:24:23, 2.59s/it] 62%|██████▏ | 7583/12313 [5:40:32<3:23:04, 2.58s/it] {'loss': 0.5938, 'grad_norm': 9.374156721440105, 'learning_rate': 1.6979972449516414e-06, 'epoch': 0.62} 62%|██████▏ | 7583/12313 [5:40:32<3:23:04, 2.58s/it] 
62%|██████▏ | 7584/12313 [5:40:34<3:24:16, 2.59s/it] {'loss': 0.5195, 'grad_norm': 5.954834061127918, 'learning_rate': 1.6973744081090737e-06, 'epoch': 0.62} 62%|██████▏ | 7584/12313 [5:40:34<3:24:16, 2.59s/it] 62%|██████▏ | 7585/12313 [5:40:37<3:28:23, 2.64s/it] {'loss': 0.5192, 'grad_norm': 4.686975998201248, 'learning_rate': 1.6967516268039502e-06, 'epoch': 0.62} 62%|██████▏ | 7585/12313 [5:40:37<3:28:23, 2.64s/it] 62%|██████▏ | 7586/12313 [5:40:40<3:31:44, 2.69s/it] {'loss': 0.5776, 'grad_norm': 5.255217548237502, 'learning_rate': 1.696128901079362e-06, 'epoch': 0.62} 62%|██████▏ | 7586/12313 [5:40:40<3:31:44, 2.69s/it] 62%|██████▏ | 7587/12313 [5:40:43<3:30:21, 2.67s/it] {'loss': 0.4449, 'grad_norm': 3.9588121129361915, 'learning_rate': 1.6955062309783993e-06, 'epoch': 0.62} 62%|██████▏ | 7587/12313 [5:40:43<3:30:21, 2.67s/it] 62%|██████▏ | 7588/12313 [5:40:45<3:27:49, 2.64s/it] {'loss': 0.6015, 'grad_norm': 2.910107855866022, 'learning_rate': 1.6948836165441487e-06, 'epoch': 0.62} 62%|██████▏ | 7588/12313 [5:40:45<3:27:49, 2.64s/it] 62%|██████▏ | 7589/12313 [5:40:48<3:34:25, 2.72s/it] {'loss': 0.4804, 'grad_norm': 4.0433138018943735, 'learning_rate': 1.6942610578196898e-06, 'epoch': 0.62} 62%|██████▏ | 7589/12313 [5:40:48<3:34:25, 2.72s/it] 62%|██████▏ | 7590/12313 [5:40:51<3:29:36, 2.66s/it] {'loss': 0.3454, 'grad_norm': 12.510674806813109, 'learning_rate': 1.6936385548481022e-06, 'epoch': 0.62} 62%|██████▏ | 7590/12313 [5:40:51<3:29:36, 2.66s/it] 62%|██████▏ | 7591/12313 [5:40:54<3:38:20, 2.77s/it] {'loss': 0.4479, 'grad_norm': 4.596018618309467, 'learning_rate': 1.6930161076724586e-06, 'epoch': 0.62} 62%|██████▏ | 7591/12313 [5:40:54<3:38:20, 2.77s/it] 62%|██████▏ | 7592/12313 [5:40:56<3:35:42, 2.74s/it] {'loss': 0.5211, 'grad_norm': 4.171966658619839, 'learning_rate': 1.69239371633583e-06, 'epoch': 0.62} 62%|██████▏ | 7592/12313 [5:40:56<3:35:42, 2.74s/it] 62%|██████▏ | 7593/12313 [5:40:59<3:33:13, 2.71s/it] {'loss': 0.4352, 'grad_norm': 
8.771836978846274, 'learning_rate': 1.6917713808812808e-06, 'epoch': 0.62} 62%|██████▏ | 7593/12313 [5:40:59<3:33:13, 2.71s/it] 62%|██████▏ | 7594/12313 [5:41:02<3:34:23, 2.73s/it] {'loss': 0.5914, 'grad_norm': 4.139812780948147, 'learning_rate': 1.6911491013518752e-06, 'epoch': 0.62} 62%|██████▏ | 7594/12313 [5:41:02<3:34:23, 2.73s/it] 62%|██████▏ | 7595/12313 [5:41:04<3:33:50, 2.72s/it] {'loss': 0.5286, 'grad_norm': 5.236782993816778, 'learning_rate': 1.6905268777906713e-06, 'epoch': 0.62} 62%|██████▏ | 7595/12313 [5:41:04<3:33:50, 2.72s/it] 62%|██████▏ | 7596/12313 [5:41:07<3:35:54, 2.75s/it] {'loss': 0.3983, 'grad_norm': 5.408332820716584, 'learning_rate': 1.6899047102407228e-06, 'epoch': 0.62} 62%|██████▏ | 7596/12313 [5:41:07<3:35:54, 2.75s/it] 62%|██████▏ | 7597/12313 [5:41:10<3:38:56, 2.79s/it] {'loss': 0.5177, 'grad_norm': 3.729915257776433, 'learning_rate': 1.6892825987450811e-06, 'epoch': 0.62} 62%|██████▏ | 7597/12313 [5:41:10<3:38:56, 2.79s/it] 62%|██████▏ | 7598/12313 [5:41:13<3:38:06, 2.78s/it] {'loss': 0.5994, 'grad_norm': 4.845431694776476, 'learning_rate': 1.6886605433467937e-06, 'epoch': 0.62} 62%|██████▏ | 7598/12313 [5:41:13<3:38:06, 2.78s/it] 62%|██████▏ | 7599/12313 [5:41:16<3:36:17, 2.75s/it] {'loss': 0.4965, 'grad_norm': 4.28204183668613, 'learning_rate': 1.6880385440889016e-06, 'epoch': 0.62} 62%|██████▏ | 7599/12313 [5:41:16<3:36:17, 2.75s/it] 62%|██████▏ | 7600/12313 [5:41:19<3:41:52, 2.82s/it] {'loss': 0.5212, 'grad_norm': 3.81321254807878, 'learning_rate': 1.6874166010144454e-06, 'epoch': 0.62} 62%|██████▏ | 7600/12313 [5:41:19<3:41:52, 2.82s/it] 62%|██████▏ | 7601/12313 [5:41:21<3:34:12, 2.73s/it] {'loss': 0.6291, 'grad_norm': 4.863898183735516, 'learning_rate': 1.6867947141664606e-06, 'epoch': 0.62} 62%|██████▏ | 7601/12313 [5:41:21<3:34:12, 2.73s/it] 62%|██████▏ | 7602/12313 [5:41:24<3:32:21, 2.70s/it] {'loss': 0.5934, 'grad_norm': 3.908697856611701, 'learning_rate': 1.6861728835879764e-06, 'epoch': 0.62} 62%|██████▏ | 7602/12313 
[5:41:24<3:32:21, 2.70s/it] 62%|██████▏ | 7603/12313 [5:41:26<3:34:06, 2.73s/it] {'loss': 0.4434, 'grad_norm': 4.930386518701432, 'learning_rate': 1.685551109322023e-06, 'epoch': 0.62} 62%|██████▏ | 7603/12313 [5:41:26<3:34:06, 2.73s/it] 62%|██████▏ | 7604/12313 [5:41:29<3:32:38, 2.71s/it] {'loss': 0.5494, 'grad_norm': 3.7606290186734586, 'learning_rate': 1.6849293914116215e-06, 'epoch': 0.62} 62%|██████▏ | 7604/12313 [5:41:29<3:32:38, 2.71s/it] 62%|██████▏ | 7605/12313 [5:41:32<3:30:40, 2.68s/it] {'loss': 0.5169, 'grad_norm': 4.342048831757059, 'learning_rate': 1.6843077298997924e-06, 'epoch': 0.62} 62%|██████▏ | 7605/12313 [5:41:32<3:30:40, 2.68s/it] 62%|██████▏ | 7606/12313 [5:41:34<3:30:57, 2.69s/it] {'loss': 0.5501, 'grad_norm': 5.829444571059767, 'learning_rate': 1.6836861248295522e-06, 'epoch': 0.62} 62%|██████▏ | 7606/12313 [5:41:34<3:30:57, 2.69s/it] 62%|██████▏ | 7607/12313 [5:41:38<3:40:04, 2.81s/it] {'loss': 0.4783, 'grad_norm': 4.7706445198897915, 'learning_rate': 1.6830645762439113e-06, 'epoch': 0.62} 62%|██████▏ | 7607/12313 [5:41:38<3:40:04, 2.81s/it] 62%|██████▏ | 7608/12313 [5:41:40<3:36:52, 2.77s/it] {'loss': 0.4552, 'grad_norm': 8.032579213522444, 'learning_rate': 1.6824430841858773e-06, 'epoch': 0.62} 62%|██████▏ | 7608/12313 [5:41:40<3:36:52, 2.77s/it] 62%|██████▏ | 7609/12313 [5:41:43<3:33:55, 2.73s/it] {'loss': 0.4983, 'grad_norm': 6.670715215785492, 'learning_rate': 1.6818216486984565e-06, 'epoch': 0.62} 62%|██████▏ | 7609/12313 [5:41:43<3:33:55, 2.73s/it] 62%|██████▏ | 7610/12313 [5:41:45<3:31:14, 2.70s/it] {'loss': 0.4088, 'grad_norm': 6.128265762260737, 'learning_rate': 1.6812002698246468e-06, 'epoch': 0.62} 62%|██████▏ | 7610/12313 [5:41:45<3:31:14, 2.70s/it] 62%|██████▏ | 7611/12313 [5:41:48<3:30:34, 2.69s/it] {'loss': 0.5589, 'grad_norm': 5.814013730186966, 'learning_rate': 1.6805789476074457e-06, 'epoch': 0.62} 62%|██████▏ | 7611/12313 [5:41:48<3:30:34, 2.69s/it] 62%|██████▏ | 7612/12313 [5:41:51<3:30:09, 2.68s/it] {'loss': 0.4175, 
'grad_norm': 6.00192645217164, 'learning_rate': 1.6799576820898433e-06, 'epoch': 0.62} 62%|██████▏ | 7612/12313 [5:41:51<3:30:09, 2.68s/it] 62%|██████▏ | 7613/12313 [5:41:53<3:28:54, 2.67s/it] {'loss': 0.3606, 'grad_norm': 4.073976353754178, 'learning_rate': 1.6793364733148299e-06, 'epoch': 0.62} 62%|██████▏ | 7613/12313 [5:41:53<3:28:54, 2.67s/it] 62%|██████▏ | 7614/12313 [5:41:56<3:30:37, 2.69s/it] {'loss': 0.6092, 'grad_norm': 4.36296820738616, 'learning_rate': 1.67871532132539e-06, 'epoch': 0.62} 62%|██████▏ | 7614/12313 [5:41:56<3:30:37, 2.69s/it] 62%|██████▏ | 7615/12313 [5:41:59<3:27:22, 2.65s/it] {'loss': 0.4535, 'grad_norm': 6.684933223291455, 'learning_rate': 1.6780942261645022e-06, 'epoch': 0.62} 62%|██████▏ | 7615/12313 [5:41:59<3:27:22, 2.65s/it] 62%|██████▏ | 7616/12313 [5:42:02<3:39:52, 2.81s/it] {'loss': 0.5198, 'grad_norm': 4.124907550937014, 'learning_rate': 1.6774731878751443e-06, 'epoch': 0.62} 62%|██████▏ | 7616/12313 [5:42:02<3:39:52, 2.81s/it] 62%|██████▏ | 7617/12313 [5:42:05<3:37:18, 2.78s/it] {'loss': 0.6509, 'grad_norm': 2.736304544029034, 'learning_rate': 1.6768522065002895e-06, 'epoch': 0.62} 62%|██████▏ | 7617/12313 [5:42:05<3:37:18, 2.78s/it] 62%|██████▏ | 7618/12313 [5:42:07<3:34:49, 2.75s/it] {'loss': 0.5161, 'grad_norm': 3.054312384173202, 'learning_rate': 1.676231282082904e-06, 'epoch': 0.62} 62%|██████▏ | 7618/12313 [5:42:07<3:34:49, 2.75s/it] 62%|██████▏ | 7619/12313 [5:42:10<3:37:31, 2.78s/it] {'loss': 0.394, 'grad_norm': 8.402180136218686, 'learning_rate': 1.6756104146659557e-06, 'epoch': 0.62} 62%|██████▏ | 7619/12313 [5:42:10<3:37:31, 2.78s/it] 62%|██████▏ | 7620/12313 [5:42:13<3:41:24, 2.83s/it] {'loss': 0.6243, 'grad_norm': 4.6791235959713315, 'learning_rate': 1.674989604292403e-06, 'epoch': 0.62} 62%|██████▏ | 7620/12313 [5:42:13<3:41:24, 2.83s/it] 62%|██████▏ | 7621/12313 [5:42:16<3:36:06, 2.76s/it] {'loss': 0.4754, 'grad_norm': 9.604477306580446, 'learning_rate': 1.6743688510052025e-06, 'epoch': 0.62} 62%|██████▏ | 
7621/12313 [5:42:16<3:36:06, 2.76s/it] 62%|██████▏ | 7622/12313 [5:42:19<3:38:17, 2.79s/it] {'loss': 0.4893, 'grad_norm': 9.89281790814999, 'learning_rate': 1.6737481548473094e-06, 'epoch': 0.62} 62%|██████▏ | 7622/12313 [5:42:19<3:38:17, 2.79s/it] 62%|██████▏ | 7623/12313 [5:42:21<3:29:29, 2.68s/it] {'loss': 0.457, 'grad_norm': 3.3991457520981383, 'learning_rate': 1.6731275158616706e-06, 'epoch': 0.62} 62%|██████▏ | 7623/12313 [5:42:21<3:29:29, 2.68s/it] 62%|██████▏ | 7624/12313 [5:42:24<3:26:22, 2.64s/it] {'loss': 0.6278, 'grad_norm': 2.80593016197633, 'learning_rate': 1.6725069340912306e-06, 'epoch': 0.62} 62%|██████▏ | 7624/12313 [5:42:24<3:26:22, 2.64s/it] 62%|██████▏ | 7625/12313 [5:42:26<3:23:22, 2.60s/it] {'loss': 0.4187, 'grad_norm': 5.418301042196, 'learning_rate': 1.6718864095789328e-06, 'epoch': 0.62} 62%|██████▏ | 7625/12313 [5:42:26<3:23:22, 2.60s/it] 62%|██████▏ | 7626/12313 [5:42:29<3:23:20, 2.60s/it] {'loss': 0.4663, 'grad_norm': 13.06513067077772, 'learning_rate': 1.671265942367712e-06, 'epoch': 0.62} 62%|██████▏ | 7626/12313 [5:42:29<3:23:20, 2.60s/it] 62%|██████▏ | 7627/12313 [5:42:31<3:27:37, 2.66s/it] {'loss': 0.4295, 'grad_norm': 4.625063932915541, 'learning_rate': 1.6706455325005022e-06, 'epoch': 0.62} 62%|██████▏ | 7627/12313 [5:42:31<3:27:37, 2.66s/it] 62%|██████▏ | 7628/12313 [5:42:34<3:25:48, 2.64s/it] {'loss': 0.4579, 'grad_norm': 3.5014064055098175, 'learning_rate': 1.6700251800202316e-06, 'epoch': 0.62} 62%|██████▏ | 7628/12313 [5:42:34<3:25:48, 2.64s/it] 62%|██████▏ | 7629/12313 [5:42:37<3:24:14, 2.62s/it] {'loss': 0.6269, 'grad_norm': 6.56770323902514, 'learning_rate': 1.6694048849698262e-06, 'epoch': 0.62} 62%|██████▏ | 7629/12313 [5:42:37<3:24:14, 2.62s/it] 62%|██████▏ | 7630/12313 [5:42:39<3:25:52, 2.64s/it] {'loss': 0.6907, 'grad_norm': 4.07016650700423, 'learning_rate': 1.668784647392208e-06, 'epoch': 0.62} 62%|██████▏ | 7630/12313 [5:42:39<3:25:52, 2.64s/it] 62%|██████▏ | 7631/12313 [5:42:42<3:27:50, 2.66s/it] {'loss': 0.5111, 
'grad_norm': 5.113480533421879, 'learning_rate': 1.6681644673302915e-06, 'epoch': 0.62} 62%|██████▏ | 7631/12313 [5:42:42<3:27:50, 2.66s/it] 62%|██████▏ | 7632/12313 [5:42:45<3:29:45, 2.69s/it] {'loss': 0.5239, 'grad_norm': 4.254910794569682, 'learning_rate': 1.6675443448269924e-06, 'epoch': 0.62} 62%|██████▏ | 7632/12313 [5:42:45<3:29:45, 2.69s/it] 62%|██████▏ | 7633/12313 [5:42:47<3:27:22, 2.66s/it] {'loss': 0.5857, 'grad_norm': 5.408424083847769, 'learning_rate': 1.666924279925219e-06, 'epoch': 0.62} 62%|██████▏ | 7633/12313 [5:42:47<3:27:22, 2.66s/it] 62%|██████▏ | 7634/12313 [5:42:50<3:32:58, 2.73s/it] {'loss': 0.5784, 'grad_norm': 3.2011054128203766, 'learning_rate': 1.6663042726678752e-06, 'epoch': 0.62} 62%|██████▏ | 7634/12313 [5:42:50<3:32:58, 2.73s/it] 62%|██████▏ | 7635/12313 [5:42:53<3:31:46, 2.72s/it] {'loss': 0.3975, 'grad_norm': 5.235602080991919, 'learning_rate': 1.6656843230978647e-06, 'epoch': 0.62} 62%|██████▏ | 7635/12313 [5:42:53<3:31:46, 2.72s/it] 62%|██████▏ | 7636/12313 [5:42:56<3:29:17, 2.68s/it] {'loss': 0.5924, 'grad_norm': 5.165250298024449, 'learning_rate': 1.6650644312580833e-06, 'epoch': 0.62} 62%|██████▏ | 7636/12313 [5:42:56<3:29:17, 2.68s/it] 62%|██████▏ | 7637/12313 [5:42:58<3:29:19, 2.69s/it] {'loss': 0.5221, 'grad_norm': 4.340672727279216, 'learning_rate': 1.6644445971914235e-06, 'epoch': 0.62} 62%|██████▏ | 7637/12313 [5:42:58<3:29:19, 2.69s/it] 62%|██████▏ | 7638/12313 [5:43:01<3:29:44, 2.69s/it] {'loss': 0.443, 'grad_norm': 6.841912089854011, 'learning_rate': 1.6638248209407767e-06, 'epoch': 0.62} 62%|██████▏ | 7638/12313 [5:43:01<3:29:44, 2.69s/it] 62%|██████▏ | 7639/12313 [5:43:04<3:30:27, 2.70s/it] {'loss': 0.3495, 'grad_norm': 4.081953440344846, 'learning_rate': 1.6632051025490265e-06, 'epoch': 0.62} 62%|██████▏ | 7639/12313 [5:43:04<3:30:27, 2.70s/it] 62%|██████▏ | 7640/12313 [5:43:06<3:28:54, 2.68s/it] {'loss': 0.5034, 'grad_norm': 3.7772718364769142, 'learning_rate': 1.6625854420590538e-06, 'epoch': 0.62} 62%|██████▏ 
| 7640/12313 [5:43:06<3:28:54, 2.68s/it] 62%|██████▏ | 7641/12313 [5:43:09<3:24:10, 2.62s/it] {'loss': 0.6397, 'grad_norm': 5.210380997814291, 'learning_rate': 1.6619658395137375e-06, 'epoch': 0.62} 62%|██████▏ | 7641/12313 [5:43:09<3:24:10, 2.62s/it] 62%|██████▏ | 7642/12313 [5:43:11<3:25:08, 2.64s/it] {'loss': 0.4802, 'grad_norm': 6.577887175557655, 'learning_rate': 1.6613462949559494e-06, 'epoch': 0.62} 62%|██████▏ | 7642/12313 [5:43:11<3:25:08, 2.64s/it] 62%|██████▏ | 7643/12313 [5:43:14<3:27:12, 2.66s/it] {'loss': 0.7127, 'grad_norm': 3.7841172613723657, 'learning_rate': 1.6607268084285587e-06, 'epoch': 0.62} 62%|██████▏ | 7643/12313 [5:43:14<3:27:12, 2.66s/it] 62%|██████▏ | 7644/12313 [5:43:17<3:24:00, 2.62s/it] {'loss': 0.4822, 'grad_norm': 8.73460364920805, 'learning_rate': 1.6601073799744322e-06, 'epoch': 0.62} 62%|██████▏ | 7644/12313 [5:43:17<3:24:00, 2.62s/it] 62%|██████▏ | 7645/12313 [5:43:19<3:25:11, 2.64s/it] {'loss': 0.5566, 'grad_norm': 5.904973312911216, 'learning_rate': 1.6594880096364302e-06, 'epoch': 0.62} 62%|██████▏ | 7645/12313 [5:43:19<3:25:11, 2.64s/it] 62%|██████▏ | 7646/12313 [5:43:22<3:34:16, 2.75s/it] {'loss': 0.3796, 'grad_norm': 4.941626996148981, 'learning_rate': 1.6588686974574086e-06, 'epoch': 0.62} 62%|██████▏ | 7646/12313 [5:43:22<3:34:16, 2.75s/it] 62%|██████▏ | 7647/12313 [5:43:25<3:34:42, 2.76s/it] {'loss': 0.4666, 'grad_norm': 5.8291085465722325, 'learning_rate': 1.658249443480221e-06, 'epoch': 0.62} 62%|██████▏ | 7647/12313 [5:43:25<3:34:42, 2.76s/it] 62%|██████▏ | 7648/12313 [5:43:28<3:27:46, 2.67s/it] {'loss': 0.4453, 'grad_norm': 5.909142689212799, 'learning_rate': 1.6576302477477185e-06, 'epoch': 0.62} 62%|██████▏ | 7648/12313 [5:43:28<3:27:46, 2.67s/it] 62%|██████▏ | 7649/12313 [5:43:30<3:22:01, 2.60s/it] {'loss': 0.3766, 'grad_norm': 5.951228309961236, 'learning_rate': 1.6570111103027436e-06, 'epoch': 0.62} 62%|██████▏ | 7649/12313 [5:43:30<3:22:01, 2.60s/it] 62%|██████▏ | 7650/12313 [5:43:33<3:31:36, 2.72s/it] 
{'loss': 0.3992, 'grad_norm': 6.681405040737811, 'learning_rate': 1.6563920311881382e-06, 'epoch': 0.62} 62%|██████▏ | 7650/12313 [5:43:33<3:31:36, 2.72s/it] 62%|██████▏ | 7651/12313 [5:43:36<3:28:51, 2.69s/it] {'loss': 0.4259, 'grad_norm': 5.5260582323770135, 'learning_rate': 1.6557730104467407e-06, 'epoch': 0.62} 62%|██████▏ | 7651/12313 [5:43:36<3:28:51, 2.69s/it] 62%|██████▏ | 7652/12313 [5:43:38<3:27:42, 2.67s/it] {'loss': 0.5468, 'grad_norm': 5.076655366887786, 'learning_rate': 1.6551540481213817e-06, 'epoch': 0.62} 62%|██████▏ | 7652/12313 [5:43:38<3:27:42, 2.67s/it] 62%|██████▏ | 7653/12313 [5:43:41<3:24:31, 2.63s/it] {'loss': 0.4871, 'grad_norm': 5.8919929585227155, 'learning_rate': 1.6545351442548915e-06, 'epoch': 0.62} 62%|██████▏ | 7653/12313 [5:43:41<3:24:31, 2.63s/it] 62%|██████▏ | 7654/12313 [5:43:44<3:29:01, 2.69s/it] {'loss': 0.4153, 'grad_norm': 4.042380796567633, 'learning_rate': 1.6539162988900952e-06, 'epoch': 0.62} 62%|██████▏ | 7654/12313 [5:43:44<3:29:01, 2.69s/it] 62%|██████▏ | 7655/12313 [5:43:46<3:27:45, 2.68s/it] {'loss': 0.4534, 'grad_norm': 6.784897682604056, 'learning_rate': 1.6532975120698133e-06, 'epoch': 0.62} 62%|██████▏ | 7655/12313 [5:43:46<3:27:45, 2.68s/it] 62%|██████▏ | 7656/12313 [5:43:50<3:43:54, 2.88s/it] {'loss': 0.4155, 'grad_norm': 9.65234999480218, 'learning_rate': 1.6526787838368616e-06, 'epoch': 0.62} 62%|██████▏ | 7656/12313 [5:43:50<3:43:54, 2.88s/it] 62%|██████▏ | 7657/12313 [5:43:52<3:37:06, 2.80s/it] {'loss': 0.5921, 'grad_norm': 17.08200480716379, 'learning_rate': 1.6520601142340549e-06, 'epoch': 0.62} 62%|██████▏ | 7657/12313 [5:43:52<3:37:06, 2.80s/it] 62%|██████▏ | 7658/12313 [5:43:55<3:36:30, 2.79s/it] {'loss': 0.4526, 'grad_norm': 5.175377372780213, 'learning_rate': 1.6514415033041997e-06, 'epoch': 0.62} 62%|██████▏ | 7658/12313 [5:43:55<3:36:30, 2.79s/it] 62%|██████▏ | 7659/12313 [5:43:58<3:41:24, 2.85s/it] {'loss': 0.4555, 'grad_norm': 3.458058541785411, 'learning_rate': 1.6508229510901013e-06, 'epoch': 
0.62} 62%|██████▏ | 7659/12313 [5:43:58<3:41:24, 2.85s/it] 62%|██████▏ | 7660/12313 [5:44:01<3:34:49, 2.77s/it] {'loss': 0.452, 'grad_norm': 10.604824315531577, 'learning_rate': 1.6502044576345614e-06, 'epoch': 0.62} 62%|██████▏ | 7660/12313 [5:44:01<3:34:49, 2.77s/it] 62%|██████▏ | 7661/12313 [5:44:03<3:29:58, 2.71s/it] {'loss': 0.6222, 'grad_norm': 5.411425272158323, 'learning_rate': 1.6495860229803756e-06, 'epoch': 0.62} 62%|██████▏ | 7661/12313 [5:44:03<3:29:58, 2.71s/it] 62%|██████▏ | 7662/12313 [5:44:06<3:28:29, 2.69s/it] {'loss': 0.4145, 'grad_norm': 5.1740005505908995, 'learning_rate': 1.6489676471703352e-06, 'epoch': 0.62} 62%|██████▏ | 7662/12313 [5:44:06<3:28:29, 2.69s/it] 62%|██████▏ | 7663/12313 [5:44:08<3:22:39, 2.61s/it] {'loss': 0.513, 'grad_norm': 8.085499523946917, 'learning_rate': 1.6483493302472302e-06, 'epoch': 0.62} 62%|██████▏ | 7663/12313 [5:44:08<3:22:39, 2.61s/it] 62%|██████▏ | 7664/12313 [5:44:11<3:19:22, 2.57s/it] {'loss': 0.6126, 'grad_norm': 10.292791376029198, 'learning_rate': 1.6477310722538447e-06, 'epoch': 0.62} 62%|██████▏ | 7664/12313 [5:44:11<3:19:22, 2.57s/it] 62%|██████▏ | 7665/12313 [5:44:13<3:19:23, 2.57s/it] {'loss': 0.4292, 'grad_norm': 6.0268201310361995, 'learning_rate': 1.6471128732329579e-06, 'epoch': 0.62} 62%|██████▏ | 7665/12313 [5:44:13<3:19:23, 2.57s/it] 62%|██████▏ | 7666/12313 [5:44:16<3:26:12, 2.66s/it] {'loss': 0.6152, 'grad_norm': 5.022456342536074, 'learning_rate': 1.6464947332273459e-06, 'epoch': 0.62} 62%|██████▏ | 7666/12313 [5:44:16<3:26:12, 2.66s/it] 62%|██████▏ | 7667/12313 [5:44:19<3:26:00, 2.66s/it] {'loss': 0.6125, 'grad_norm': 8.543190424818972, 'learning_rate': 1.6458766522797822e-06, 'epoch': 0.62} 62%|██████▏ | 7667/12313 [5:44:19<3:26:00, 2.66s/it] 62%|██████▏ | 7668/12313 [5:44:22<3:25:54, 2.66s/it] {'loss': 0.462, 'grad_norm': 4.1156125031275135, 'learning_rate': 1.6452586304330333e-06, 'epoch': 0.62} 62%|██████▏ | 7668/12313 [5:44:22<3:25:54, 2.66s/it] 62%|██████▏ | 7669/12313 
[5:44:24<3:22:05, 2.61s/it] {'loss': 0.4678, 'grad_norm': 7.81665689239009, 'learning_rate': 1.6446406677298632e-06, 'epoch': 0.62} 62%|██████▏ | 7669/12313 [5:44:24<3:22:05, 2.61s/it] 62%|██████▏ | 7670/12313 [5:44:27<3:21:41, 2.61s/it] {'loss': 0.4414, 'grad_norm': 6.949544264134675, 'learning_rate': 1.644022764213033e-06, 'epoch': 0.62} 62%|██████▏ | 7670/12313 [5:44:27<3:21:41, 2.61s/it] 62%|██████▏ | 7671/12313 [5:44:29<3:23:20, 2.63s/it] {'loss': 0.4455, 'grad_norm': 8.196141138164219, 'learning_rate': 1.6434049199252966e-06, 'epoch': 0.62} 62%|██████▏ | 7671/12313 [5:44:29<3:23:20, 2.63s/it] 62%|██████▏ | 7672/12313 [5:44:32<3:22:18, 2.62s/it] {'loss': 0.3828, 'grad_norm': 6.110724030611538, 'learning_rate': 1.6427871349094058e-06, 'epoch': 0.62} 62%|██████▏ | 7672/12313 [5:44:32<3:22:18, 2.62s/it] 62%|██████▏ | 7673/12313 [5:44:34<3:18:27, 2.57s/it] {'loss': 0.5077, 'grad_norm': 4.134192350462065, 'learning_rate': 1.6421694092081097e-06, 'epoch': 0.62} 62%|██████▏ | 7673/12313 [5:44:34<3:18:27, 2.57s/it] 62%|██████▏ | 7674/12313 [5:44:37<3:22:49, 2.62s/it] {'loss': 0.6757, 'grad_norm': 4.263913374724854, 'learning_rate': 1.6415517428641504e-06, 'epoch': 0.62} 62%|██████▏ | 7674/12313 [5:44:37<3:22:49, 2.62s/it] 62%|██████▏ | 7675/12313 [5:44:40<3:29:45, 2.71s/it] {'loss': 0.485, 'grad_norm': 6.728733308443024, 'learning_rate': 1.640934135920266e-06, 'epoch': 0.62} 62%|██████▏ | 7675/12313 [5:44:40<3:29:45, 2.71s/it] 62%|██████▏ | 7676/12313 [5:44:44<3:49:08, 2.96s/it] {'loss': 0.4495, 'grad_norm': 6.332399518498263, 'learning_rate': 1.6403165884191935e-06, 'epoch': 0.62} 62%|██████▏ | 7676/12313 [5:44:44<3:49:08, 2.96s/it] 62%|██████▏ | 7677/12313 [5:44:46<3:43:32, 2.89s/it] {'loss': 0.4718, 'grad_norm': 5.132785407111167, 'learning_rate': 1.6396991004036638e-06, 'epoch': 0.62} 62%|██████▏ | 7677/12313 [5:44:46<3:43:32, 2.89s/it] 62%|██████▏ | 7678/12313 [5:44:49<3:38:29, 2.83s/it] {'loss': 0.5763, 'grad_norm': 4.651566891974008, 'learning_rate': 
1.6390816719164022e-06, 'epoch': 0.62} 62%|██████▏ | 7678/12313 [5:44:49<3:38:29, 2.83s/it] 62%|██████▏ | 7679/12313 [5:44:52<3:36:14, 2.80s/it] {'loss': 0.722, 'grad_norm': 4.413484601105079, 'learning_rate': 1.6384643030001333e-06, 'epoch': 0.62} 62%|██████▏ | 7679/12313 [5:44:52<3:36:14, 2.80s/it] 62%|██████▏ | 7680/12313 [5:44:54<3:30:59, 2.73s/it] {'loss': 0.5068, 'grad_norm': 8.71808254189496, 'learning_rate': 1.6378469936975752e-06, 'epoch': 0.62} 62%|██████▏ | 7680/12313 [5:44:54<3:30:59, 2.73s/it] 62%|██████▏ | 7681/12313 [5:44:57<3:30:40, 2.73s/it] {'loss': 0.4299, 'grad_norm': 6.479840818255055, 'learning_rate': 1.6372297440514417e-06, 'epoch': 0.62} 62%|██████▏ | 7681/12313 [5:44:57<3:30:40, 2.73s/it] 62%|██████▏ | 7682/12313 [5:45:00<3:31:39, 2.74s/it] {'loss': 0.545, 'grad_norm': 3.683453972916311, 'learning_rate': 1.6366125541044435e-06, 'epoch': 0.62} 62%|██████▏ | 7682/12313 [5:45:00<3:31:39, 2.74s/it] 62%|██████▏ | 7683/12313 [5:45:02<3:29:27, 2.71s/it] {'loss': 0.3748, 'grad_norm': 9.69410298079771, 'learning_rate': 1.6359954238992882e-06, 'epoch': 0.62} 62%|██████▏ | 7683/12313 [5:45:02<3:29:27, 2.71s/it] 62%|██████▏ | 7684/12313 [5:45:05<3:29:52, 2.72s/it] {'loss': 0.4015, 'grad_norm': 4.97119687493066, 'learning_rate': 1.6353783534786763e-06, 'epoch': 0.62} 62%|██████▏ | 7684/12313 [5:45:05<3:29:52, 2.72s/it] 62%|██████▏ | 7685/12313 [5:45:08<3:30:31, 2.73s/it] {'loss': 0.4756, 'grad_norm': 7.7545114750163, 'learning_rate': 1.6347613428853059e-06, 'epoch': 0.62} 62%|██████▏ | 7685/12313 [5:45:08<3:30:31, 2.73s/it] 62%|██████▏ | 7686/12313 [5:45:11<3:27:55, 2.70s/it] {'loss': 0.5798, 'grad_norm': 3.5468434651588976, 'learning_rate': 1.634144392161872e-06, 'epoch': 0.62} 62%|██████▏ | 7686/12313 [5:45:11<3:27:55, 2.70s/it] 62%|██████▏ | 7687/12313 [5:45:13<3:24:22, 2.65s/it] {'loss': 0.5285, 'grad_norm': 4.706606608660131, 'learning_rate': 1.6335275013510638e-06, 'epoch': 0.62} 62%|██████▏ | 7687/12313 [5:45:13<3:24:22, 2.65s/it] 62%|██████▏ | 
7688/12313 [5:45:16<3:24:04, 2.65s/it] {'loss': 0.3421, 'grad_norm': 6.147626373688174, 'learning_rate': 1.632910670495566e-06, 'epoch': 0.62} 62%|██████▏ | 7688/12313 [5:45:16<3:24:04, 2.65s/it] 62%|██████▏ | 7689/12313 [5:45:18<3:21:25, 2.61s/it] {'loss': 0.46, 'grad_norm': 15.303934361158852, 'learning_rate': 1.6322938996380617e-06, 'epoch': 0.62} 62%|██████▏ | 7689/12313 [5:45:18<3:21:25, 2.61s/it] 62%|██████▏ | 7690/12313 [5:45:21<3:17:33, 2.56s/it] {'loss': 0.4245, 'grad_norm': 14.350203161237948, 'learning_rate': 1.6316771888212275e-06, 'epoch': 0.62} 62%|██████▏ | 7690/12313 [5:45:21<3:17:33, 2.56s/it] 62%|██████▏ | 7691/12313 [5:45:24<3:28:01, 2.70s/it] {'loss': 0.566, 'grad_norm': 4.190438584869176, 'learning_rate': 1.631060538087735e-06, 'epoch': 0.62} 62%|██████▏ | 7691/12313 [5:45:24<3:28:01, 2.70s/it] 62%|██████▏ | 7692/12313 [5:45:26<3:27:09, 2.69s/it] {'loss': 0.4528, 'grad_norm': 3.82815155289053, 'learning_rate': 1.6304439474802554e-06, 'epoch': 0.62} 62%|██████▏ | 7692/12313 [5:45:26<3:27:09, 2.69s/it] 62%|██████▏ | 7693/12313 [5:45:29<3:26:44, 2.68s/it] {'loss': 0.5233, 'grad_norm': 6.351971066286823, 'learning_rate': 1.6298274170414524e-06, 'epoch': 0.62} 62%|██████▏ | 7693/12313 [5:45:29<3:26:44, 2.68s/it] 62%|██████▏ | 7694/12313 [5:45:32<3:30:15, 2.73s/it] {'loss': 0.4774, 'grad_norm': 4.36724324752686, 'learning_rate': 1.6292109468139863e-06, 'epoch': 0.62} 62%|██████▏ | 7694/12313 [5:45:32<3:30:15, 2.73s/it] 62%|██████▏ | 7695/12313 [5:45:34<3:24:27, 2.66s/it] {'loss': 0.5333, 'grad_norm': 6.655112537446748, 'learning_rate': 1.6285945368405146e-06, 'epoch': 0.62} 62%|██████▏ | 7695/12313 [5:45:34<3:24:27, 2.66s/it] 63%|██████▎ | 7696/12313 [5:45:37<3:22:34, 2.63s/it] {'loss': 0.4432, 'grad_norm': 6.018071010184688, 'learning_rate': 1.6279781871636896e-06, 'epoch': 0.63} 63%|██████▎ | 7696/12313 [5:45:37<3:22:34, 2.63s/it] 63%|██████▎ | 7697/12313 [5:45:40<3:32:09, 2.76s/it] {'loss': 0.5138, 'grad_norm': 5.306711764250664, 'learning_rate': 
1.6273618978261576e-06, 'epoch': 0.63} 63%|██████▎ | 7697/12313 [5:45:40<3:32:09, 2.76s/it] 63%|██████▎ | 7698/12313 [5:45:43<3:27:10, 2.69s/it] {'loss': 0.5547, 'grad_norm': 3.9883932484916325, 'learning_rate': 1.6267456688705647e-06, 'epoch': 0.63} 63%|██████▎ | 7698/12313 [5:45:43<3:27:10, 2.69s/it] 63%|██████▎ | 7699/12313 [5:45:45<3:26:29, 2.69s/it] {'loss': 0.4235, 'grad_norm': 6.6197356790826625, 'learning_rate': 1.6261295003395506e-06, 'epoch': 0.63} 63%|██████▎ | 7699/12313 [5:45:45<3:26:29, 2.69s/it] 63%|██████▎ | 7700/12313 [5:45:48<3:34:03, 2.78s/it] {'loss': 0.4933, 'grad_norm': 3.6076841841661667, 'learning_rate': 1.6255133922757493e-06, 'epoch': 0.63} 63%|██████▎ | 7700/12313 [5:45:48<3:34:03, 2.78s/it] 63%|██████▎ | 7701/12313 [5:45:51<3:29:39, 2.73s/it] {'loss': 0.5483, 'grad_norm': 4.929421788606065, 'learning_rate': 1.6248973447217926e-06, 'epoch': 0.63} 63%|██████▎ | 7701/12313 [5:45:51<3:29:39, 2.73s/it] 63%|██████▎ | 7702/12313 [5:45:54<3:28:47, 2.72s/it] {'loss': 0.4914, 'grad_norm': 4.242914786049469, 'learning_rate': 1.6242813577203093e-06, 'epoch': 0.63} 63%|██████▎ | 7702/12313 [5:45:54<3:28:47, 2.72s/it] 63%|██████▎ | 7703/12313 [5:45:56<3:25:16, 2.67s/it] {'loss': 0.428, 'grad_norm': 4.388993067736639, 'learning_rate': 1.6236654313139213e-06, 'epoch': 0.63} 63%|██████▎ | 7703/12313 [5:45:56<3:25:16, 2.67s/it] 63%|██████▎ | 7704/12313 [5:45:59<3:20:10, 2.61s/it] {'loss': 0.5199, 'grad_norm': 6.684554645774749, 'learning_rate': 1.6230495655452466e-06, 'epoch': 0.63} 63%|██████▎ | 7704/12313 [5:45:59<3:20:10, 2.61s/it] 63%|██████▎ | 7705/12313 [5:46:01<3:21:26, 2.62s/it] {'loss': 0.4485, 'grad_norm': 4.683366955828708, 'learning_rate': 1.6224337604569012e-06, 'epoch': 0.63} 63%|██████▎ | 7705/12313 [5:46:01<3:21:26, 2.62s/it] 63%|██████▎ | 7706/12313 [5:46:04<3:23:15, 2.65s/it] {'loss': 0.4682, 'grad_norm': 4.786816490822921, 'learning_rate': 1.6218180160914959e-06, 'epoch': 0.63} 63%|██████▎ | 7706/12313 [5:46:04<3:23:15, 2.65s/it] 
63%|██████▎ | 7707/12313 [5:46:07<3:26:59, 2.70s/it] {'loss': 0.423, 'grad_norm': 5.411731210706411, 'learning_rate': 1.6212023324916349e-06, 'epoch': 0.63} 63%|██████▎ | 7707/12313 [5:46:07<3:26:59, 2.70s/it] 63%|██████▎ | 7708/12313 [5:46:09<3:25:56, 2.68s/it] {'loss': 0.7241, 'grad_norm': 4.143891933212883, 'learning_rate': 1.620586709699922e-06, 'epoch': 0.63} 63%|██████▎ | 7708/12313 [5:46:09<3:25:56, 2.68s/it] 63%|██████▎ | 7709/12313 [5:46:13<3:37:59, 2.84s/it] {'loss': 0.4617, 'grad_norm': 5.695424075779182, 'learning_rate': 1.6199711477589553e-06, 'epoch': 0.63} 63%|██████▎ | 7709/12313 [5:46:13<3:37:59, 2.84s/it] 63%|██████▎ | 7710/12313 [5:46:15<3:30:49, 2.75s/it] {'loss': 0.3636, 'grad_norm': 7.132319497435146, 'learning_rate': 1.6193556467113264e-06, 'epoch': 0.63} 63%|██████▎ | 7710/12313 [5:46:15<3:30:49, 2.75s/it] 63%|██████▎ | 7711/12313 [5:46:18<3:28:11, 2.71s/it] {'loss': 0.4507, 'grad_norm': 7.5139687849034225, 'learning_rate': 1.6187402065996267e-06, 'epoch': 0.63} 63%|██████▎ | 7711/12313 [5:46:18<3:28:11, 2.71s/it] 63%|██████▎ | 7712/12313 [5:46:20<3:26:16, 2.69s/it] {'loss': 0.5443, 'grad_norm': 3.7215878058767675, 'learning_rate': 1.6181248274664413e-06, 'epoch': 0.63} 63%|██████▎ | 7712/12313 [5:46:20<3:26:16, 2.69s/it] 63%|██████▎ | 7713/12313 [5:46:23<3:33:07, 2.78s/it] {'loss': 0.4875, 'grad_norm': 3.861480100387473, 'learning_rate': 1.617509509354349e-06, 'epoch': 0.63} 63%|██████▎ | 7713/12313 [5:46:23<3:33:07, 2.78s/it] 63%|██████▎ | 7714/12313 [5:46:26<3:31:20, 2.76s/it] {'loss': 0.5528, 'grad_norm': 5.924413160412586, 'learning_rate': 1.616894252305929e-06, 'epoch': 0.63} 63%|██████▎ | 7714/12313 [5:46:26<3:31:20, 2.76s/it] 63%|██████▎ | 7715/12313 [5:46:29<3:28:45, 2.72s/it] {'loss': 0.4045, 'grad_norm': 6.36026377881963, 'learning_rate': 1.6162790563637538e-06, 'epoch': 0.63} 63%|██████▎ | 7715/12313 [5:46:29<3:28:45, 2.72s/it] 63%|██████▎ | 7716/12313 [5:46:31<3:21:49, 2.63s/it] {'loss': 0.4531, 'grad_norm': 4.4919168460790475, 
'learning_rate': 1.6156639215703896e-06, 'epoch': 0.63} 63%|██████▎ | 7716/12313 [5:46:31<3:21:49, 2.63s/it] 63%|██████▎ | 7717/12313 [5:46:34<3:20:55, 2.62s/it] {'loss': 0.4291, 'grad_norm': 3.3752966345250583, 'learning_rate': 1.6150488479684022e-06, 'epoch': 0.63} 63%|██████▎ | 7717/12313 [5:46:34<3:20:55, 2.62s/it] 63%|██████▎ | 7718/12313 [5:46:36<3:20:51, 2.62s/it] {'loss': 0.5847, 'grad_norm': 4.683411418815497, 'learning_rate': 1.6144338356003513e-06, 'epoch': 0.63} 63%|██████▎ | 7718/12313 [5:46:36<3:20:51, 2.62s/it] 63%|██████▎ | 7719/12313 [5:46:39<3:18:51, 2.60s/it] {'loss': 0.4307, 'grad_norm': 6.26290414561336, 'learning_rate': 1.6138188845087926e-06, 'epoch': 0.63} 63%|██████▎ | 7719/12313 [5:46:39<3:18:51, 2.60s/it] 63%|██████▎ | 7720/12313 [5:46:42<3:24:15, 2.67s/it] {'loss': 0.6484, 'grad_norm': 7.127187742803709, 'learning_rate': 1.613203994736276e-06, 'epoch': 0.63} 63%|██████▎ | 7720/12313 [5:46:42<3:24:15, 2.67s/it] 63%|██████▎ | 7721/12313 [5:46:44<3:19:06, 2.60s/it] {'loss': 0.4218, 'grad_norm': 4.465338617099092, 'learning_rate': 1.61258916632535e-06, 'epoch': 0.63} 63%|██████▎ | 7721/12313 [5:46:44<3:19:06, 2.60s/it] 63%|██████▎ | 7722/12313 [5:46:47<3:23:32, 2.66s/it] {'loss': 0.5503, 'grad_norm': 6.635578415699874, 'learning_rate': 1.6119743993185574e-06, 'epoch': 0.63} 63%|██████▎ | 7722/12313 [5:46:47<3:23:32, 2.66s/it] 63%|██████▎ | 7723/12313 [5:46:50<3:24:16, 2.67s/it] {'loss': 0.4743, 'grad_norm': 9.293378886349172, 'learning_rate': 1.6113596937584358e-06, 'epoch': 0.63} 63%|██████▎ | 7723/12313 [5:46:50<3:24:16, 2.67s/it] 63%|██████▎ | 7724/12313 [5:46:52<3:27:40, 2.72s/it] {'loss': 0.4391, 'grad_norm': 3.9282722976253432, 'learning_rate': 1.610745049687521e-06, 'epoch': 0.63} 63%|██████▎ | 7724/12313 [5:46:52<3:27:40, 2.72s/it] 63%|██████▎ | 7725/12313 [5:46:55<3:28:37, 2.73s/it] {'loss': 0.4308, 'grad_norm': 5.900142462141413, 'learning_rate': 1.6101304671483425e-06, 'epoch': 0.63} 63%|██████▎ | 7725/12313 [5:46:55<3:28:37, 
2.73s/it] 63%|██████▎ | 7726/12313 [5:46:58<3:23:19, 2.66s/it] {'loss': 0.3723, 'grad_norm': 5.538553572045406, 'learning_rate': 1.6095159461834252e-06, 'epoch': 0.63} 63%|██████▎ | 7726/12313 [5:46:58<3:23:19, 2.66s/it] 63%|██████▎ | 7727/12313 [5:47:00<3:17:30, 2.58s/it] {'loss': 0.5306, 'grad_norm': 5.156039599008567, 'learning_rate': 1.6089014868352925e-06, 'epoch': 0.63} 63%|██████▎ | 7727/12313 [5:47:00<3:17:30, 2.58s/it] 63%|██████▎ | 7728/12313 [5:47:03<3:18:37, 2.60s/it] {'loss': 0.525, 'grad_norm': 5.634533236318975, 'learning_rate': 1.608287089146461e-06, 'epoch': 0.63} 63%|██████▎ | 7728/12313 [5:47:03<3:18:37, 2.60s/it] 63%|██████▎ | 7729/12313 [5:47:05<3:15:20, 2.56s/it] {'loss': 0.3932, 'grad_norm': 5.598968838519203, 'learning_rate': 1.6076727531594428e-06, 'epoch': 0.63} 63%|██████▎ | 7729/12313 [5:47:05<3:15:20, 2.56s/it] 63%|██████▎ | 7730/12313 [5:47:08<3:19:06, 2.61s/it] {'loss': 0.4358, 'grad_norm': 5.654175409392677, 'learning_rate': 1.607058478916748e-06, 'epoch': 0.63} 63%|██████▎ | 7730/12313 [5:47:08<3:19:06, 2.61s/it] 63%|██████▎ | 7731/12313 [5:47:11<3:21:58, 2.64s/it] {'loss': 0.6522, 'grad_norm': 7.282317038390533, 'learning_rate': 1.6064442664608808e-06, 'epoch': 0.63} 63%|██████▎ | 7731/12313 [5:47:11<3:21:58, 2.64s/it] 63%|██████▎ | 7732/12313 [5:47:13<3:18:50, 2.60s/it] {'loss': 0.4913, 'grad_norm': 6.6301535369071924, 'learning_rate': 1.6058301158343408e-06, 'epoch': 0.63} 63%|██████▎ | 7732/12313 [5:47:13<3:18:50, 2.60s/it] 63%|██████▎ | 7733/12313 [5:47:16<3:21:36, 2.64s/it] {'loss': 0.4066, 'grad_norm': 4.56677620849299, 'learning_rate': 1.6052160270796252e-06, 'epoch': 0.63} 63%|██████▎ | 7733/12313 [5:47:16<3:21:36, 2.64s/it] 63%|██████▎ | 7734/12313 [5:47:19<3:24:27, 2.68s/it] {'loss': 0.6191, 'grad_norm': 6.930735350423143, 'learning_rate': 1.6046020002392242e-06, 'epoch': 0.63} 63%|██████▎ | 7734/12313 [5:47:19<3:24:27, 2.68s/it] 63%|██████▎ | 7735/12313 [5:47:22<3:27:13, 2.72s/it] {'loss': 0.4239, 'grad_norm': 
3.3385261614929584, 'learning_rate': 1.603988035355627e-06, 'epoch': 0.63} 63%|██████▎ | 7735/12313 [5:47:22<3:27:13, 2.72s/it] 63%|██████▎ | 7736/12313 [5:47:24<3:24:35, 2.68s/it] {'loss': 0.5148, 'grad_norm': 6.108614723656081, 'learning_rate': 1.6033741324713143e-06, 'epoch': 0.63} 63%|██████▎ | 7736/12313 [5:47:24<3:24:35, 2.68s/it] 63%|██████▎ | 7737/12313 [5:47:27<3:21:01, 2.64s/it] {'loss': 0.5385, 'grad_norm': 5.083243850769626, 'learning_rate': 1.6027602916287665e-06, 'epoch': 0.63} 63%|██████▎ | 7737/12313 [5:47:27<3:21:01, 2.64s/it] 63%|██████▎ | 7738/12313 [5:47:29<3:17:45, 2.59s/it] {'loss': 0.4434, 'grad_norm': 4.3659844044456735, 'learning_rate': 1.6021465128704592e-06, 'epoch': 0.63} 63%|██████▎ | 7738/12313 [5:47:29<3:17:45, 2.59s/it] 63%|██████▎ | 7739/12313 [5:47:32<3:20:43, 2.63s/it] {'loss': 0.5304, 'grad_norm': 3.246650947537016, 'learning_rate': 1.60153279623886e-06, 'epoch': 0.63} 63%|██████▎ | 7739/12313 [5:47:32<3:20:43, 2.63s/it] 63%|██████▎ | 7740/12313 [5:47:34<3:18:51, 2.61s/it] {'loss': 0.4603, 'grad_norm': 6.965030627490836, 'learning_rate': 1.6009191417764366e-06, 'epoch': 0.63} 63%|██████▎ | 7740/12313 [5:47:34<3:18:51, 2.61s/it] 63%|██████▎ | 7741/12313 [5:47:37<3:24:46, 2.69s/it] {'loss': 0.4716, 'grad_norm': 4.3257443108941835, 'learning_rate': 1.600305549525651e-06, 'epoch': 0.63} 63%|██████▎ | 7741/12313 [5:47:37<3:24:46, 2.69s/it] 63%|██████▎ | 7742/12313 [5:47:40<3:24:31, 2.68s/it] {'loss': 0.5073, 'grad_norm': 5.777339582650028, 'learning_rate': 1.5996920195289586e-06, 'epoch': 0.63} 63%|██████▎ | 7742/12313 [5:47:40<3:24:31, 2.68s/it] 63%|██████▎ | 7743/12313 [5:47:43<3:23:23, 2.67s/it] {'loss': 0.6117, 'grad_norm': 11.440189242706712, 'learning_rate': 1.5990785518288144e-06, 'epoch': 0.63} 63%|██████▎ | 7743/12313 [5:47:43<3:23:23, 2.67s/it] 63%|██████▎ | 7744/12313 [5:47:45<3:20:32, 2.63s/it] {'loss': 0.4889, 'grad_norm': 5.114208771227363, 'learning_rate': 1.5984651464676664e-06, 'epoch': 0.63} 63%|██████▎ | 7744/12313 
[5:47:45<3:20:32, 2.63s/it] 63%|██████▎ | 7745/12313 [5:47:48<3:19:18, 2.62s/it] {'loss': 0.3973, 'grad_norm': 5.705508333315021, 'learning_rate': 1.5978518034879583e-06, 'epoch': 0.63} 63%|██████▎ | 7745/12313 [5:47:48<3:19:18, 2.62s/it] 63%|██████▎ | 7746/12313 [5:47:50<3:18:24, 2.61s/it] {'loss': 0.5272, 'grad_norm': 8.471787190410229, 'learning_rate': 1.5972385229321313e-06, 'epoch': 0.63} 63%|██████▎ | 7746/12313 [5:47:50<3:18:24, 2.61s/it] 63%|██████▎ | 7747/12313 [5:47:53<3:16:32, 2.58s/it] {'loss': 0.6467, 'grad_norm': 4.82503142826085, 'learning_rate': 1.5966253048426212e-06, 'epoch': 0.63} 63%|██████▎ | 7747/12313 [5:47:53<3:16:32, 2.58s/it] 63%|██████▎ | 7748/12313 [5:47:55<3:16:14, 2.58s/it] {'loss': 0.544, 'grad_norm': 4.760084685878624, 'learning_rate': 1.596012149261858e-06, 'epoch': 0.63} 63%|██████▎ | 7748/12313 [5:47:55<3:16:14, 2.58s/it] 63%|██████▎ | 7749/12313 [5:47:58<3:19:08, 2.62s/it] {'loss': 0.4272, 'grad_norm': 4.772517813956784, 'learning_rate': 1.5953990562322708e-06, 'epoch': 0.63} 63%|██████▎ | 7749/12313 [5:47:58<3:19:08, 2.62s/it] 63%|██████▎ | 7750/12313 [5:48:01<3:20:57, 2.64s/it] {'loss': 0.4509, 'grad_norm': 8.327385517646334, 'learning_rate': 1.5947860257962808e-06, 'epoch': 0.63} 63%|██████▎ | 7750/12313 [5:48:01<3:20:57, 2.64s/it] 63%|██████▎ | 7751/12313 [5:48:04<3:28:39, 2.74s/it] {'loss': 0.4959, 'grad_norm': 4.076666918311391, 'learning_rate': 1.5941730579963065e-06, 'epoch': 0.63} 63%|██████▎ | 7751/12313 [5:48:04<3:28:39, 2.74s/it] 63%|██████▎ | 7752/12313 [5:48:06<3:22:26, 2.66s/it] {'loss': 0.4375, 'grad_norm': 6.179747838755071, 'learning_rate': 1.5935601528747635e-06, 'epoch': 0.63} 63%|██████▎ | 7752/12313 [5:48:06<3:22:26, 2.66s/it] 63%|██████▎ | 7753/12313 [5:48:09<3:22:54, 2.67s/it] {'loss': 0.5545, 'grad_norm': 12.93986219429753, 'learning_rate': 1.5929473104740605e-06, 'epoch': 0.63} 63%|██████▎ | 7753/12313 [5:48:09<3:22:54, 2.67s/it] 63%|██████▎ | 7754/12313 [5:48:12<3:32:13, 2.79s/it] {'loss': 0.4124, 
'grad_norm': 3.6144579855996346, 'learning_rate': 1.5923345308366033e-06, 'epoch': 0.63} 63%|██████▎ | 7754/12313 [5:48:12<3:32:13, 2.79s/it] 63%|██████▎ | 7755/12313 [5:48:15<3:25:41, 2.71s/it] {'loss': 0.6357, 'grad_norm': 3.832036350634625, 'learning_rate': 1.591721814004792e-06, 'epoch': 0.63} 63%|██████▎ | 7755/12313 [5:48:15<3:25:41, 2.71s/it] 63%|██████▎ | 7756/12313 [5:48:18<3:32:07, 2.79s/it] {'loss': 0.582, 'grad_norm': 3.254583154769827, 'learning_rate': 1.5911091600210243e-06, 'epoch': 0.63} 63%|██████▎ | 7756/12313 [5:48:18<3:32:07, 2.79s/it] 63%|██████▎ | 7757/12313 [5:48:20<3:24:46, 2.70s/it] {'loss': 0.809, 'grad_norm': 4.7973970186134505, 'learning_rate': 1.5904965689276935e-06, 'epoch': 0.63} 63%|██████▎ | 7757/12313 [5:48:20<3:24:46, 2.70s/it] 63%|██████▎ | 7758/12313 [5:48:23<3:30:44, 2.78s/it] {'loss': 0.4779, 'grad_norm': 4.07832937486864, 'learning_rate': 1.5898840407671854e-06, 'epoch': 0.63} 63%|██████▎ | 7758/12313 [5:48:23<3:30:44, 2.78s/it] 63%|██████▎ | 7759/12313 [5:48:26<3:26:35, 2.72s/it] {'loss': 0.4955, 'grad_norm': 11.649929406013856, 'learning_rate': 1.5892715755818855e-06, 'epoch': 0.63} 63%|██████▎ | 7759/12313 [5:48:26<3:26:35, 2.72s/it] 63%|██████▎ | 7760/12313 [5:48:28<3:21:14, 2.65s/it] {'loss': 0.3887, 'grad_norm': 7.298974738788142, 'learning_rate': 1.588659173414173e-06, 'epoch': 0.63} 63%|██████▎ | 7760/12313 [5:48:28<3:21:14, 2.65s/it] 63%|██████▎ | 7761/12313 [5:48:31<3:21:23, 2.65s/it] {'loss': 0.5319, 'grad_norm': 6.5690578488061115, 'learning_rate': 1.5880468343064215e-06, 'epoch': 0.63} 63%|██████▎ | 7761/12313 [5:48:31<3:21:23, 2.65s/it] 63%|██████▎ | 7762/12313 [5:48:33<3:20:10, 2.64s/it] {'loss': 0.5715, 'grad_norm': 4.501194184210136, 'learning_rate': 1.5874345583010038e-06, 'epoch': 0.63} 63%|██████▎ | 7762/12313 [5:48:33<3:20:10, 2.64s/it] 63%|██████▎ | 7763/12313 [5:48:36<3:23:12, 2.68s/it] {'loss': 0.5115, 'grad_norm': 4.823495001127094, 'learning_rate': 1.5868223454402842e-06, 'epoch': 0.63} 63%|██████▎ | 
7763/12313 [5:48:36<3:23:12, 2.68s/it] 63%|██████▎ | 7764/12313 [5:48:39<3:19:16, 2.63s/it] {'loss': 0.4864, 'grad_norm': 4.765174608509368, 'learning_rate': 1.5862101957666251e-06, 'epoch': 0.63} 63%|██████▎ | 7764/12313 [5:48:39<3:19:16, 2.63s/it] 63%|██████▎ | 7765/12313 [5:48:41<3:18:33, 2.62s/it] {'loss': 0.5312, 'grad_norm': 8.316397458431478, 'learning_rate': 1.5855981093223851e-06, 'epoch': 0.63} 63%|██████▎ | 7765/12313 [5:48:41<3:18:33, 2.62s/it] 63%|██████▎ | 7766/12313 [5:48:44<3:20:06, 2.64s/it] {'loss': 0.5063, 'grad_norm': 3.4813538772732096, 'learning_rate': 1.5849860861499161e-06, 'epoch': 0.63} 63%|██████▎ | 7766/12313 [5:48:44<3:20:06, 2.64s/it] 63%|██████▎ | 7767/12313 [5:48:46<3:17:50, 2.61s/it] {'loss': 0.6373, 'grad_norm': 5.709151262434003, 'learning_rate': 1.584374126291567e-06, 'epoch': 0.63} 63%|██████▎ | 7767/12313 [5:48:46<3:17:50, 2.61s/it] 63%|██████▎ | 7768/12313 [5:48:49<3:17:10, 2.60s/it] {'loss': 0.4196, 'grad_norm': 6.071499004825764, 'learning_rate': 1.5837622297896832e-06, 'epoch': 0.63} 63%|██████▎ | 7768/12313 [5:48:49<3:17:10, 2.60s/it] 63%|██████▎ | 7769/12313 [5:48:52<3:22:58, 2.68s/it] {'loss': 0.4568, 'grad_norm': 4.152234565584312, 'learning_rate': 1.5831503966866038e-06, 'epoch': 0.63} 63%|██████▎ | 7769/12313 [5:48:52<3:22:58, 2.68s/it] 63%|██████▎ | 7770/12313 [5:48:54<3:21:12, 2.66s/it] {'loss': 0.5482, 'grad_norm': 5.124374169356126, 'learning_rate': 1.5825386270246649e-06, 'epoch': 0.63} 63%|██████▎ | 7770/12313 [5:48:54<3:21:12, 2.66s/it] 63%|██████▎ | 7771/12313 [5:48:57<3:23:27, 2.69s/it] {'loss': 0.4628, 'grad_norm': 6.603032548895071, 'learning_rate': 1.5819269208461962e-06, 'epoch': 0.63} 63%|██████▎ | 7771/12313 [5:48:57<3:23:27, 2.69s/it] 63%|██████▎ | 7772/12313 [5:49:00<3:25:33, 2.72s/it] {'loss': 0.4962, 'grad_norm': 3.929016081455086, 'learning_rate': 1.5813152781935264e-06, 'epoch': 0.63} 63%|██████▎ | 7772/12313 [5:49:00<3:25:33, 2.72s/it] 63%|██████▎ | 7773/12313 [5:49:03<3:20:28, 2.65s/it] {'loss': 
0.5938, 'grad_norm': 5.248036747944997, 'learning_rate': 1.5807036991089781e-06, 'epoch': 0.63} 63%|██████▎ | 7773/12313 [5:49:03<3:20:28, 2.65s/it] 63%|██████▎ | 7774/12313 [5:49:05<3:19:56, 2.64s/it] {'loss': 0.4762, 'grad_norm': 4.602869975600319, 'learning_rate': 1.5800921836348671e-06, 'epoch': 0.63} 63%|██████▎ | 7774/12313 [5:49:05<3:19:56, 2.64s/it] 63%|██████▎ | 7775/12313 [5:49:08<3:27:36, 2.74s/it] {'loss': 0.4337, 'grad_norm': 3.998867630823634, 'learning_rate': 1.5794807318135097e-06, 'epoch': 0.63} 63%|██████▎ | 7775/12313 [5:49:08<3:27:36, 2.74s/it] 63%|██████▎ | 7776/12313 [5:49:11<3:25:18, 2.72s/it] {'loss': 0.5222, 'grad_norm': 4.511923050001, 'learning_rate': 1.5788693436872132e-06, 'epoch': 0.63} 63%|██████▎ | 7776/12313 [5:49:11<3:25:18, 2.72s/it] 63%|██████▎ | 7777/12313 [5:49:14<3:25:42, 2.72s/it] {'loss': 0.5321, 'grad_norm': 4.732480730008836, 'learning_rate': 1.5782580192982827e-06, 'epoch': 0.63} 63%|██████▎ | 7777/12313 [5:49:14<3:25:42, 2.72s/it] 63%|██████▎ | 7778/12313 [5:49:16<3:25:16, 2.72s/it] {'loss': 0.4472, 'grad_norm': 6.209675234125353, 'learning_rate': 1.57764675868902e-06, 'epoch': 0.63} 63%|██████▎ | 7778/12313 [5:49:16<3:25:16, 2.72s/it] 63%|██████▎ | 7779/12313 [5:49:19<3:20:02, 2.65s/it] {'loss': 0.5362, 'grad_norm': 4.538684305840299, 'learning_rate': 1.5770355619017198e-06, 'epoch': 0.63} 63%|██████▎ | 7779/12313 [5:49:19<3:20:02, 2.65s/it] 63%|██████▎ | 7780/12313 [5:49:21<3:20:00, 2.65s/it] {'loss': 0.4968, 'grad_norm': 3.509225017297568, 'learning_rate': 1.5764244289786728e-06, 'epoch': 0.63} 63%|██████▎ | 7780/12313 [5:49:21<3:20:00, 2.65s/it] 63%|██████▎ | 7781/12313 [5:49:24<3:23:48, 2.70s/it] {'loss': 0.3996, 'grad_norm': 6.521031887625569, 'learning_rate': 1.575813359962169e-06, 'epoch': 0.63} 63%|██████▎ | 7781/12313 [5:49:24<3:23:48, 2.70s/it] 63%|██████▎ | 7782/12313 [5:49:27<3:20:52, 2.66s/it] {'loss': 0.4625, 'grad_norm': 12.929846426438688, 'learning_rate': 1.5752023548944889e-06, 'epoch': 0.63} 
63%|██████▎ | 7782/12313 [5:49:27<3:20:52, 2.66s/it] 63%|██████▎ | 7783/12313 [5:49:29<3:21:01, 2.66s/it] {'loss': 0.5613, 'grad_norm': 9.52281730299787, 'learning_rate': 1.574591413817911e-06, 'epoch': 0.63} 63%|██████▎ | 7783/12313 [5:49:29<3:21:01, 2.66s/it] 63%|██████▎ | 7784/12313 [5:49:32<3:20:04, 2.65s/it] {'loss': 0.5386, 'grad_norm': 3.924489403653561, 'learning_rate': 1.57398053677471e-06, 'epoch': 0.63} 63%|██████▎ | 7784/12313 [5:49:32<3:20:04, 2.65s/it] 63%|██████▎ | 7785/12313 [5:49:35<3:24:15, 2.71s/it] {'loss': 0.4747, 'grad_norm': 3.8784061814789137, 'learning_rate': 1.5733697238071553e-06, 'epoch': 0.63} 63%|██████▎ | 7785/12313 [5:49:35<3:24:15, 2.71s/it] 63%|██████▎ | 7786/12313 [5:49:38<3:24:25, 2.71s/it] {'loss': 0.5493, 'grad_norm': 4.881789075504623, 'learning_rate': 1.5727589749575107e-06, 'epoch': 0.63} 63%|██████▎ | 7786/12313 [5:49:38<3:24:25, 2.71s/it] 63%|██████▎ | 7787/12313 [5:49:40<3:27:37, 2.75s/it] {'loss': 0.5279, 'grad_norm': 7.261971143003088, 'learning_rate': 1.5721482902680385e-06, 'epoch': 0.63} 63%|██████▎ | 7787/12313 [5:49:40<3:27:37, 2.75s/it] 63%|██████▎ | 7788/12313 [5:49:43<3:22:24, 2.68s/it] {'loss': 0.5173, 'grad_norm': 6.077300198937425, 'learning_rate': 1.5715376697809937e-06, 'epoch': 0.63} 63%|██████▎ | 7788/12313 [5:49:43<3:22:24, 2.68s/it] 63%|██████▎ | 7789/12313 [5:49:45<3:16:34, 2.61s/it] {'loss': 0.5638, 'grad_norm': 5.975876289320817, 'learning_rate': 1.570927113538629e-06, 'epoch': 0.63} 63%|██████▎ | 7789/12313 [5:49:45<3:16:34, 2.61s/it] 63%|██████▎ | 7790/12313 [5:49:48<3:23:34, 2.70s/it] {'loss': 0.4479, 'grad_norm': 3.670648216770045, 'learning_rate': 1.5703166215831899e-06, 'epoch': 0.63} 63%|██████▎ | 7790/12313 [5:49:48<3:23:34, 2.70s/it] 63%|██████▎ | 7791/12313 [5:49:51<3:20:00, 2.65s/it] {'loss': 0.5388, 'grad_norm': 3.6145233680649254, 'learning_rate': 1.5697061939569214e-06, 'epoch': 0.63} 63%|██████▎ | 7791/12313 [5:49:51<3:20:00, 2.65s/it] 63%|██████▎ | 7792/12313 [5:49:53<3:19:20, 
2.65s/it] {'loss': 0.6051, 'grad_norm': 4.822710807501337, 'learning_rate': 1.56909583070206e-06, 'epoch': 0.63} 63%|██████▎ | 7792/12313 [5:49:53<3:19:20, 2.65s/it] 63%|██████▎ | 7793/12313 [5:49:56<3:25:59, 2.73s/it] {'loss': 0.4538, 'grad_norm': 6.545220131213222, 'learning_rate': 1.56848553186084e-06, 'epoch': 0.63} 63%|██████▎ | 7793/12313 [5:49:56<3:25:59, 2.73s/it] 63%|██████▎ | 7794/12313 [5:49:59<3:33:42, 2.84s/it] {'loss': 0.4151, 'grad_norm': 6.080696544546954, 'learning_rate': 1.567875297475492e-06, 'epoch': 0.63} 63%|██████▎ | 7794/12313 [5:49:59<3:33:42, 2.84s/it] 63%|██████▎ | 7795/12313 [5:50:02<3:31:35, 2.81s/it] {'loss': 0.5212, 'grad_norm': 9.899327382181253, 'learning_rate': 1.56726512758824e-06, 'epoch': 0.63} 63%|██████▎ | 7795/12313 [5:50:02<3:31:35, 2.81s/it] 63%|██████▎ | 7796/12313 [5:50:05<3:28:07, 2.76s/it] {'loss': 0.5173, 'grad_norm': 4.273991889208243, 'learning_rate': 1.566655022241304e-06, 'epoch': 0.63} 63%|██████▎ | 7796/12313 [5:50:05<3:28:07, 2.76s/it] 63%|██████▎ | 7797/12313 [5:50:07<3:23:59, 2.71s/it] {'loss': 0.4797, 'grad_norm': 5.170717612782167, 'learning_rate': 1.5660449814769021e-06, 'epoch': 0.63} 63%|██████▎ | 7797/12313 [5:50:07<3:23:59, 2.71s/it] 63%|██████▎ | 7798/12313 [5:50:10<3:23:25, 2.70s/it] {'loss': 0.4382, 'grad_norm': 5.7053519794438685, 'learning_rate': 1.5654350053372442e-06, 'epoch': 0.63} 63%|██████▎ | 7798/12313 [5:50:10<3:23:25, 2.70s/it] 63%|██████▎ | 7799/12313 [5:50:13<3:16:44, 2.62s/it] {'loss': 0.4524, 'grad_norm': 3.5081020926912667, 'learning_rate': 1.564825093864537e-06, 'epoch': 0.63} 63%|██████▎ | 7799/12313 [5:50:13<3:16:44, 2.62s/it] 63%|██████▎ | 7800/12313 [5:50:15<3:18:00, 2.63s/it] {'loss': 0.4047, 'grad_norm': 9.01478070880551, 'learning_rate': 1.5642152471009849e-06, 'epoch': 0.63} 63%|██████▎ | 7800/12313 [5:50:15<3:18:00, 2.63s/it] 63%|██████▎ | 7801/12313 [5:50:18<3:16:15, 2.61s/it] {'loss': 0.4582, 'grad_norm': 5.177815198718995, 'learning_rate': 1.563605465088785e-06, 'epoch': 
0.63} 63%|██████▎ | 7801/12313 [5:50:18<3:16:15, 2.61s/it] 63%|██████▎ | 7802/12313 [5:50:21<3:20:58, 2.67s/it] {'loss': 0.4636, 'grad_norm': 8.412137360840807, 'learning_rate': 1.5629957478701303e-06, 'epoch': 0.63} 63%|██████▎ | 7802/12313 [5:50:21<3:20:58, 2.67s/it] 63%|██████▎ | 7803/12313 [5:50:23<3:21:56, 2.69s/it] {'loss': 0.5674, 'grad_norm': 5.581728871746328, 'learning_rate': 1.5623860954872116e-06, 'epoch': 0.63} 63%|██████▎ | 7803/12313 [5:50:23<3:21:56, 2.69s/it] 63%|██████▎ | 7804/12313 [5:50:26<3:18:09, 2.64s/it] {'loss': 0.4592, 'grad_norm': 4.471083148690759, 'learning_rate': 1.5617765079822133e-06, 'epoch': 0.63} 63%|██████▎ | 7804/12313 [5:50:26<3:18:09, 2.64s/it] 63%|██████▎ | 7805/12313 [5:50:29<3:20:22, 2.67s/it] {'loss': 0.5273, 'grad_norm': 3.475348192814086, 'learning_rate': 1.5611669853973141e-06, 'epoch': 0.63} 63%|██████▎ | 7805/12313 [5:50:29<3:20:22, 2.67s/it] 63%|██████▎ | 7806/12313 [5:50:31<3:21:03, 2.68s/it] {'loss': 0.4363, 'grad_norm': 5.120001622959669, 'learning_rate': 1.5605575277746912e-06, 'epoch': 0.63} 63%|██████▎ | 7806/12313 [5:50:31<3:21:03, 2.68s/it] 63%|██████▎ | 7807/12313 [5:50:34<3:16:24, 2.62s/it] {'loss': 0.4075, 'grad_norm': 4.543060760794114, 'learning_rate': 1.559948135156516e-06, 'epoch': 0.63} 63%|██████▎ | 7807/12313 [5:50:34<3:16:24, 2.62s/it] 63%|██████▎ | 7808/12313 [5:50:37<3:29:33, 2.79s/it] {'loss': 0.4653, 'grad_norm': 4.1227313596641, 'learning_rate': 1.559338807584954e-06, 'epoch': 0.63} 63%|██████▎ | 7808/12313 [5:50:37<3:29:33, 2.79s/it] 63%|██████▎ | 7809/12313 [5:50:40<3:26:44, 2.75s/it] {'loss': 0.5367, 'grad_norm': 4.831713571856866, 'learning_rate': 1.5587295451021678e-06, 'epoch': 0.63} 63%|██████▎ | 7809/12313 [5:50:40<3:26:44, 2.75s/it] 63%|██████▎ | 7810/12313 [5:50:42<3:25:28, 2.74s/it] {'loss': 0.6405, 'grad_norm': 4.2655332482022414, 'learning_rate': 1.5581203477503166e-06, 'epoch': 0.63} 63%|██████▎ | 7810/12313 [5:50:42<3:25:28, 2.74s/it] 63%|██████▎ | 7811/12313 [5:50:45<3:30:22, 
2.80s/it] {'loss': 0.408, 'grad_norm': 8.517022361410643, 'learning_rate': 1.5575112155715516e-06, 'epoch': 0.63} 63%|██████▎ | 7811/12313 [5:50:45<3:30:22, 2.80s/it] 63%|██████▎ | 7812/12313 [5:50:48<3:34:12, 2.86s/it] {'loss': 0.4273, 'grad_norm': 4.14581190989536, 'learning_rate': 1.5569021486080223e-06, 'epoch': 0.63} 63%|██████▎ | 7812/12313 [5:50:48<3:34:12, 2.86s/it] 63%|██████▎ | 7813/12313 [5:50:51<3:28:17, 2.78s/it] {'loss': 0.688, 'grad_norm': 4.406741970957099, 'learning_rate': 1.5562931469018738e-06, 'epoch': 0.63} 63%|██████▎ | 7813/12313 [5:50:51<3:28:17, 2.78s/it] 63%|██████▎ | 7814/12313 [5:50:53<3:19:14, 2.66s/it] {'loss': 0.5208, 'grad_norm': 4.483314144569286, 'learning_rate': 1.555684210495245e-06, 'epoch': 0.63} 63%|██████▎ | 7814/12313 [5:50:53<3:19:14, 2.66s/it] 63%|██████▎ | 7815/12313 [5:50:56<3:22:57, 2.71s/it] {'loss': 0.5329, 'grad_norm': 4.337838645295809, 'learning_rate': 1.5550753394302702e-06, 'epoch': 0.63} 63%|██████▎ | 7815/12313 [5:50:56<3:22:57, 2.71s/it] 63%|██████▎ | 7816/12313 [5:50:59<3:22:34, 2.70s/it] {'loss': 0.5038, 'grad_norm': 6.548340782272945, 'learning_rate': 1.5544665337490822e-06, 'epoch': 0.63} 63%|██████▎ | 7816/12313 [5:50:59<3:22:34, 2.70s/it] 63%|██████▎ | 7817/12313 [5:51:01<3:15:47, 2.61s/it] {'loss': 0.415, 'grad_norm': 26.26368001727583, 'learning_rate': 1.5538577934938051e-06, 'epoch': 0.63} 63%|██████▎ | 7817/12313 [5:51:01<3:15:47, 2.61s/it] 63%|██████▎ | 7818/12313 [5:51:04<3:18:18, 2.65s/it] {'loss': 0.4125, 'grad_norm': 6.56905135630574, 'learning_rate': 1.5532491187065607e-06, 'epoch': 0.63} 63%|██████▎ | 7818/12313 [5:51:04<3:18:18, 2.65s/it] 64%|██████▎ | 7819/12313 [5:51:06<3:16:33, 2.62s/it] {'loss': 0.4559, 'grad_norm': 4.494023935708008, 'learning_rate': 1.5526405094294682e-06, 'epoch': 0.64} 64%|██████▎ | 7819/12313 [5:51:06<3:16:33, 2.62s/it] 64%|██████▎ | 7820/12313 [5:51:09<3:14:57, 2.60s/it] {'loss': 0.4749, 'grad_norm': 7.264576312866752, 'learning_rate': 1.5520319657046384e-06, 
'epoch': 0.64} 64%|██████▎ | 7820/12313 [5:51:09<3:14:57, 2.60s/it] 64%|██████▎ | 7821/12313 [5:51:12<3:14:16, 2.59s/it] {'loss': 0.5266, 'grad_norm': 3.8422635582509037, 'learning_rate': 1.5514234875741785e-06, 'epoch': 0.64} 64%|██████▎ | 7821/12313 [5:51:12<3:14:16, 2.59s/it] 64%|██████▎ | 7822/12313 [5:51:14<3:12:00, 2.57s/it] {'loss': 0.5509, 'grad_norm': 4.7814474085953815, 'learning_rate': 1.550815075080193e-06, 'epoch': 0.64} 64%|██████▎ | 7822/12313 [5:51:14<3:12:00, 2.57s/it] 64%|██████▎ | 7823/12313 [5:51:17<3:14:24, 2.60s/it] {'loss': 0.4687, 'grad_norm': 14.680542072418953, 'learning_rate': 1.5502067282647821e-06, 'epoch': 0.64} 64%|██████▎ | 7823/12313 [5:51:17<3:14:24, 2.60s/it] 64%|██████▎ | 7824/12313 [5:51:20<3:19:07, 2.66s/it] {'loss': 0.5625, 'grad_norm': 5.102982342242324, 'learning_rate': 1.5495984471700382e-06, 'epoch': 0.64} 64%|██████▎ | 7824/12313 [5:51:20<3:19:07, 2.66s/it] 64%|██████▎ | 7825/12313 [5:51:22<3:20:33, 2.68s/it] {'loss': 0.3983, 'grad_norm': 6.340239866996015, 'learning_rate': 1.5489902318380512e-06, 'epoch': 0.64} 64%|██████▎ | 7825/12313 [5:51:22<3:20:33, 2.68s/it] 64%|██████▎ | 7826/12313 [5:51:25<3:24:28, 2.73s/it] {'loss': 0.5833, 'grad_norm': 5.566203991765062, 'learning_rate': 1.5483820823109078e-06, 'epoch': 0.64} 64%|██████▎ | 7826/12313 [5:51:25<3:24:28, 2.73s/it] 64%|██████▎ | 7827/12313 [5:51:28<3:21:46, 2.70s/it] {'loss': 0.6481, 'grad_norm': 4.137320108495999, 'learning_rate': 1.5477739986306878e-06, 'epoch': 0.64} 64%|██████▎ | 7827/12313 [5:51:28<3:21:46, 2.70s/it] 64%|██████▎ | 7828/12313 [5:51:30<3:19:46, 2.67s/it] {'loss': 0.4265, 'grad_norm': 4.613431700216779, 'learning_rate': 1.5471659808394669e-06, 'epoch': 0.64} 64%|██████▎ | 7828/12313 [5:51:30<3:19:46, 2.67s/it] 64%|██████▎ | 7829/12313 [5:51:33<3:21:43, 2.70s/it] {'loss': 0.3759, 'grad_norm': 3.7043285271670507, 'learning_rate': 1.546558028979318e-06, 'epoch': 0.64} 64%|██████▎ | 7829/12313 [5:51:33<3:21:43, 2.70s/it] 64%|██████▎ | 7830/12313 
[5:51:36<3:18:35, 2.66s/it] {'loss': 0.4925, 'grad_norm': 5.051376359459655, 'learning_rate': 1.5459501430923073e-06, 'epoch': 0.64} 64%|██████▎ | 7830/12313 [5:51:36<3:18:35, 2.66s/it] 64%|██████▎ | 7831/12313 [5:51:39<3:32:28, 2.84s/it] {'loss': 0.4753, 'grad_norm': 4.08262450327924, 'learning_rate': 1.5453423232204968e-06, 'epoch': 0.64} 64%|██████▎ | 7831/12313 [5:51:39<3:32:28, 2.84s/it] 64%|██████▎ | 7832/12313 [5:51:42<3:30:45, 2.82s/it] {'loss': 0.4227, 'grad_norm': 4.998294324944461, 'learning_rate': 1.5447345694059462e-06, 'epoch': 0.64} 64%|██████▎ | 7832/12313 [5:51:42<3:30:45, 2.82s/it] 64%|██████▎ | 7833/12313 [5:51:44<3:25:45, 2.76s/it] {'loss': 0.5512, 'grad_norm': 13.638204885625212, 'learning_rate': 1.5441268816907077e-06, 'epoch': 0.64} 64%|██████▎ | 7833/12313 [5:51:44<3:25:45, 2.76s/it] 64%|██████▎ | 7834/12313 [5:51:47<3:25:12, 2.75s/it] {'loss': 0.5912, 'grad_norm': 22.49883181989276, 'learning_rate': 1.5435192601168293e-06, 'epoch': 0.64} 64%|██████▎ | 7834/12313 [5:51:47<3:25:12, 2.75s/it] 64%|██████▎ | 7835/12313 [5:51:50<3:21:12, 2.70s/it] {'loss': 0.425, 'grad_norm': 5.137209289732004, 'learning_rate': 1.542911704726356e-06, 'epoch': 0.64} 64%|██████▎ | 7835/12313 [5:51:50<3:21:12, 2.70s/it] 64%|██████▎ | 7836/12313 [5:51:52<3:18:01, 2.65s/it] {'loss': 0.3496, 'grad_norm': 5.554841196878051, 'learning_rate': 1.5423042155613283e-06, 'epoch': 0.64} 64%|██████▎ | 7836/12313 [5:51:52<3:18:01, 2.65s/it] 64%|██████▎ | 7837/12313 [5:51:55<3:15:15, 2.62s/it] {'loss': 0.3677, 'grad_norm': 6.808818397785924, 'learning_rate': 1.5416967926637793e-06, 'epoch': 0.64} 64%|██████▎ | 7837/12313 [5:51:55<3:15:15, 2.62s/it] 64%|██████▎ | 7838/12313 [5:51:57<3:12:43, 2.58s/it] {'loss': 0.5055, 'grad_norm': 10.986138436222886, 'learning_rate': 1.5410894360757408e-06, 'epoch': 0.64} 64%|██████▎ | 7838/12313 [5:51:57<3:12:43, 2.58s/it] 64%|██████▎ | 7839/12313 [5:52:00<3:13:03, 2.59s/it] {'loss': 0.4956, 'grad_norm': 7.0748129231448695, 'learning_rate': 
1.540482145839239e-06, 'epoch': 0.64} 64%|██████▎ | 7839/12313 [5:52:00<3:13:03, 2.59s/it] 64%|██████▎ | 7840/12313 [5:52:03<3:18:36, 2.66s/it] {'loss': 0.5188, 'grad_norm': 5.517324400634849, 'learning_rate': 1.5398749219962935e-06, 'epoch': 0.64} 64%|██████▎ | 7840/12313 [5:52:03<3:18:36, 2.66s/it] 64%|██████▎ | 7841/12313 [5:52:05<3:17:06, 2.64s/it] {'loss': 0.4918, 'grad_norm': 7.369254822891894, 'learning_rate': 1.5392677645889225e-06, 'epoch': 0.64} 64%|██████▎ | 7841/12313 [5:52:05<3:17:06, 2.64s/it] 64%|██████▎ | 7842/12313 [5:52:08<3:15:23, 2.62s/it] {'loss': 0.5523, 'grad_norm': 4.443831510239287, 'learning_rate': 1.5386606736591381e-06, 'epoch': 0.64} 64%|██████▎ | 7842/12313 [5:52:08<3:15:23, 2.62s/it] 64%|██████▎ | 7843/12313 [5:52:10<3:14:41, 2.61s/it] {'loss': 0.4526, 'grad_norm': 5.484153254041072, 'learning_rate': 1.5380536492489468e-06, 'epoch': 0.64} 64%|██████▎ | 7843/12313 [5:52:10<3:14:41, 2.61s/it] 64%|██████▎ | 7844/12313 [5:52:13<3:12:46, 2.59s/it] {'loss': 0.5204, 'grad_norm': 3.6793655308252364, 'learning_rate': 1.5374466914003516e-06, 'epoch': 0.64} 64%|██████▎ | 7844/12313 [5:52:13<3:12:46, 2.59s/it] 64%|██████▎ | 7845/12313 [5:52:16<3:16:39, 2.64s/it] {'loss': 0.5365, 'grad_norm': 5.6124428134045, 'learning_rate': 1.536839800155352e-06, 'epoch': 0.64} 64%|██████▎ | 7845/12313 [5:52:16<3:16:39, 2.64s/it] 64%|██████▎ | 7846/12313 [5:52:18<3:10:07, 2.55s/it] {'loss': 0.4503, 'grad_norm': 4.415081736558317, 'learning_rate': 1.5362329755559402e-06, 'epoch': 0.64} 64%|██████▎ | 7846/12313 [5:52:18<3:10:07, 2.55s/it] 64%|██████▎ | 7847/12313 [5:52:21<3:14:35, 2.61s/it] {'loss': 0.5873, 'grad_norm': 8.36645209753681, 'learning_rate': 1.5356262176441051e-06, 'epoch': 0.64} 64%|██████▎ | 7847/12313 [5:52:21<3:14:35, 2.61s/it] 64%|██████▎ | 7848/12313 [5:52:24<3:15:16, 2.62s/it] {'loss': 0.4172, 'grad_norm': 4.410444200701278, 'learning_rate': 1.5350195264618333e-06, 'epoch': 0.64} 64%|██████▎ | 7848/12313 [5:52:24<3:15:16, 2.62s/it] 64%|██████▎ 
| 7849/12313 [5:52:26<3:12:28, 2.59s/it] {'loss': 0.378, 'grad_norm': 5.471308320004208, 'learning_rate': 1.5344129020511029e-06, 'epoch': 0.64} 64%|██████▎ | 7849/12313 [5:52:26<3:12:28, 2.59s/it] 64%|██████▍ | 7850/12313 [5:52:29<3:19:03, 2.68s/it] {'loss': 0.4858, 'grad_norm': 2.964980011323985, 'learning_rate': 1.5338063444538887e-06, 'epoch': 0.64} 64%|██████▍ | 7850/12313 [5:52:29<3:19:03, 2.68s/it] 64%|██████▍ | 7851/12313 [5:52:32<3:17:48, 2.66s/it] {'loss': 0.4601, 'grad_norm': 4.034123659647055, 'learning_rate': 1.533199853712162e-06, 'epoch': 0.64} 64%|██████▍ | 7851/12313 [5:52:32<3:17:48, 2.66s/it] 64%|██████▍ | 7852/12313 [5:52:34<3:17:21, 2.65s/it] {'loss': 0.5198, 'grad_norm': 5.360248478803197, 'learning_rate': 1.5325934298678896e-06, 'epoch': 0.64} 64%|██████▍ | 7852/12313 [5:52:34<3:17:21, 2.65s/it] 64%|██████▍ | 7853/12313 [5:52:37<3:16:42, 2.65s/it] {'loss': 0.5347, 'grad_norm': 4.128851671530124, 'learning_rate': 1.5319870729630303e-06, 'epoch': 0.64} 64%|██████▍ | 7853/12313 [5:52:37<3:16:42, 2.65s/it] 64%|██████▍ | 7854/12313 [5:52:39<3:15:41, 2.63s/it] {'loss': 0.4249, 'grad_norm': 5.837367955901182, 'learning_rate': 1.5313807830395437e-06, 'epoch': 0.64} 64%|██████▍ | 7854/12313 [5:52:39<3:15:41, 2.63s/it] 64%|██████▍ | 7855/12313 [5:52:42<3:15:39, 2.63s/it] {'loss': 0.5663, 'grad_norm': 4.420374712053163, 'learning_rate': 1.5307745601393808e-06, 'epoch': 0.64} 64%|██████▍ | 7855/12313 [5:52:42<3:15:39, 2.63s/it] 64%|██████▍ | 7856/12313 [5:52:45<3:14:48, 2.62s/it] {'loss': 0.569, 'grad_norm': 8.34561480709057, 'learning_rate': 1.5301684043044875e-06, 'epoch': 0.64} 64%|██████▍ | 7856/12313 [5:52:45<3:14:48, 2.62s/it] 64%|██████▍ | 7857/12313 [5:52:47<3:12:52, 2.60s/it] {'loss': 0.4581, 'grad_norm': 11.076659493261435, 'learning_rate': 1.5295623155768086e-06, 'epoch': 0.64} 64%|██████▍ | 7857/12313 [5:52:47<3:12:52, 2.60s/it] 64%|██████▍ | 7858/12313 [5:52:50<3:10:46, 2.57s/it] {'loss': 0.5524, 'grad_norm': 5.281957951201278, 
'learning_rate': 1.5289562939982822e-06, 'epoch': 0.64} 64%|██████▍ | 7858/12313 [5:52:50<3:10:46, 2.57s/it] 64%|██████▍ | 7859/12313 [5:52:52<3:11:28, 2.58s/it] {'loss': 0.5083, 'grad_norm': 4.075328659102114, 'learning_rate': 1.5283503396108401e-06, 'epoch': 0.64} 64%|██████▍ | 7859/12313 [5:52:52<3:11:28, 2.58s/it] 64%|██████▍ | 7860/12313 [5:52:55<3:20:00, 2.69s/it] {'loss': 0.524, 'grad_norm': 4.739091105300953, 'learning_rate': 1.5277444524564117e-06, 'epoch': 0.64} 64%|██████▍ | 7860/12313 [5:52:55<3:20:00, 2.69s/it] 64%|██████▍ | 7861/12313 [5:52:58<3:18:18, 2.67s/it] {'loss': 0.2867, 'grad_norm': 12.502865910338423, 'learning_rate': 1.5271386325769227e-06, 'epoch': 0.64} 64%|██████▍ | 7861/12313 [5:52:58<3:18:18, 2.67s/it] 64%|██████▍ | 7862/12313 [5:53:01<3:19:27, 2.69s/it] {'loss': 0.6359, 'grad_norm': 4.833851282125769, 'learning_rate': 1.526532880014292e-06, 'epoch': 0.64} 64%|██████▍ | 7862/12313 [5:53:01<3:19:27, 2.69s/it] 64%|██████▍ | 7863/12313 [5:53:03<3:15:21, 2.63s/it] {'loss': 0.4139, 'grad_norm': 4.157660884304162, 'learning_rate': 1.5259271948104323e-06, 'epoch': 0.64} 64%|██████▍ | 7863/12313 [5:53:03<3:15:21, 2.63s/it] 64%|██████▍ | 7864/12313 [5:53:06<3:15:34, 2.64s/it] {'loss': 0.4898, 'grad_norm': 4.699011951945198, 'learning_rate': 1.5253215770072564e-06, 'epoch': 0.64} 64%|██████▍ | 7864/12313 [5:53:06<3:15:34, 2.64s/it] 64%|██████▍ | 7865/12313 [5:53:08<3:10:19, 2.57s/it] {'loss': 0.5325, 'grad_norm': 5.911530191898309, 'learning_rate': 1.5247160266466693e-06, 'epoch': 0.64} 64%|██████▍ | 7865/12313 [5:53:08<3:10:19, 2.57s/it] 64%|██████▍ | 7866/12313 [5:53:11<3:22:02, 2.73s/it] {'loss': 0.5321, 'grad_norm': 3.680767366819412, 'learning_rate': 1.5241105437705706e-06, 'epoch': 0.64} 64%|██████▍ | 7866/12313 [5:53:11<3:22:02, 2.73s/it] 64%|██████▍ | 7867/12313 [5:53:14<3:19:16, 2.69s/it] {'loss': 0.4228, 'grad_norm': 6.398260678627941, 'learning_rate': 1.523505128420858e-06, 'epoch': 0.64} 64%|██████▍ | 7867/12313 [5:53:14<3:19:16, 
2.69s/it] 64%|██████▍ | 7868/12313 [5:53:16<3:18:21, 2.68s/it] {'loss': 0.5505, 'grad_norm': 3.398702627971208, 'learning_rate': 1.522899780639423e-06, 'epoch': 0.64} 64%|██████▍ | 7868/12313 [5:53:16<3:18:21, 2.68s/it] 64%|██████▍ | 7869/12313 [5:53:19<3:16:55, 2.66s/it] {'loss': 0.7697, 'grad_norm': 6.202752332409573, 'learning_rate': 1.5222945004681504e-06, 'epoch': 0.64} 64%|██████▍ | 7869/12313 [5:53:19<3:16:55, 2.66s/it] 64%|██████▍ | 7870/12313 [5:53:22<3:20:57, 2.71s/it] {'loss': 0.4696, 'grad_norm': 5.874268780003323, 'learning_rate': 1.5216892879489253e-06, 'epoch': 0.64} 64%|██████▍ | 7870/12313 [5:53:22<3:20:57, 2.71s/it] 64%|██████▍ | 7871/12313 [5:53:25<3:18:50, 2.69s/it] {'loss': 0.6132, 'grad_norm': 4.404120702182698, 'learning_rate': 1.521084143123624e-06, 'epoch': 0.64} 64%|██████▍ | 7871/12313 [5:53:25<3:18:50, 2.69s/it] 64%|██████▍ | 7872/12313 [5:53:27<3:20:25, 2.71s/it] {'loss': 0.4298, 'grad_norm': 4.8946588555368695, 'learning_rate': 1.5204790660341178e-06, 'epoch': 0.64} 64%|██████▍ | 7872/12313 [5:53:27<3:20:25, 2.71s/it] 64%|██████▍ | 7873/12313 [5:53:30<3:19:24, 2.69s/it] {'loss': 0.6014, 'grad_norm': 4.660366909329026, 'learning_rate': 1.519874056722277e-06, 'epoch': 0.64} 64%|██████▍ | 7873/12313 [5:53:30<3:19:24, 2.69s/it] 64%|██████▍ | 7874/12313 [5:53:33<3:23:23, 2.75s/it] {'loss': 0.5607, 'grad_norm': 9.926138073456057, 'learning_rate': 1.5192691152299649e-06, 'epoch': 0.64} 64%|██████▍ | 7874/12313 [5:53:33<3:23:23, 2.75s/it] 64%|██████▍ | 7875/12313 [5:53:36<3:22:56, 2.74s/it] {'loss': 0.4831, 'grad_norm': 6.862534435729142, 'learning_rate': 1.5186642415990382e-06, 'epoch': 0.64} 64%|██████▍ | 7875/12313 [5:53:36<3:22:56, 2.74s/it] 64%|██████▍ | 7876/12313 [5:53:38<3:15:18, 2.64s/it] {'loss': 0.3118, 'grad_norm': 5.109846150321177, 'learning_rate': 1.518059435871353e-06, 'epoch': 0.64} 64%|██████▍ | 7876/12313 [5:53:38<3:15:18, 2.64s/it] 64%|██████▍ | 7877/12313 [5:53:41<3:13:52, 2.62s/it] {'loss': 0.407, 'grad_norm': 
4.824987683441751, 'learning_rate': 1.5174546980887585e-06, 'epoch': 0.64} 64%|██████▍ | 7877/12313 [5:53:41<3:13:52, 2.62s/it] 64%|██████▍ | 7878/12313 [5:53:43<3:14:32, 2.63s/it] {'loss': 0.6486, 'grad_norm': 3.088737083022684, 'learning_rate': 1.516850028293099e-06, 'epoch': 0.64} 64%|██████▍ | 7878/12313 [5:53:43<3:14:32, 2.63s/it] 64%|██████▍ | 7879/12313 [5:53:46<3:15:44, 2.65s/it] {'loss': 0.6395, 'grad_norm': 4.687914435447199, 'learning_rate': 1.516245426526213e-06, 'epoch': 0.64} 64%|██████▍ | 7879/12313 [5:53:46<3:15:44, 2.65s/it] 64%|██████▍ | 7880/12313 [5:53:48<3:13:15, 2.62s/it] {'loss': 0.5865, 'grad_norm': 6.45233418054512, 'learning_rate': 1.5156408928299377e-06, 'epoch': 0.64} 64%|██████▍ | 7880/12313 [5:53:48<3:13:15, 2.62s/it] 64%|██████▍ | 7881/12313 [5:53:51<3:11:32, 2.59s/it] {'loss': 0.4263, 'grad_norm': 4.4953810242052, 'learning_rate': 1.5150364272461035e-06, 'epoch': 0.64} 64%|██████▍ | 7881/12313 [5:53:51<3:11:32, 2.59s/it] 64%|██████▍ | 7882/12313 [5:53:54<3:23:23, 2.75s/it] {'loss': 0.4651, 'grad_norm': 5.174417200023911, 'learning_rate': 1.5144320298165346e-06, 'epoch': 0.64} 64%|██████▍ | 7882/12313 [5:53:54<3:23:23, 2.75s/it] 64%|██████▍ | 7883/12313 [5:53:57<3:23:11, 2.75s/it] {'loss': 0.4134, 'grad_norm': 8.999793348428359, 'learning_rate': 1.5138277005830538e-06, 'epoch': 0.64} 64%|██████▍ | 7883/12313 [5:53:57<3:23:11, 2.75s/it] 64%|██████▍ | 7884/12313 [5:54:00<3:32:53, 2.88s/it] {'loss': 0.5613, 'grad_norm': 4.942872220787506, 'learning_rate': 1.5132234395874773e-06, 'epoch': 0.64} 64%|██████▍ | 7884/12313 [5:54:00<3:32:53, 2.88s/it] 64%|██████▍ | 7885/12313 [5:54:03<3:27:34, 2.81s/it] {'loss': 0.4878, 'grad_norm': 3.7848669467000433, 'learning_rate': 1.5126192468716152e-06, 'epoch': 0.64} 64%|██████▍ | 7885/12313 [5:54:03<3:27:34, 2.81s/it] 64%|██████▍ | 7886/12313 [5:54:06<3:34:50, 2.91s/it] {'loss': 0.6558, 'grad_norm': 4.408963540747303, 'learning_rate': 1.5120151224772765e-06, 'epoch': 0.64} 64%|██████▍ | 7886/12313 
[5:54:06<3:34:50, 2.91s/it] 64%|██████▍ | 7887/12313 [5:54:09<3:31:43, 2.87s/it] {'loss': 0.5179, 'grad_norm': 6.167053391776367, 'learning_rate': 1.5114110664462624e-06, 'epoch': 0.64} 64%|██████▍ | 7887/12313 [5:54:09<3:31:43, 2.87s/it] 64%|██████▍ | 7888/12313 [5:54:11<3:31:05, 2.86s/it] {'loss': 0.5723, 'grad_norm': 4.691809504965685, 'learning_rate': 1.5108070788203699e-06, 'epoch': 0.64} 64%|██████▍ | 7888/12313 [5:54:11<3:31:05, 2.86s/it] 64%|██████▍ | 7889/12313 [5:54:14<3:27:53, 2.82s/it] {'loss': 0.5001, 'grad_norm': 4.8601836373152, 'learning_rate': 1.5102031596413927e-06, 'epoch': 0.64} 64%|██████▍ | 7889/12313 [5:54:14<3:27:53, 2.82s/it] 64%|██████▍ | 7890/12313 [5:54:17<3:24:51, 2.78s/it] {'loss': 0.4755, 'grad_norm': 4.771276325593275, 'learning_rate': 1.509599308951119e-06, 'epoch': 0.64} 64%|██████▍ | 7890/12313 [5:54:17<3:24:51, 2.78s/it] 64%|██████▍ | 7891/12313 [5:54:19<3:19:22, 2.71s/it] {'loss': 0.3362, 'grad_norm': 7.6294635480914135, 'learning_rate': 1.5089955267913303e-06, 'epoch': 0.64} 64%|██████▍ | 7891/12313 [5:54:19<3:19:22, 2.71s/it] 64%|██████▍ | 7892/12313 [5:54:22<3:13:04, 2.62s/it] {'loss': 0.3609, 'grad_norm': 4.020044219945879, 'learning_rate': 1.5083918132038072e-06, 'epoch': 0.64} 64%|██████▍ | 7892/12313 [5:54:22<3:13:04, 2.62s/it] 64%|██████▍ | 7893/12313 [5:54:25<3:15:37, 2.66s/it] {'loss': 0.4598, 'grad_norm': 8.75122403334399, 'learning_rate': 1.5077881682303225e-06, 'epoch': 0.64} 64%|██████▍ | 7893/12313 [5:54:25<3:15:37, 2.66s/it] 64%|██████▍ | 7894/12313 [5:54:27<3:11:40, 2.60s/it] {'loss': 0.5443, 'grad_norm': 12.260107016009812, 'learning_rate': 1.5071845919126448e-06, 'epoch': 0.64} 64%|██████▍ | 7894/12313 [5:54:27<3:11:40, 2.60s/it] 64%|██████▍ | 7895/12313 [5:54:29<3:07:39, 2.55s/it] {'loss': 0.456, 'grad_norm': 6.260976119003239, 'learning_rate': 1.5065810842925399e-06, 'epoch': 0.64} 64%|██████▍ | 7895/12313 [5:54:29<3:07:39, 2.55s/it] 64%|██████▍ | 7896/12313 [5:54:32<3:11:26, 2.60s/it] {'loss': 0.4861, 
'grad_norm': 4.832778705163437, 'learning_rate': 1.5059776454117658e-06, 'epoch': 0.64} 64%|██████▍ | 7896/12313 [5:54:32<3:11:26, 2.60s/it] 64%|██████▍ | 7897/12313 [5:54:35<3:16:45, 2.67s/it] {'loss': 0.4146, 'grad_norm': 4.117223333471925, 'learning_rate': 1.505374275312078e-06, 'epoch': 0.64} 64%|██████▍ | 7897/12313 [5:54:35<3:16:45, 2.67s/it] 64%|██████▍ | 7898/12313 [5:54:38<3:15:25, 2.66s/it] {'loss': 0.4328, 'grad_norm': 7.715748444314561, 'learning_rate': 1.504770974035226e-06, 'epoch': 0.64} 64%|██████▍ | 7898/12313 [5:54:38<3:15:25, 2.66s/it] 64%|██████▍ | 7899/12313 [5:54:40<3:16:29, 2.67s/it] {'loss': 0.5428, 'grad_norm': 3.698130972907306, 'learning_rate': 1.5041677416229556e-06, 'epoch': 0.64} 64%|██████▍ | 7899/12313 [5:54:40<3:16:29, 2.67s/it] 64%|██████▍ | 7900/12313 [5:54:43<3:15:35, 2.66s/it] {'loss': 0.5696, 'grad_norm': 8.80799380441798, 'learning_rate': 1.5035645781170078e-06, 'epoch': 0.64} 64%|██████▍ | 7900/12313 [5:54:43<3:15:35, 2.66s/it] 64%|██████▍ | 7901/12313 [5:54:46<3:16:01, 2.67s/it] {'loss': 0.6523, 'grad_norm': 5.951094606026148, 'learning_rate': 1.502961483559116e-06, 'epoch': 0.64} 64%|██████▍ | 7901/12313 [5:54:46<3:16:01, 2.67s/it] 64%|██████▍ | 7902/12313 [5:54:48<3:19:48, 2.72s/it] {'loss': 0.4707, 'grad_norm': 5.956077850717039, 'learning_rate': 1.502358457991014e-06, 'epoch': 0.64} 64%|██████▍ | 7902/12313 [5:54:48<3:19:48, 2.72s/it] 64%|██████▍ | 7903/12313 [5:54:51<3:18:24, 2.70s/it] {'loss': 0.4322, 'grad_norm': 17.140276946196106, 'learning_rate': 1.5017555014544273e-06, 'epoch': 0.64} 64%|██████▍ | 7903/12313 [5:54:51<3:18:24, 2.70s/it] 64%|██████▍ | 7904/12313 [5:54:54<3:15:02, 2.65s/it] {'loss': 0.523, 'grad_norm': 4.87844418439407, 'learning_rate': 1.5011526139910754e-06, 'epoch': 0.64} 64%|██████▍ | 7904/12313 [5:54:54<3:15:02, 2.65s/it] 64%|██████▍ | 7905/12313 [5:54:56<3:15:33, 2.66s/it] {'loss': 0.4496, 'grad_norm': 4.800050176008073, 'learning_rate': 1.5005497956426773e-06, 'epoch': 0.64} 64%|██████▍ | 
7905/12313 [5:54:56<3:15:33, 2.66s/it] 64%|██████▍ | 7906/12313 [5:54:59<3:16:34, 2.68s/it] {'loss': 0.4226, 'grad_norm': 15.944468140204993, 'learning_rate': 1.4999470464509432e-06, 'epoch': 0.64} 64%|██████▍ | 7906/12313 [5:54:59<3:16:34, 2.68s/it] 64%|██████▍ | 7907/12313 [5:55:02<3:17:56, 2.70s/it] {'loss': 0.4787, 'grad_norm': 10.898961363880764, 'learning_rate': 1.4993443664575807e-06, 'epoch': 0.64} 64%|██████▍ | 7907/12313 [5:55:02<3:17:56, 2.70s/it] 64%|██████▍ | 7908/12313 [5:55:05<3:20:45, 2.73s/it] {'loss': 0.4948, 'grad_norm': 3.386355939222327, 'learning_rate': 1.4987417557042928e-06, 'epoch': 0.64} 64%|██████▍ | 7908/12313 [5:55:05<3:20:45, 2.73s/it] 64%|██████▍ | 7909/12313 [5:55:07<3:18:03, 2.70s/it] {'loss': 0.504, 'grad_norm': 4.044784764443736, 'learning_rate': 1.4981392142327761e-06, 'epoch': 0.64} 64%|██████▍ | 7909/12313 [5:55:07<3:18:03, 2.70s/it] 64%|██████▍ | 7910/12313 [5:55:10<3:13:39, 2.64s/it] {'loss': 0.5462, 'grad_norm': 7.7128980212700915, 'learning_rate': 1.4975367420847225e-06, 'epoch': 0.64} 64%|██████▍ | 7910/12313 [5:55:10<3:13:39, 2.64s/it] 64%|██████▍ | 7911/12313 [5:55:12<3:14:51, 2.66s/it] {'loss': 0.4488, 'grad_norm': 7.797675381291463, 'learning_rate': 1.4969343393018224e-06, 'epoch': 0.64} 64%|██████▍ | 7911/12313 [5:55:12<3:14:51, 2.66s/it] 64%|██████▍ | 7912/12313 [5:55:15<3:15:53, 2.67s/it] {'loss': 0.5137, 'grad_norm': 3.7094574496346198, 'learning_rate': 1.4963320059257565e-06, 'epoch': 0.64} 64%|██████▍ | 7912/12313 [5:55:15<3:15:53, 2.67s/it] 64%|██████▍ | 7913/12313 [5:55:18<3:15:00, 2.66s/it] {'loss': 0.4704, 'grad_norm': 6.476581873676165, 'learning_rate': 1.4957297419982047e-06, 'epoch': 0.64} 64%|██████▍ | 7913/12313 [5:55:18<3:15:00, 2.66s/it] 64%|██████▍ | 7914/12313 [5:55:20<3:11:09, 2.61s/it] {'loss': 0.463, 'grad_norm': 4.73377261927438, 'learning_rate': 1.4951275475608387e-06, 'epoch': 0.64} 64%|██████▍ | 7914/12313 [5:55:20<3:11:09, 2.61s/it] 64%|██████▍ | 7915/12313 [5:55:23<3:16:27, 2.68s/it] 
{'loss': 0.5192, 'grad_norm': 3.1667300398717386, 'learning_rate': 1.4945254226553288e-06, 'epoch': 0.64} 64%|██████▍ | 7915/12313 [5:55:23<3:16:27, 2.68s/it] 64%|██████▍ | 7916/12313 [5:55:26<3:24:23, 2.79s/it] {'loss': 0.6137, 'grad_norm': 4.733929299465015, 'learning_rate': 1.4939233673233387e-06, 'epoch': 0.64} 64%|██████▍ | 7916/12313 [5:55:26<3:24:23, 2.79s/it] 64%|██████▍ | 7917/12313 [5:55:29<3:21:35, 2.75s/it] {'loss': 0.4966, 'grad_norm': 5.618667686017774, 'learning_rate': 1.4933213816065257e-06, 'epoch': 0.64} 64%|██████▍ | 7917/12313 [5:55:29<3:21:35, 2.75s/it] 64%|██████▍ | 7918/12313 [5:55:32<3:21:48, 2.76s/it] {'loss': 0.4784, 'grad_norm': 3.847952094017611, 'learning_rate': 1.492719465546546e-06, 'epoch': 0.64} 64%|██████▍ | 7918/12313 [5:55:32<3:21:48, 2.76s/it] 64%|██████▍ | 7919/12313 [5:55:34<3:22:04, 2.76s/it] {'loss': 0.4471, 'grad_norm': 4.697213023502698, 'learning_rate': 1.492117619185049e-06, 'epoch': 0.64} 64%|██████▍ | 7919/12313 [5:55:34<3:22:04, 2.76s/it] 64%|██████▍ | 7920/12313 [5:55:37<3:18:45, 2.71s/it] {'loss': 0.5378, 'grad_norm': 5.479689495939854, 'learning_rate': 1.4915158425636772e-06, 'epoch': 0.64} 64%|██████▍ | 7920/12313 [5:55:37<3:18:45, 2.71s/it] 64%|██████▍ | 7921/12313 [5:55:39<3:13:29, 2.64s/it] {'loss': 0.4669, 'grad_norm': 3.3937052015697367, 'learning_rate': 1.4909141357240731e-06, 'epoch': 0.64} 64%|██████▍ | 7921/12313 [5:55:39<3:13:29, 2.64s/it] 64%|██████▍ | 7922/12313 [5:55:42<3:11:27, 2.62s/it] {'loss': 0.4361, 'grad_norm': 4.013372087007401, 'learning_rate': 1.4903124987078698e-06, 'epoch': 0.64} 64%|██████▍ | 7922/12313 [5:55:42<3:11:27, 2.62s/it] 64%|██████▍ | 7923/12313 [5:55:45<3:11:33, 2.62s/it] {'loss': 0.4443, 'grad_norm': 3.746362491526315, 'learning_rate': 1.4897109315566974e-06, 'epoch': 0.64} 64%|██████▍ | 7923/12313 [5:55:45<3:11:33, 2.62s/it] 64%|██████▍ | 7924/12313 [5:55:48<3:19:23, 2.73s/it] {'loss': 0.446, 'grad_norm': 4.102308375643804, 'learning_rate': 1.4891094343121827e-06, 'epoch': 
0.64} 64%|██████▍ | 7924/12313 [5:55:48<3:19:23, 2.73s/it] 64%|██████▍ | 7925/12313 [5:55:50<3:16:37, 2.69s/it] {'loss': 0.7058, 'grad_norm': 3.8787498068648634, 'learning_rate': 1.488508007015944e-06, 'epoch': 0.64} 64%|██████▍ | 7925/12313 [5:55:50<3:16:37, 2.69s/it] 64%|██████▍ | 7926/12313 [5:55:53<3:16:44, 2.69s/it] {'loss': 0.4578, 'grad_norm': 7.193908220842526, 'learning_rate': 1.487906649709598e-06, 'epoch': 0.64} 64%|██████▍ | 7926/12313 [5:55:53<3:16:44, 2.69s/it] 64%|██████▍ | 7927/12313 [5:55:56<3:16:34, 2.69s/it] {'loss': 0.5096, 'grad_norm': 7.588715391068836, 'learning_rate': 1.4873053624347567e-06, 'epoch': 0.64} 64%|██████▍ | 7927/12313 [5:55:56<3:16:34, 2.69s/it] 64%|██████▍ | 7928/12313 [5:55:58<3:16:28, 2.69s/it] {'loss': 0.5221, 'grad_norm': 5.508197098569508, 'learning_rate': 1.4867041452330238e-06, 'epoch': 0.64} 64%|██████▍ | 7928/12313 [5:55:58<3:16:28, 2.69s/it] 64%|██████▍ | 7929/12313 [5:56:01<3:13:22, 2.65s/it] {'loss': 0.4898, 'grad_norm': 3.6948692770323928, 'learning_rate': 1.4861029981460007e-06, 'epoch': 0.64} 64%|██████▍ | 7929/12313 [5:56:01<3:13:22, 2.65s/it] 64%|██████▍ | 7930/12313 [5:56:03<3:13:22, 2.65s/it] {'loss': 0.4907, 'grad_norm': 3.770725561207981, 'learning_rate': 1.4855019212152852e-06, 'epoch': 0.64} 64%|██████▍ | 7930/12313 [5:56:03<3:13:22, 2.65s/it] 64%|██████▍ | 7931/12313 [5:56:06<3:13:38, 2.65s/it] {'loss': 0.3815, 'grad_norm': 6.044600128479535, 'learning_rate': 1.484900914482467e-06, 'epoch': 0.64} 64%|██████▍ | 7931/12313 [5:56:06<3:13:38, 2.65s/it] 64%|██████▍ | 7932/12313 [5:56:09<3:15:06, 2.67s/it] {'loss': 0.5212, 'grad_norm': 5.141487441925867, 'learning_rate': 1.484299977989134e-06, 'epoch': 0.64} 64%|██████▍ | 7932/12313 [5:56:09<3:15:06, 2.67s/it] 64%|██████▍ | 7933/12313 [5:56:12<3:17:26, 2.70s/it] {'loss': 0.465, 'grad_norm': 6.722063710271542, 'learning_rate': 1.4836991117768657e-06, 'epoch': 0.64} 64%|██████▍ | 7933/12313 [5:56:12<3:17:26, 2.70s/it] 64%|██████▍ | 7934/12313 [5:56:14<3:14:22, 
2.66s/it] {'loss': 0.3926, 'grad_norm': 6.60995189007181, 'learning_rate': 1.4830983158872414e-06, 'epoch': 0.64} 64%|██████▍ | 7934/12313 [5:56:14<3:14:22, 2.66s/it] 64%|██████▍ | 7935/12313 [5:56:17<3:16:25, 2.69s/it] {'loss': 0.5679, 'grad_norm': 5.159161950257783, 'learning_rate': 1.482497590361831e-06, 'epoch': 0.64} 64%|██████▍ | 7935/12313 [5:56:17<3:16:25, 2.69s/it] 64%|██████▍ | 7936/12313 [5:56:20<3:15:15, 2.68s/it] {'loss': 0.4301, 'grad_norm': 4.722697499573461, 'learning_rate': 1.4818969352422018e-06, 'epoch': 0.64} 64%|██████▍ | 7936/12313 [5:56:20<3:15:15, 2.68s/it] 64%|██████▍ | 7937/12313 [5:56:22<3:14:50, 2.67s/it] {'loss': 0.5284, 'grad_norm': 5.60413330763897, 'learning_rate': 1.4812963505699179e-06, 'epoch': 0.64} 64%|██████▍ | 7937/12313 [5:56:22<3:14:50, 2.67s/it] 64%|██████▍ | 7938/12313 [5:56:25<3:10:56, 2.62s/it] {'loss': 0.4881, 'grad_norm': 4.906758714354853, 'learning_rate': 1.4806958363865342e-06, 'epoch': 0.64} 64%|██████▍ | 7938/12313 [5:56:25<3:10:56, 2.62s/it] 64%|██████▍ | 7939/12313 [5:56:27<3:05:57, 2.55s/it] {'loss': 0.4121, 'grad_norm': 5.416581495053993, 'learning_rate': 1.4800953927336036e-06, 'epoch': 0.64} 64%|██████▍ | 7939/12313 [5:56:27<3:05:57, 2.55s/it] 64%|██████▍ | 7940/12313 [5:56:30<3:07:31, 2.57s/it] {'loss': 0.4134, 'grad_norm': 3.1479449081387605, 'learning_rate': 1.4794950196526753e-06, 'epoch': 0.64} 64%|██████▍ | 7940/12313 [5:56:30<3:07:31, 2.57s/it] 64%|██████▍ | 7941/12313 [5:56:33<3:15:49, 2.69s/it] {'loss': 0.3979, 'grad_norm': 4.49719543896871, 'learning_rate': 1.4788947171852899e-06, 'epoch': 0.64} 64%|██████▍ | 7941/12313 [5:56:33<3:15:49, 2.69s/it] 65%|██████▍ | 7942/12313 [5:56:35<3:14:00, 2.66s/it] {'loss': 0.4008, 'grad_norm': 5.0629449929653205, 'learning_rate': 1.4782944853729856e-06, 'epoch': 0.65} 65%|██████▍ | 7942/12313 [5:56:35<3:14:00, 2.66s/it] 65%|██████▍ | 7943/12313 [5:56:38<3:21:44, 2.77s/it] {'loss': 0.6205, 'grad_norm': 4.398126234430731, 'learning_rate': 1.4776943242572966e-06, 
'epoch': 0.65} 65%|██████▍ | 7943/12313 [5:56:38<3:21:44, 2.77s/it] 65%|██████▍ | 7944/12313 [5:56:41<3:14:10, 2.67s/it] {'loss': 0.5619, 'grad_norm': 4.1699063676723425, 'learning_rate': 1.4770942338797491e-06, 'epoch': 0.65} 65%|██████▍ | 7944/12313 [5:56:41<3:14:10, 2.67s/it] 65%|██████▍ | 7945/12313 [5:56:43<3:11:18, 2.63s/it] {'loss': 0.4946, 'grad_norm': 83.64698952954932, 'learning_rate': 1.4764942142818667e-06, 'epoch': 0.65} 65%|██████▍ | 7945/12313 [5:56:43<3:11:18, 2.63s/it] 65%|██████▍ | 7946/12313 [5:56:46<3:08:25, 2.59s/it] {'loss': 0.6233, 'grad_norm': 5.2534053694927625, 'learning_rate': 1.475894265505169e-06, 'epoch': 0.65} 65%|██████▍ | 7946/12313 [5:56:46<3:08:25, 2.59s/it] 65%|██████▍ | 7947/12313 [5:56:49<3:10:18, 2.62s/it] {'loss': 0.4209, 'grad_norm': 3.941627067484926, 'learning_rate': 1.4752943875911673e-06, 'epoch': 0.65} 65%|██████▍ | 7947/12313 [5:56:49<3:10:18, 2.62s/it] 65%|██████▍ | 7948/12313 [5:56:51<3:09:01, 2.60s/it] {'loss': 0.5124, 'grad_norm': 9.922976413484275, 'learning_rate': 1.4746945805813707e-06, 'epoch': 0.65} 65%|██████▍ | 7948/12313 [5:56:51<3:09:01, 2.60s/it] 65%|██████▍ | 7949/12313 [5:56:54<3:11:48, 2.64s/it] {'loss': 0.3986, 'grad_norm': 6.9619380350835085, 'learning_rate': 1.4740948445172834e-06, 'epoch': 0.65} 65%|██████▍ | 7949/12313 [5:56:54<3:11:48, 2.64s/it] 65%|██████▍ | 7950/12313 [5:56:56<3:12:41, 2.65s/it] {'loss': 0.4537, 'grad_norm': 4.2939089133962325, 'learning_rate': 1.4734951794404035e-06, 'epoch': 0.65} 65%|██████▍ | 7950/12313 [5:56:56<3:12:41, 2.65s/it] 65%|██████▍ | 7951/12313 [5:56:59<3:18:09, 2.73s/it] {'loss': 0.5554, 'grad_norm': 3.1416028655227293, 'learning_rate': 1.4728955853922238e-06, 'epoch': 0.65} 65%|██████▍ | 7951/12313 [5:56:59<3:18:09, 2.73s/it] 65%|██████▍ | 7952/12313 [5:57:02<3:15:07, 2.68s/it] {'loss': 0.4172, 'grad_norm': 17.016801772420305, 'learning_rate': 1.4722960624142336e-06, 'epoch': 0.65} 65%|██████▍ | 7952/12313 [5:57:02<3:15:07, 2.68s/it] 65%|██████▍ | 7953/12313 
[5:57:05<3:14:48, 2.68s/it] {'loss': 0.3805, 'grad_norm': 5.911786929784765, 'learning_rate': 1.4716966105479175e-06, 'epoch': 0.65} 65%|██████▍ | 7953/12313 [5:57:05<3:14:48, 2.68s/it] 65%|██████▍ | 7954/12313 [5:57:07<3:15:43, 2.69s/it] {'loss': 0.4062, 'grad_norm': 4.538417255604256, 'learning_rate': 1.471097229834753e-06, 'epoch': 0.65} 65%|██████▍ | 7954/12313 [5:57:07<3:15:43, 2.69s/it] 65%|██████▍ | 7955/12313 [5:57:10<3:11:43, 2.64s/it] {'loss': 0.4817, 'grad_norm': 7.061375321676026, 'learning_rate': 1.4704979203162148e-06, 'epoch': 0.65} 65%|██████▍ | 7955/12313 [5:57:10<3:11:43, 2.64s/it] 65%|██████▍ | 7956/12313 [5:57:12<3:07:54, 2.59s/it] {'loss': 0.3514, 'grad_norm': 8.396993615373008, 'learning_rate': 1.4698986820337729e-06, 'epoch': 0.65} 65%|██████▍ | 7956/12313 [5:57:12<3:07:54, 2.59s/it] 65%|██████▍ | 7957/12313 [5:57:15<3:09:00, 2.60s/it] {'loss': 0.4232, 'grad_norm': 5.11600701972491, 'learning_rate': 1.4692995150288896e-06, 'epoch': 0.65} 65%|██████▍ | 7957/12313 [5:57:15<3:09:00, 2.60s/it] 65%|██████▍ | 7958/12313 [5:57:17<3:05:25, 2.55s/it] {'loss': 0.6237, 'grad_norm': 3.9201591191720446, 'learning_rate': 1.4687004193430248e-06, 'epoch': 0.65} 65%|██████▍ | 7958/12313 [5:57:17<3:05:25, 2.55s/it] 65%|██████▍ | 7959/12313 [5:57:20<3:09:27, 2.61s/it] {'loss': 0.4223, 'grad_norm': 4.347166743753607, 'learning_rate': 1.4681013950176338e-06, 'epoch': 0.65} 65%|██████▍ | 7959/12313 [5:57:20<3:09:27, 2.61s/it] 65%|██████▍ | 7960/12313 [5:57:23<3:11:47, 2.64s/it] {'loss': 0.4167, 'grad_norm': 4.5755798327245625, 'learning_rate': 1.4675024420941643e-06, 'epoch': 0.65} 65%|██████▍ | 7960/12313 [5:57:23<3:11:47, 2.64s/it] 65%|██████▍ | 7961/12313 [5:57:26<3:18:16, 2.73s/it] {'loss': 0.531, 'grad_norm': 3.768261457483465, 'learning_rate': 1.4669035606140613e-06, 'epoch': 0.65} 65%|██████▍ | 7961/12313 [5:57:26<3:18:16, 2.73s/it] 65%|██████▍ | 7962/12313 [5:57:28<3:15:17, 2.69s/it] {'loss': 0.4719, 'grad_norm': 4.347537189926498, 'learning_rate': 
1.4663047506187649e-06, 'epoch': 0.65} 65%|██████▍ | 7962/12313 [5:57:28<3:15:17, 2.69s/it] 65%|██████▍ | 7963/12313 [5:57:31<3:15:10, 2.69s/it] {'loss': 0.55, 'grad_norm': 4.256186132283668, 'learning_rate': 1.4657060121497095e-06, 'epoch': 0.65} 65%|██████▍ | 7963/12313 [5:57:31<3:15:10, 2.69s/it] 65%|██████▍ | 7964/12313 [5:57:34<3:17:22, 2.72s/it] {'loss': 0.762, 'grad_norm': 4.071283516303743, 'learning_rate': 1.4651073452483228e-06, 'epoch': 0.65} 65%|██████▍ | 7964/12313 [5:57:34<3:17:22, 2.72s/it] 65%|██████▍ | 7965/12313 [5:57:37<3:16:37, 2.71s/it] {'loss': 0.5069, 'grad_norm': 10.936892803098575, 'learning_rate': 1.4645087499560313e-06, 'epoch': 0.65} 65%|██████▍ | 7965/12313 [5:57:37<3:16:37, 2.71s/it] 65%|██████▍ | 7966/12313 [5:57:39<3:16:12, 2.71s/it] {'loss': 0.4825, 'grad_norm': 4.622669916018039, 'learning_rate': 1.4639102263142546e-06, 'epoch': 0.65} 65%|██████▍ | 7966/12313 [5:57:39<3:16:12, 2.71s/it] 65%|██████▍ | 7967/12313 [5:57:42<3:12:36, 2.66s/it] {'loss': 0.5098, 'grad_norm': 5.995534947389931, 'learning_rate': 1.463311774364406e-06, 'epoch': 0.65} 65%|██████▍ | 7967/12313 [5:57:42<3:12:36, 2.66s/it] 65%|██████▍ | 7968/12313 [5:57:44<3:10:24, 2.63s/it] {'loss': 0.6188, 'grad_norm': 81.25869393551474, 'learning_rate': 1.4627133941478958e-06, 'epoch': 0.65} 65%|██████▍ | 7968/12313 [5:57:44<3:10:24, 2.63s/it] 65%|██████▍ | 7969/12313 [5:57:47<3:19:55, 2.76s/it] {'loss': 0.4715, 'grad_norm': 5.489822227742221, 'learning_rate': 1.46211508570613e-06, 'epoch': 0.65} 65%|██████▍ | 7969/12313 [5:57:47<3:19:55, 2.76s/it] 65%|██████▍ | 7970/12313 [5:57:50<3:19:45, 2.76s/it] {'loss': 0.3925, 'grad_norm': 5.211788919279321, 'learning_rate': 1.4615168490805066e-06, 'epoch': 0.65} 65%|██████▍ | 7970/12313 [5:57:50<3:19:45, 2.76s/it] 65%|██████▍ | 7971/12313 [5:57:53<3:18:08, 2.74s/it] {'loss': 0.5827, 'grad_norm': 4.441097732454784, 'learning_rate': 1.4609186843124208e-06, 'epoch': 0.65} 65%|██████▍ | 7971/12313 [5:57:53<3:18:08, 2.74s/it] 65%|██████▍ | 
7972/12313 [5:57:56<3:19:45, 2.76s/it] {'loss': 0.5865, 'grad_norm': 4.918274629630136, 'learning_rate': 1.4603205914432638e-06, 'epoch': 0.65} 65%|██████▍ | 7972/12313 [5:57:56<3:19:45, 2.76s/it] 65%|██████▍ | 7973/12313 [5:57:58<3:18:28, 2.74s/it] {'loss': 0.427, 'grad_norm': 5.381182030506248, 'learning_rate': 1.4597225705144189e-06, 'epoch': 0.65} 65%|██████▍ | 7973/12313 [5:57:58<3:18:28, 2.74s/it] 65%|██████▍ | 7974/12313 [5:58:01<3:19:20, 2.76s/it] {'loss': 0.5468, 'grad_norm': 4.47373690879082, 'learning_rate': 1.459124621567266e-06, 'epoch': 0.65} 65%|██████▍ | 7974/12313 [5:58:01<3:19:20, 2.76s/it] 65%|██████▍ | 7975/12313 [5:58:04<3:15:36, 2.71s/it] {'loss': 0.4893, 'grad_norm': 4.853763535922953, 'learning_rate': 1.4585267446431817e-06, 'epoch': 0.65} 65%|██████▍ | 7975/12313 [5:58:04<3:15:36, 2.71s/it] 65%|██████▍ | 7976/12313 [5:58:06<3:13:24, 2.68s/it] {'loss': 0.6503, 'grad_norm': 12.67331094044437, 'learning_rate': 1.4579289397835344e-06, 'epoch': 0.65} 65%|██████▍ | 7976/12313 [5:58:06<3:13:24, 2.68s/it] 65%|██████▍ | 7977/12313 [5:58:10<3:23:30, 2.82s/it] {'loss': 0.48, 'grad_norm': 3.5376573305444725, 'learning_rate': 1.4573312070296885e-06, 'epoch': 0.65} 65%|██████▍ | 7977/12313 [5:58:10<3:23:30, 2.82s/it] 65%|██████▍ | 7978/12313 [5:58:12<3:16:08, 2.71s/it] {'loss': 0.4793, 'grad_norm': 6.986996329380094, 'learning_rate': 1.4567335464230062e-06, 'epoch': 0.65} 65%|██████▍ | 7978/12313 [5:58:12<3:16:08, 2.71s/it] 65%|██████▍ | 7979/12313 [5:58:15<3:18:29, 2.75s/it] {'loss': 0.567, 'grad_norm': 6.098834362052517, 'learning_rate': 1.4561359580048394e-06, 'epoch': 0.65} 65%|██████▍ | 7979/12313 [5:58:15<3:18:29, 2.75s/it] 65%|██████▍ | 7980/12313 [5:58:17<3:15:22, 2.71s/it] {'loss': 0.5839, 'grad_norm': 5.728827122709186, 'learning_rate': 1.4555384418165405e-06, 'epoch': 0.65} 65%|██████▍ | 7980/12313 [5:58:17<3:15:22, 2.71s/it] 65%|██████▍ | 7981/12313 [5:58:20<3:14:27, 2.69s/it] {'loss': 0.721, 'grad_norm': 6.108126965155767, 'learning_rate': 
1.4549409978994543e-06, 'epoch': 0.65} 65%|██████▍ | 7981/12313 [5:58:20<3:14:27, 2.69s/it] 65%|██████▍ | 7982/12313 [5:58:23<3:13:39, 2.68s/it] {'loss': 0.3989, 'grad_norm': 12.175353754173727, 'learning_rate': 1.45434362629492e-06, 'epoch': 0.65} 65%|██████▍ | 7982/12313 [5:58:23<3:13:39, 2.68s/it] 65%|██████▍ | 7983/12313 [5:58:25<3:11:24, 2.65s/it] {'loss': 0.5943, 'grad_norm': 5.726270390332232, 'learning_rate': 1.453746327044272e-06, 'epoch': 0.65} 65%|██████▍ | 7983/12313 [5:58:25<3:11:24, 2.65s/it] 65%|██████▍ | 7984/12313 [5:58:28<3:12:13, 2.66s/it] {'loss': 0.3702, 'grad_norm': 8.228427503586767, 'learning_rate': 1.4531491001888421e-06, 'epoch': 0.65} 65%|██████▍ | 7984/12313 [5:58:28<3:12:13, 2.66s/it] 65%|██████▍ | 7985/12313 [5:58:31<3:12:38, 2.67s/it] {'loss': 0.4701, 'grad_norm': 6.718803197230006, 'learning_rate': 1.4525519457699527e-06, 'epoch': 0.65} 65%|██████▍ | 7985/12313 [5:58:31<3:12:38, 2.67s/it] 65%|██████▍ | 7986/12313 [5:58:33<3:11:54, 2.66s/it] {'loss': 0.454, 'grad_norm': 6.288459331865035, 'learning_rate': 1.451954863828926e-06, 'epoch': 0.65} 65%|██████▍ | 7986/12313 [5:58:33<3:11:54, 2.66s/it] 65%|██████▍ | 7987/12313 [5:58:36<3:12:55, 2.68s/it] {'loss': 0.4614, 'grad_norm': 5.184071201153414, 'learning_rate': 1.4513578544070753e-06, 'epoch': 0.65} 65%|██████▍ | 7987/12313 [5:58:36<3:12:55, 2.68s/it] 65%|██████▍ | 7988/12313 [5:58:39<3:16:34, 2.73s/it] {'loss': 0.4276, 'grad_norm': 6.12187597162872, 'learning_rate': 1.4507609175457121e-06, 'epoch': 0.65} 65%|██████▍ | 7988/12313 [5:58:39<3:16:34, 2.73s/it] 65%|██████▍ | 7989/12313 [5:58:42<3:16:30, 2.73s/it] {'loss': 0.4831, 'grad_norm': 6.050006695871517, 'learning_rate': 1.4501640532861405e-06, 'epoch': 0.65} 65%|██████▍ | 7989/12313 [5:58:42<3:16:30, 2.73s/it] 65%|██████▍ | 7990/12313 [5:58:45<3:29:35, 2.91s/it] {'loss': 0.3898, 'grad_norm': 6.399026819639625, 'learning_rate': 1.4495672616696594e-06, 'epoch': 0.65} 65%|██████▍ | 7990/12313 [5:58:45<3:29:35, 2.91s/it] 65%|██████▍ | 
7991/12313 [5:58:48<3:24:33, 2.84s/it] {'loss': 0.5073, 'grad_norm': 5.876464671442483, 'learning_rate': 1.448970542737565e-06, 'epoch': 0.65} 65%|██████▍ | 7991/12313 [5:58:48<3:24:33, 2.84s/it] 65%|██████▍ | 7992/12313 [5:58:50<3:23:42, 2.83s/it] {'loss': 0.5262, 'grad_norm': 4.813231306921125, 'learning_rate': 1.4483738965311455e-06, 'epoch': 0.65} 65%|██████▍ | 7992/12313 [5:58:50<3:23:42, 2.83s/it] 65%|██████▍ | 7993/12313 [5:58:53<3:22:29, 2.81s/it] {'loss': 0.4813, 'grad_norm': 3.7301648640363716, 'learning_rate': 1.4477773230916872e-06, 'epoch': 0.65} 65%|██████▍ | 7993/12313 [5:58:53<3:22:29, 2.81s/it] 65%|██████▍ | 7994/12313 [5:58:56<3:17:04, 2.74s/it] {'loss': 0.442, 'grad_norm': 5.647909285913068, 'learning_rate': 1.44718082246047e-06, 'epoch': 0.65} 65%|██████▍ | 7994/12313 [5:58:56<3:17:04, 2.74s/it] 65%|██████▍ | 7995/12313 [5:58:58<3:09:14, 2.63s/it] {'loss': 0.3569, 'grad_norm': 5.414289909475496, 'learning_rate': 1.4465843946787683e-06, 'epoch': 0.65} 65%|██████▍ | 7995/12313 [5:58:58<3:09:14, 2.63s/it] 65%|██████▍ | 7996/12313 [5:59:01<3:08:46, 2.62s/it] {'loss': 0.485, 'grad_norm': 7.022409413689571, 'learning_rate': 1.44598803978785e-06, 'epoch': 0.65} 65%|██████▍ | 7996/12313 [5:59:01<3:08:46, 2.62s/it] 65%|██████▍ | 7997/12313 [5:59:03<3:09:50, 2.64s/it] {'loss': 0.6726, 'grad_norm': 5.325732469754769, 'learning_rate': 1.4453917578289823e-06, 'epoch': 0.65} 65%|██████▍ | 7997/12313 [5:59:03<3:09:50, 2.64s/it] 65%|██████▍ | 7998/12313 [5:59:06<3:09:22, 2.63s/it] {'loss': 0.5305, 'grad_norm': 6.073037858190773, 'learning_rate': 1.4447955488434223e-06, 'epoch': 0.65} 65%|██████▍ | 7998/12313 [5:59:06<3:09:22, 2.63s/it] 65%|██████▍ | 7999/12313 [5:59:09<3:04:16, 2.56s/it] {'loss': 0.4777, 'grad_norm': 7.0964517666533515, 'learning_rate': 1.4441994128724258e-06, 'epoch': 0.65} 65%|██████▍ | 7999/12313 [5:59:09<3:04:16, 2.56s/it] 65%|██████▍ | 8000/12313 [5:59:11<3:00:37, 2.51s/it] {'loss': 0.5041, 'grad_norm': 5.760466887618525, 'learning_rate': 
1.443603349957243e-06, 'epoch': 0.65} 65%|██████▍ | 8000/12313 [5:59:11<3:00:37, 2.51s/it] 65%|██████▍ | 8001/12313 [5:59:14<3:03:11, 2.55s/it] {'loss': 0.5213, 'grad_norm': 4.591683516314596, 'learning_rate': 1.4430073601391175e-06, 'epoch': 0.65} 65%|██████▍ | 8001/12313 [5:59:14<3:03:11, 2.55s/it] 65%|██████▍ | 8002/12313 [5:59:16<3:06:53, 2.60s/it] {'loss': 0.5668, 'grad_norm': 5.711693932834374, 'learning_rate': 1.442411443459289e-06, 'epoch': 0.65} 65%|██████▍ | 8002/12313 [5:59:16<3:06:53, 2.60s/it] 65%|██████▍ | 8003/12313 [5:59:19<3:16:04, 2.73s/it] {'loss': 0.6332, 'grad_norm': 5.284351713643368, 'learning_rate': 1.44181559995899e-06, 'epoch': 0.65} 65%|██████▍ | 8003/12313 [5:59:19<3:16:04, 2.73s/it] 65%|██████▌ | 8004/12313 [5:59:22<3:10:03, 2.65s/it] {'loss': 0.4598, 'grad_norm': 7.634523671563961, 'learning_rate': 1.4412198296794516e-06, 'epoch': 0.65} 65%|██████▌ | 8004/12313 [5:59:22<3:10:03, 2.65s/it] 65%|██████▌ | 8005/12313 [5:59:25<3:20:26, 2.79s/it] {'loss': 0.585, 'grad_norm': 4.421291562696544, 'learning_rate': 1.4406241326618981e-06, 'epoch': 0.65} 65%|██████▌ | 8005/12313 [5:59:25<3:20:26, 2.79s/it] 65%|██████▌ | 8006/12313 [5:59:28<3:18:51, 2.77s/it] {'loss': 0.7653, 'grad_norm': 4.721051742091286, 'learning_rate': 1.4400285089475468e-06, 'epoch': 0.65} 65%|██████▌ | 8006/12313 [5:59:28<3:18:51, 2.77s/it] 65%|██████▌ | 8007/12313 [5:59:30<3:17:19, 2.75s/it] {'loss': 0.3979, 'grad_norm': 5.351925711601577, 'learning_rate': 1.4394329585776143e-06, 'epoch': 0.65} 65%|██████▌ | 8007/12313 [5:59:30<3:17:19, 2.75s/it] 65%|██████▌ | 8008/12313 [5:59:33<3:13:43, 2.70s/it] {'loss': 0.4846, 'grad_norm': 4.447327868885781, 'learning_rate': 1.4388374815933078e-06, 'epoch': 0.65} 65%|██████▌ | 8008/12313 [5:59:33<3:13:43, 2.70s/it] 65%|██████▌ | 8009/12313 [5:59:35<3:09:58, 2.65s/it] {'loss': 0.4739, 'grad_norm': 4.016961136441612, 'learning_rate': 1.4382420780358306e-06, 'epoch': 0.65} 65%|██████▌ | 8009/12313 [5:59:35<3:09:58, 2.65s/it] 65%|██████▌ | 
8010/12313 [5:59:39<3:31:27, 2.95s/it] {'loss': 0.3597, 'grad_norm': 5.414485646946576, 'learning_rate': 1.4376467479463832e-06, 'epoch': 0.65} 65%|██████▌ | 8010/12313 [5:59:39<3:31:27, 2.95s/it] 65%|██████▌ | 8011/12313 [5:59:42<3:26:58, 2.89s/it] {'loss': 0.5058, 'grad_norm': 11.24686014696604, 'learning_rate': 1.4370514913661576e-06, 'epoch': 0.65} 65%|██████▌ | 8011/12313 [5:59:42<3:26:58, 2.89s/it] 65%|██████▌ | 8012/12313 [5:59:44<3:16:57, 2.75s/it] {'loss': 0.6707, 'grad_norm': 3.574284491275758, 'learning_rate': 1.436456308336343e-06, 'epoch': 0.65} 65%|██████▌ | 8012/12313 [5:59:44<3:16:57, 2.75s/it] 65%|██████▌ | 8013/12313 [5:59:47<3:13:54, 2.71s/it] {'loss': 0.4094, 'grad_norm': 4.818858068808228, 'learning_rate': 1.4358611988981242e-06, 'epoch': 0.65} 65%|██████▌ | 8013/12313 [5:59:47<3:13:54, 2.71s/it] 65%|██████▌ | 8014/12313 [5:59:50<3:14:15, 2.71s/it] {'loss': 0.5239, 'grad_norm': 4.129816073543707, 'learning_rate': 1.4352661630926783e-06, 'epoch': 0.65} 65%|██████▌ | 8014/12313 [5:59:50<3:14:15, 2.71s/it] 65%|██████▌ | 8015/12313 [5:59:52<3:12:47, 2.69s/it] {'loss': 0.4763, 'grad_norm': 7.2250869844226155, 'learning_rate': 1.4346712009611786e-06, 'epoch': 0.65} 65%|██████▌ | 8015/12313 [5:59:52<3:12:47, 2.69s/it] 65%|██████▌ | 8016/12313 [5:59:55<3:08:08, 2.63s/it] {'loss': 0.4563, 'grad_norm': 6.519568020534992, 'learning_rate': 1.434076312544794e-06, 'epoch': 0.65} 65%|██████▌ | 8016/12313 [5:59:55<3:08:08, 2.63s/it] 65%|██████▌ | 8017/12313 [5:59:57<3:08:03, 2.63s/it] {'loss': 0.4024, 'grad_norm': 5.982613565742739, 'learning_rate': 1.4334814978846863e-06, 'epoch': 0.65} 65%|██████▌ | 8017/12313 [5:59:57<3:08:03, 2.63s/it] 65%|██████▌ | 8018/12313 [6:00:00<3:08:19, 2.63s/it] {'loss': 0.4997, 'grad_norm': 6.189908706083811, 'learning_rate': 1.4328867570220148e-06, 'epoch': 0.65} 65%|██████▌ | 8018/12313 [6:00:00<3:08:19, 2.63s/it] 65%|██████▌ | 8019/12313 [6:00:03<3:11:13, 2.67s/it] {'loss': 0.5354, 'grad_norm': 6.213197790148053, 
'learning_rate': 1.4322920899979327e-06, 'epoch': 0.65} 65%|██████▌ | 8019/12313 [6:00:03<3:11:13, 2.67s/it] 65%|██████▌ | 8020/12313 [6:00:05<3:11:44, 2.68s/it] {'loss': 0.3249, 'grad_norm': 4.184480691904144, 'learning_rate': 1.4316974968535873e-06, 'epoch': 0.65} 65%|██████▌ | 8020/12313 [6:00:05<3:11:44, 2.68s/it] 65%|██████▌ | 8021/12313 [6:00:08<3:09:40, 2.65s/it] {'loss': 0.4938, 'grad_norm': 4.810772139069584, 'learning_rate': 1.4311029776301216e-06, 'epoch': 0.65} 65%|██████▌ | 8021/12313 [6:00:08<3:09:40, 2.65s/it] 65%|██████▌ | 8022/12313 [6:00:11<3:14:20, 2.72s/it] {'loss': 0.4544, 'grad_norm': 6.385985541045838, 'learning_rate': 1.4305085323686714e-06, 'epoch': 0.65} 65%|██████▌ | 8022/12313 [6:00:11<3:14:20, 2.72s/it] 65%|██████▌ | 8023/12313 [6:00:14<3:13:42, 2.71s/it] {'loss': 0.5221, 'grad_norm': 4.1470201913398155, 'learning_rate': 1.4299141611103717e-06, 'epoch': 0.65} 65%|██████▌ | 8023/12313 [6:00:14<3:13:42, 2.71s/it] 65%|██████▌ | 8024/12313 [6:00:16<3:11:47, 2.68s/it] {'loss': 0.5142, 'grad_norm': 4.404511419426839, 'learning_rate': 1.4293198638963476e-06, 'epoch': 0.65} 65%|██████▌ | 8024/12313 [6:00:16<3:11:47, 2.68s/it] 65%|██████▌ | 8025/12313 [6:00:19<3:12:26, 2.69s/it] {'loss': 0.6298, 'grad_norm': 4.771427690672761, 'learning_rate': 1.4287256407677225e-06, 'epoch': 0.65} 65%|██████▌ | 8025/12313 [6:00:19<3:12:26, 2.69s/it] 65%|██████▌ | 8026/12313 [6:00:22<3:18:56, 2.78s/it] {'loss': 0.4597, 'grad_norm': 5.110969640062156, 'learning_rate': 1.4281314917656144e-06, 'epoch': 0.65} 65%|██████▌ | 8026/12313 [6:00:22<3:18:56, 2.78s/it] 65%|██████▌ | 8027/12313 [6:00:25<3:15:51, 2.74s/it] {'loss': 0.5317, 'grad_norm': 4.07743528288326, 'learning_rate': 1.4275374169311345e-06, 'epoch': 0.65} 65%|██████▌ | 8027/12313 [6:00:25<3:15:51, 2.74s/it] 65%|██████▌ | 8028/12313 [6:00:27<3:17:17, 2.76s/it] {'loss': 0.6096, 'grad_norm': 3.381461916822027, 'learning_rate': 1.426943416305388e-06, 'epoch': 0.65} 65%|██████▌ | 8028/12313 [6:00:27<3:17:17, 
2.76s/it] 65%|██████▌ | 8029/12313 [6:00:30<3:21:34, 2.82s/it] {'loss': 0.3527, 'grad_norm': 6.120291184514056, 'learning_rate': 1.4263494899294794e-06, 'epoch': 0.65} 65%|██████▌ | 8029/12313 [6:00:30<3:21:34, 2.82s/it] 65%|██████▌ | 8030/12313 [6:00:33<3:16:42, 2.76s/it] {'loss': 0.479, 'grad_norm': 6.654443944669531, 'learning_rate': 1.4257556378445025e-06, 'epoch': 0.65} 65%|██████▌ | 8030/12313 [6:00:33<3:16:42, 2.76s/it] 65%|██████▌ | 8031/12313 [6:00:35<3:11:43, 2.69s/it] {'loss': 0.4986, 'grad_norm': 5.964757640753549, 'learning_rate': 1.4251618600915503e-06, 'epoch': 0.65} 65%|██████▌ | 8031/12313 [6:00:35<3:11:43, 2.69s/it] 65%|██████▌ | 8032/12313 [6:00:38<3:08:32, 2.64s/it] {'loss': 0.4979, 'grad_norm': 4.555104801980313, 'learning_rate': 1.4245681567117097e-06, 'epoch': 0.65} 65%|██████▌ | 8032/12313 [6:00:38<3:08:32, 2.64s/it] 65%|██████▌ | 8033/12313 [6:00:41<3:06:15, 2.61s/it] {'loss': 0.4747, 'grad_norm': 4.9235533317425, 'learning_rate': 1.4239745277460614e-06, 'epoch': 0.65} 65%|██████▌ | 8033/12313 [6:00:41<3:06:15, 2.61s/it] 65%|██████▌ | 8034/12313 [6:00:43<3:04:21, 2.59s/it] {'loss': 0.5299, 'grad_norm': 6.579534318663932, 'learning_rate': 1.4233809732356798e-06, 'epoch': 0.65} 65%|██████▌ | 8034/12313 [6:00:43<3:04:21, 2.59s/it] 65%|██████▌ | 8035/12313 [6:00:46<3:05:07, 2.60s/it] {'loss': 0.3675, 'grad_norm': 5.083100688498042, 'learning_rate': 1.4227874932216378e-06, 'epoch': 0.65} 65%|██████▌ | 8035/12313 [6:00:46<3:05:07, 2.60s/it] 65%|██████▌ | 8036/12313 [6:00:48<3:04:39, 2.59s/it] {'loss': 0.5651, 'grad_norm': 23.029068662665217, 'learning_rate': 1.4221940877450006e-06, 'epoch': 0.65} 65%|██████▌ | 8036/12313 [6:00:48<3:04:39, 2.59s/it] 65%|██████▌ | 8037/12313 [6:00:51<3:07:29, 2.63s/it] {'loss': 0.52, 'grad_norm': 5.851390054577819, 'learning_rate': 1.4216007568468272e-06, 'epoch': 0.65} 65%|██████▌ | 8037/12313 [6:00:51<3:07:29, 2.63s/it] 65%|██████▌ | 8038/12313 [6:00:54<3:20:47, 2.82s/it] {'loss': 0.5162, 'grad_norm': 
3.2181868045877704, 'learning_rate': 1.4210075005681737e-06, 'epoch': 0.65} 65%|██████▌ | 8038/12313 [6:00:54<3:20:47, 2.82s/it] 65%|██████▌ | 8039/12313 [6:00:57<3:12:49, 2.71s/it] {'loss': 0.4435, 'grad_norm': 7.483188427836994, 'learning_rate': 1.420414318950092e-06, 'epoch': 0.65} 65%|██████▌ | 8039/12313 [6:00:57<3:12:49, 2.71s/it] 65%|██████▌ | 8040/12313 [6:00:59<3:11:47, 2.69s/it] {'loss': 0.441, 'grad_norm': 4.4784108288995315, 'learning_rate': 1.4198212120336255e-06, 'epoch': 0.65} 65%|██████▌ | 8040/12313 [6:00:59<3:11:47, 2.69s/it] 65%|██████▌ | 8041/12313 [6:01:02<3:12:44, 2.71s/it] {'loss': 0.492, 'grad_norm': 6.391569173443781, 'learning_rate': 1.4192281798598133e-06, 'epoch': 0.65} 65%|██████▌ | 8041/12313 [6:01:02<3:12:44, 2.71s/it] 65%|██████▌ | 8042/12313 [6:01:05<3:14:10, 2.73s/it] {'loss': 0.5992, 'grad_norm': 5.75063446601935, 'learning_rate': 1.4186352224696926e-06, 'epoch': 0.65} 65%|██████▌ | 8042/12313 [6:01:05<3:14:10, 2.73s/it] 65%|██████▌ | 8043/12313 [6:01:07<3:10:40, 2.68s/it] {'loss': 0.4642, 'grad_norm': 6.211326632664685, 'learning_rate': 1.4180423399042902e-06, 'epoch': 0.65} 65%|██████▌ | 8043/12313 [6:01:07<3:10:40, 2.68s/it] 65%|██████▌ | 8044/12313 [6:01:10<3:17:24, 2.77s/it] {'loss': 0.5696, 'grad_norm': 11.997397806821628, 'learning_rate': 1.4174495322046316e-06, 'epoch': 0.65} 65%|██████▌ | 8044/12313 [6:01:10<3:17:24, 2.77s/it] 65%|██████▌ | 8045/12313 [6:01:13<3:13:37, 2.72s/it] {'loss': 0.4744, 'grad_norm': 7.193039419958394, 'learning_rate': 1.4168567994117375e-06, 'epoch': 0.65} 65%|██████▌ | 8045/12313 [6:01:13<3:13:37, 2.72s/it] 65%|██████▌ | 8046/12313 [6:01:16<3:09:25, 2.66s/it] {'loss': 0.3842, 'grad_norm': 4.494283159785974, 'learning_rate': 1.41626414156662e-06, 'epoch': 0.65} 65%|██████▌ | 8046/12313 [6:01:16<3:09:25, 2.66s/it] 65%|██████▌ | 8047/12313 [6:01:18<3:11:06, 2.69s/it] {'loss': 0.4835, 'grad_norm': 5.857275012139722, 'learning_rate': 1.4156715587102875e-06, 'epoch': 0.65} 65%|██████▌ | 8047/12313 
[6:01:18<3:11:06, 2.69s/it] 65%|██████▌ | 8048/12313 [6:01:21<3:17:30, 2.78s/it] {'loss': 0.4392, 'grad_norm': 6.4200372823639675, 'learning_rate': 1.4150790508837453e-06, 'epoch': 0.65} 65%|██████▌ | 8048/12313 [6:01:21<3:17:30, 2.78s/it] 65%|██████▌ | 8049/12313 [6:01:24<3:13:36, 2.72s/it] {'loss': 0.429, 'grad_norm': 3.22921759731994, 'learning_rate': 1.4144866181279908e-06, 'epoch': 0.65} 65%|██████▌ | 8049/12313 [6:01:24<3:13:36, 2.72s/it] 65%|██████▌ | 8050/12313 [6:01:26<3:11:20, 2.69s/it] {'loss': 0.4329, 'grad_norm': 4.691847016227414, 'learning_rate': 1.4138942604840167e-06, 'epoch': 0.65} 65%|██████▌ | 8050/12313 [6:01:26<3:11:20, 2.69s/it] 65%|██████▌ | 8051/12313 [6:01:29<3:12:27, 2.71s/it] {'loss': 0.4094, 'grad_norm': 5.045037115274969, 'learning_rate': 1.4133019779928115e-06, 'epoch': 0.65} 65%|██████▌ | 8051/12313 [6:01:29<3:12:27, 2.71s/it] 65%|██████▌ | 8052/12313 [6:01:32<3:10:04, 2.68s/it] {'loss': 0.6154, 'grad_norm': 3.3537113111805317, 'learning_rate': 1.4127097706953591e-06, 'epoch': 0.65} 65%|██████▌ | 8052/12313 [6:01:32<3:10:04, 2.68s/it] 65%|██████▌ | 8053/12313 [6:01:35<3:11:12, 2.69s/it] {'loss': 0.3856, 'grad_norm': 6.660882673507178, 'learning_rate': 1.4121176386326352e-06, 'epoch': 0.65} 65%|██████▌ | 8053/12313 [6:01:35<3:11:12, 2.69s/it] 65%|██████▌ | 8054/12313 [6:01:37<3:10:15, 2.68s/it] {'loss': 0.5311, 'grad_norm': 4.011343252532216, 'learning_rate': 1.4115255818456138e-06, 'epoch': 0.65} 65%|██████▌ | 8054/12313 [6:01:37<3:10:15, 2.68s/it] 65%|██████▌ | 8055/12313 [6:01:40<3:04:39, 2.60s/it] {'loss': 0.4813, 'grad_norm': 3.3532310150002513, 'learning_rate': 1.4109336003752619e-06, 'epoch': 0.65} 65%|██████▌ | 8055/12313 [6:01:40<3:04:39, 2.60s/it] 65%|██████▌ | 8056/12313 [6:01:42<3:04:52, 2.61s/it] {'loss': 0.4928, 'grad_norm': 4.86729352164219, 'learning_rate': 1.4103416942625397e-06, 'epoch': 0.65} 65%|██████▌ | 8056/12313 [6:01:42<3:04:52, 2.61s/it] 65%|██████▌ | 8057/12313 [6:01:45<3:05:58, 2.62s/it] {'loss': 0.447, 
'grad_norm': 3.96263482227693, 'learning_rate': 1.4097498635484057e-06, 'epoch': 0.65} 65%|██████▌ | 8057/12313 [6:01:45<3:05:58, 2.62s/it] 65%|██████▌ | 8058/12313 [6:01:48<3:06:14, 2.63s/it] {'loss': 0.3607, 'grad_norm': 5.204864204379556, 'learning_rate': 1.4091581082738122e-06, 'epoch': 0.65} 65%|██████▌ | 8058/12313 [6:01:48<3:06:14, 2.63s/it] 65%|██████▌ | 8059/12313 [6:01:50<3:13:21, 2.73s/it] {'loss': 0.448, 'grad_norm': 4.292415805341833, 'learning_rate': 1.4085664284797041e-06, 'epoch': 0.65} 65%|██████▌ | 8059/12313 [6:01:50<3:13:21, 2.73s/it] 65%|██████▌ | 8060/12313 [6:01:53<3:15:57, 2.76s/it] {'loss': 0.6328, 'grad_norm': 7.880826154296216, 'learning_rate': 1.407974824207022e-06, 'epoch': 0.65} 65%|██████▌ | 8060/12313 [6:01:53<3:15:57, 2.76s/it] 65%|██████▌ | 8061/12313 [6:01:56<3:15:34, 2.76s/it] {'loss': 0.4029, 'grad_norm': 5.310250315901305, 'learning_rate': 1.4073832954967032e-06, 'epoch': 0.65} 65%|██████▌ | 8061/12313 [6:01:56<3:15:34, 2.76s/it] 65%|██████▌ | 8062/12313 [6:01:59<3:13:38, 2.73s/it] {'loss': 0.5177, 'grad_norm': 6.728553995913191, 'learning_rate': 1.406791842389677e-06, 'epoch': 0.65} 65%|██████▌ | 8062/12313 [6:01:59<3:13:38, 2.73s/it] 65%|██████▌ | 8063/12313 [6:02:02<3:19:54, 2.82s/it] {'loss': 0.4751, 'grad_norm': 5.981311174156778, 'learning_rate': 1.4062004649268696e-06, 'epoch': 0.65} 65%|██████▌ | 8063/12313 [6:02:02<3:19:54, 2.82s/it] 65%|██████▌ | 8064/12313 [6:02:05<3:27:16, 2.93s/it] {'loss': 0.4305, 'grad_norm': 6.024059197477494, 'learning_rate': 1.405609163149202e-06, 'epoch': 0.65} 65%|██████▌ | 8064/12313 [6:02:05<3:27:16, 2.93s/it] 65%|██████▌ | 8065/12313 [6:02:08<3:23:08, 2.87s/it] {'loss': 0.4975, 'grad_norm': 7.798448146594784, 'learning_rate': 1.4050179370975886e-06, 'epoch': 0.65} 65%|██████▌ | 8065/12313 [6:02:08<3:23:08, 2.87s/it] 66%|██████▌ | 8066/12313 [6:02:10<3:15:20, 2.76s/it] {'loss': 0.4111, 'grad_norm': 4.4502104987754105, 'learning_rate': 1.4044267868129374e-06, 'epoch': 0.66} 66%|██████▌ | 
8066/12313 [6:02:10<3:15:20, 2.76s/it] 66%|██████▌ | 8067/12313 [6:02:13<3:15:57, 2.77s/it] {'loss': 0.5769, 'grad_norm': 3.807078310048636, 'learning_rate': 1.4038357123361556e-06, 'epoch': 0.66} 66%|██████▌ | 8067/12313 [6:02:13<3:15:57, 2.77s/it] 66%|██████▌ | 8068/12313 [6:02:16<3:14:02, 2.74s/it] {'loss': 0.5461, 'grad_norm': 4.154767587614258, 'learning_rate': 1.4032447137081414e-06, 'epoch': 0.66} 66%|██████▌ | 8068/12313 [6:02:16<3:14:02, 2.74s/it] 66%|██████▌ | 8069/12313 [6:02:18<3:09:03, 2.67s/it] {'loss': 0.5508, 'grad_norm': 5.8170559826468935, 'learning_rate': 1.4026537909697873e-06, 'epoch': 0.66} 66%|██████▌ | 8069/12313 [6:02:18<3:09:03, 2.67s/it] 66%|██████▌ | 8070/12313 [6:02:21<3:08:11, 2.66s/it] {'loss': 0.4174, 'grad_norm': 6.732343822655561, 'learning_rate': 1.4020629441619831e-06, 'epoch': 0.66} 66%|██████▌ | 8070/12313 [6:02:21<3:08:11, 2.66s/it] 66%|██████▌ | 8071/12313 [6:02:24<3:10:20, 2.69s/it] {'loss': 0.3694, 'grad_norm': 15.814471928424329, 'learning_rate': 1.4014721733256137e-06, 'epoch': 0.66} 66%|██████▌ | 8071/12313 [6:02:24<3:10:20, 2.69s/it] 66%|██████▌ | 8072/12313 [6:02:26<3:11:23, 2.71s/it] {'loss': 0.5148, 'grad_norm': 5.728680709044823, 'learning_rate': 1.4008814785015548e-06, 'epoch': 0.66} 66%|██████▌ | 8072/12313 [6:02:26<3:11:23, 2.71s/it] 66%|██████▌ | 8073/12313 [6:02:29<3:10:21, 2.69s/it] {'loss': 0.3364, 'grad_norm': 5.735426011218967, 'learning_rate': 1.4002908597306817e-06, 'epoch': 0.66} 66%|██████▌ | 8073/12313 [6:02:29<3:10:21, 2.69s/it] 66%|██████▌ | 8074/12313 [6:02:32<3:13:13, 2.74s/it] {'loss': 0.4943, 'grad_norm': 3.8279642749863916, 'learning_rate': 1.3997003170538608e-06, 'epoch': 0.66} 66%|██████▌ | 8074/12313 [6:02:32<3:13:13, 2.74s/it] 66%|██████▌ | 8075/12313 [6:02:34<3:10:55, 2.70s/it] {'loss': 0.6124, 'grad_norm': 8.191610658404883, 'learning_rate': 1.3991098505119537e-06, 'epoch': 0.66} 66%|██████▌ | 8075/12313 [6:02:34<3:10:55, 2.70s/it] 66%|██████▌ | 8076/12313 [6:02:37<3:12:56, 2.73s/it] 
{'loss': 0.6547, 'grad_norm': 22.562198626943964, 'learning_rate': 1.3985194601458192e-06, 'epoch': 0.66} 66%|██████▌ | 8076/12313 [6:02:37<3:12:56, 2.73s/it] 66%|██████▌ | 8077/12313 [6:02:40<3:08:27, 2.67s/it] {'loss': 0.2968, 'grad_norm': 5.841079008969222, 'learning_rate': 1.3979291459963087e-06, 'epoch': 0.66} 66%|██████▌ | 8077/12313 [6:02:40<3:08:27, 2.67s/it] 66%|██████▌ | 8078/12313 [6:02:42<3:04:42, 2.62s/it] {'loss': 0.43, 'grad_norm': 3.714067743777654, 'learning_rate': 1.397338908104269e-06, 'epoch': 0.66} 66%|██████▌ | 8078/12313 [6:02:42<3:04:42, 2.62s/it] 66%|██████▌ | 8079/12313 [6:02:45<3:11:00, 2.71s/it] {'loss': 0.643, 'grad_norm': 4.366280468082913, 'learning_rate': 1.3967487465105401e-06, 'epoch': 0.66} 66%|██████▌ | 8079/12313 [6:02:45<3:11:00, 2.71s/it] 66%|██████▌ | 8080/12313 [6:02:48<3:13:25, 2.74s/it] {'loss': 0.6495, 'grad_norm': 4.043858317470031, 'learning_rate': 1.3961586612559602e-06, 'epoch': 0.66} 66%|██████▌ | 8080/12313 [6:02:48<3:13:25, 2.74s/it] 66%|██████▌ | 8081/12313 [6:02:51<3:12:25, 2.73s/it] {'loss': 0.3702, 'grad_norm': 4.59900089958901, 'learning_rate': 1.3955686523813588e-06, 'epoch': 0.66} 66%|██████▌ | 8081/12313 [6:02:51<3:12:25, 2.73s/it] 66%|██████▌ | 8082/12313 [6:02:54<3:15:01, 2.77s/it] {'loss': 0.5347, 'grad_norm': 6.449912994740261, 'learning_rate': 1.3949787199275606e-06, 'epoch': 0.66} 66%|██████▌ | 8082/12313 [6:02:54<3:15:01, 2.77s/it] 66%|██████▌ | 8083/12313 [6:02:56<3:11:31, 2.72s/it] {'loss': 0.4742, 'grad_norm': 5.373213623100858, 'learning_rate': 1.3943888639353866e-06, 'epoch': 0.66} 66%|██████▌ | 8083/12313 [6:02:56<3:11:31, 2.72s/it] 66%|██████▌ | 8084/12313 [6:02:59<3:09:26, 2.69s/it] {'loss': 0.4213, 'grad_norm': 5.296870374063085, 'learning_rate': 1.3937990844456528e-06, 'epoch': 0.66} 66%|██████▌ | 8084/12313 [6:02:59<3:09:26, 2.69s/it] 66%|██████▌ | 8085/12313 [6:03:01<3:09:49, 2.69s/it] {'loss': 0.3738, 'grad_norm': 3.8879241039555734, 'learning_rate': 1.393209381499167e-06, 'epoch': 0.66} 
66%|██████▌ | 8085/12313 [6:03:01<3:09:49, 2.69s/it] 66%|██████▌ | 8086/12313 [6:03:04<3:05:02, 2.63s/it] {'loss': 0.4136, 'grad_norm': 4.289208191857055, 'learning_rate': 1.3926197551367355e-06, 'epoch': 0.66} 66%|██████▌ | 8086/12313 [6:03:04<3:05:02, 2.63s/it] 66%|██████▌ | 8087/12313 [6:03:07<3:08:15, 2.67s/it] {'loss': 0.3542, 'grad_norm': 5.234771066049035, 'learning_rate': 1.3920302053991564e-06, 'epoch': 0.66} 66%|██████▌ | 8087/12313 [6:03:07<3:08:15, 2.67s/it] 66%|██████▌ | 8088/12313 [6:03:09<3:09:10, 2.69s/it] {'loss': 0.4513, 'grad_norm': 5.571643681394787, 'learning_rate': 1.3914407323272216e-06, 'epoch': 0.66} 66%|██████▌ | 8088/12313 [6:03:09<3:09:10, 2.69s/it] 66%|██████▌ | 8089/12313 [6:03:12<3:05:19, 2.63s/it] {'loss': 0.5061, 'grad_norm': 7.213207108596739, 'learning_rate': 1.3908513359617217e-06, 'epoch': 0.66} 66%|██████▌ | 8089/12313 [6:03:12<3:05:19, 2.63s/it] 66%|██████▌ | 8090/12313 [6:03:15<3:07:20, 2.66s/it] {'loss': 0.3634, 'grad_norm': 4.780947638090955, 'learning_rate': 1.39026201634344e-06, 'epoch': 0.66} 66%|██████▌ | 8090/12313 [6:03:15<3:07:20, 2.66s/it] 66%|██████▌ | 8091/12313 [6:03:18<3:14:24, 2.76s/it] {'loss': 0.6613, 'grad_norm': 3.0166870867114053, 'learning_rate': 1.3896727735131538e-06, 'epoch': 0.66} 66%|██████▌ | 8091/12313 [6:03:18<3:14:24, 2.76s/it] 66%|██████▌ | 8092/12313 [6:03:21<3:18:09, 2.82s/it] {'loss': 0.4642, 'grad_norm': 3.8519737800090974, 'learning_rate': 1.3890836075116343e-06, 'epoch': 0.66} 66%|██████▌ | 8092/12313 [6:03:21<3:18:09, 2.82s/it] 66%|██████▌ | 8093/12313 [6:03:23<3:13:07, 2.75s/it] {'loss': 0.5912, 'grad_norm': 4.334939110181824, 'learning_rate': 1.3884945183796505e-06, 'epoch': 0.66} 66%|██████▌ | 8093/12313 [6:03:23<3:13:07, 2.75s/it] 66%|██████▌ | 8094/12313 [6:03:26<3:07:17, 2.66s/it] {'loss': 0.7303, 'grad_norm': 6.294374675194089, 'learning_rate': 1.3879055061579635e-06, 'epoch': 0.66} 66%|██████▌ | 8094/12313 [6:03:26<3:07:17, 2.66s/it] 66%|██████▌ | 8095/12313 [6:03:28<3:01:50, 
2.59s/it] {'loss': 0.5085, 'grad_norm': 3.7598306069040546, 'learning_rate': 1.3873165708873286e-06, 'epoch': 0.66} 66%|██████▌ | 8095/12313 [6:03:28<3:01:50, 2.59s/it] 66%|██████▌ | 8096/12313 [6:03:31<3:03:50, 2.62s/it] {'loss': 0.5744, 'grad_norm': 6.863418465636099, 'learning_rate': 1.3867277126084989e-06, 'epoch': 0.66} 66%|██████▌ | 8096/12313 [6:03:31<3:03:50, 2.62s/it] 66%|██████▌ | 8097/12313 [6:03:34<3:09:28, 2.70s/it] {'loss': 0.5576, 'grad_norm': 7.054233776542322, 'learning_rate': 1.3861389313622197e-06, 'epoch': 0.66} 66%|██████▌ | 8097/12313 [6:03:34<3:09:28, 2.70s/it] 66%|██████▌ | 8098/12313 [6:03:36<3:07:40, 2.67s/it] {'loss': 0.423, 'grad_norm': 4.4365820824808555, 'learning_rate': 1.3855502271892313e-06, 'epoch': 0.66} 66%|██████▌ | 8098/12313 [6:03:36<3:07:40, 2.67s/it] 66%|██████▌ | 8099/12313 [6:03:39<3:08:49, 2.69s/it] {'loss': 0.4621, 'grad_norm': 6.862928291636716, 'learning_rate': 1.3849616001302696e-06, 'epoch': 0.66} 66%|██████▌ | 8099/12313 [6:03:39<3:08:49, 2.69s/it] 66%|██████▌ | 8100/12313 [6:03:41<3:03:28, 2.61s/it] {'loss': 0.4926, 'grad_norm': 5.667668412588925, 'learning_rate': 1.3843730502260639e-06, 'epoch': 0.66} 66%|██████▌ | 8100/12313 [6:03:41<3:03:28, 2.61s/it] 66%|██████▌ | 8101/12313 [6:03:44<3:04:36, 2.63s/it] {'loss': 0.6139, 'grad_norm': 4.227345644846646, 'learning_rate': 1.3837845775173375e-06, 'epoch': 0.66} 66%|██████▌ | 8101/12313 [6:03:44<3:04:36, 2.63s/it] 66%|██████▌ | 8102/12313 [6:03:47<3:03:56, 2.62s/it] {'loss': 0.5144, 'grad_norm': 4.744805542732661, 'learning_rate': 1.383196182044811e-06, 'epoch': 0.66} 66%|██████▌ | 8102/12313 [6:03:47<3:03:56, 2.62s/it] 66%|██████▌ | 8103/12313 [6:03:49<3:05:41, 2.65s/it] {'loss': 0.561, 'grad_norm': 3.999861099199686, 'learning_rate': 1.3826078638491994e-06, 'epoch': 0.66} 66%|██████▌ | 8103/12313 [6:03:49<3:05:41, 2.65s/it] 66%|██████▌ | 8104/12313 [6:03:52<3:05:41, 2.65s/it] {'loss': 0.4381, 'grad_norm': 4.070333113027189, 'learning_rate': 1.3820196229712085e-06, 
'epoch': 0.66} 66%|██████▌ | 8104/12313 [6:03:52<3:05:41, 2.65s/it] 66%|██████▌ | 8105/12313 [6:03:55<3:04:14, 2.63s/it] {'loss': 0.6255, 'grad_norm': 6.884502103701524, 'learning_rate': 1.3814314594515443e-06, 'epoch': 0.66} 66%|██████▌ | 8105/12313 [6:03:55<3:04:14, 2.63s/it] 66%|██████▌ | 8106/12313 [6:03:57<3:04:20, 2.63s/it] {'loss': 0.548, 'grad_norm': 4.918735322058878, 'learning_rate': 1.3808433733309028e-06, 'epoch': 0.66} 66%|██████▌ | 8106/12313 [6:03:57<3:04:20, 2.63s/it] 66%|██████▌ | 8107/12313 [6:04:00<3:08:20, 2.69s/it] {'loss': 0.3939, 'grad_norm': 16.27622530232076, 'learning_rate': 1.380255364649976e-06, 'epoch': 0.66} 66%|██████▌ | 8107/12313 [6:04:00<3:08:20, 2.69s/it] 66%|██████▌ | 8108/12313 [6:04:03<3:07:30, 2.68s/it] {'loss': 0.581, 'grad_norm': 10.25631338164915, 'learning_rate': 1.3796674334494529e-06, 'epoch': 0.66} 66%|██████▌ | 8108/12313 [6:04:03<3:07:30, 2.68s/it] 66%|██████▌ | 8109/12313 [6:04:05<3:06:28, 2.66s/it] {'loss': 0.5879, 'grad_norm': 5.903422438123663, 'learning_rate': 1.3790795797700129e-06, 'epoch': 0.66} 66%|██████▌ | 8109/12313 [6:04:05<3:06:28, 2.66s/it] 66%|██████▌ | 8110/12313 [6:04:08<3:07:13, 2.67s/it] {'loss': 0.4939, 'grad_norm': 5.425671652540845, 'learning_rate': 1.3784918036523346e-06, 'epoch': 0.66} 66%|██████▌ | 8110/12313 [6:04:08<3:07:13, 2.67s/it] 66%|██████▌ | 8111/12313 [6:04:11<3:09:57, 2.71s/it] {'loss': 0.5689, 'grad_norm': 6.259094931538714, 'learning_rate': 1.377904105137087e-06, 'epoch': 0.66} 66%|██████▌ | 8111/12313 [6:04:11<3:09:57, 2.71s/it] 66%|██████▌ | 8112/12313 [6:04:14<3:09:03, 2.70s/it] {'loss': 0.4444, 'grad_norm': 5.861839266316476, 'learning_rate': 1.3773164842649377e-06, 'epoch': 0.66} 66%|██████▌ | 8112/12313 [6:04:14<3:09:03, 2.70s/it] 66%|██████▌ | 8113/12313 [6:04:16<3:10:04, 2.72s/it] {'loss': 0.3919, 'grad_norm': 6.629921332417931, 'learning_rate': 1.376728941076546e-06, 'epoch': 0.66} 66%|██████▌ | 8113/12313 [6:04:16<3:10:04, 2.72s/it] 66%|██████▌ | 8114/12313 
[6:04:19<3:09:34, 2.71s/it] {'loss': 0.5354, 'grad_norm': 6.496651649704183, 'learning_rate': 1.3761414756125658e-06, 'epoch': 0.66} 66%|██████▌ | 8114/12313 [6:04:19<3:09:34, 2.71s/it] 66%|██████▌ | 8115/12313 [6:04:22<3:06:33, 2.67s/it] {'loss': 0.5213, 'grad_norm': 4.3858502738087655, 'learning_rate': 1.3755540879136474e-06, 'epoch': 0.66} 66%|██████▌ | 8115/12313 [6:04:22<3:06:33, 2.67s/it] 66%|██████▌ | 8116/12313 [6:04:24<3:06:37, 2.67s/it] {'loss': 0.5628, 'grad_norm': 5.705183500815472, 'learning_rate': 1.3749667780204365e-06, 'epoch': 0.66} 66%|██████▌ | 8116/12313 [6:04:24<3:06:37, 2.67s/it] 66%|██████▌ | 8117/12313 [6:04:27<3:03:26, 2.62s/it] {'loss': 0.518, 'grad_norm': 3.5828316864822236, 'learning_rate': 1.3743795459735692e-06, 'epoch': 0.66} 66%|██████▌ | 8117/12313 [6:04:27<3:03:26, 2.62s/it] 66%|██████▌ | 8118/12313 [6:04:29<3:05:10, 2.65s/it] {'loss': 0.6037, 'grad_norm': 4.1843405148298025, 'learning_rate': 1.373792391813681e-06, 'epoch': 0.66} 66%|██████▌ | 8118/12313 [6:04:29<3:05:10, 2.65s/it] 66%|██████▌ | 8119/12313 [6:04:32<3:05:35, 2.66s/it] {'loss': 0.3632, 'grad_norm': 5.543134932829188, 'learning_rate': 1.3732053155813987e-06, 'epoch': 0.66} 66%|██████▌ | 8119/12313 [6:04:32<3:05:35, 2.66s/it] 66%|██████▌ | 8120/12313 [6:04:35<3:07:35, 2.68s/it] {'loss': 0.4956, 'grad_norm': 3.9215981353205938, 'learning_rate': 1.3726183173173441e-06, 'epoch': 0.66} 66%|██████▌ | 8120/12313 [6:04:35<3:07:35, 2.68s/it] 66%|██████▌ | 8121/12313 [6:04:37<3:03:36, 2.63s/it] {'loss': 0.479, 'grad_norm': 6.56518519854498, 'learning_rate': 1.3720313970621369e-06, 'epoch': 0.66} 66%|██████▌ | 8121/12313 [6:04:37<3:03:36, 2.63s/it] 66%|██████▌ | 8122/12313 [6:04:40<3:09:11, 2.71s/it] {'loss': 0.5228, 'grad_norm': 8.721756176365712, 'learning_rate': 1.3714445548563856e-06, 'epoch': 0.66} 66%|██████▌ | 8122/12313 [6:04:40<3:09:11, 2.71s/it] 66%|██████▌ | 8123/12313 [6:04:43<3:11:08, 2.74s/it] {'loss': 0.5035, 'grad_norm': 6.414976174126078, 'learning_rate': 
1.3708577907406988e-06, 'epoch': 0.66} 66%|██████▌ | 8123/12313 [6:04:43<3:11:08, 2.74s/it] 66%|██████▌ | 8124/12313 [6:04:46<3:11:56, 2.75s/it] {'loss': 0.4776, 'grad_norm': 5.440110547817854, 'learning_rate': 1.3702711047556777e-06, 'epoch': 0.66} 66%|██████▌ | 8124/12313 [6:04:46<3:11:56, 2.75s/it] 66%|██████▌ | 8125/12313 [6:04:49<3:10:15, 2.73s/it] {'loss': 0.3457, 'grad_norm': 5.614130528625265, 'learning_rate': 1.3696844969419174e-06, 'epoch': 0.66} 66%|██████▌ | 8125/12313 [6:04:49<3:10:15, 2.73s/it] 66%|██████▌ | 8126/12313 [6:04:51<3:12:35, 2.76s/it] {'loss': 0.445, 'grad_norm': 4.022888633337103, 'learning_rate': 1.3690979673400067e-06, 'epoch': 0.66} 66%|██████▌ | 8126/12313 [6:04:51<3:12:35, 2.76s/it] 66%|██████▌ | 8127/12313 [6:04:54<3:11:20, 2.74s/it] {'loss': 0.3523, 'grad_norm': 3.741552498785602, 'learning_rate': 1.3685115159905325e-06, 'epoch': 0.66} 66%|██████▌ | 8127/12313 [6:04:54<3:11:20, 2.74s/it] 66%|██████▌ | 8128/12313 [6:04:57<3:17:01, 2.82s/it] {'loss': 0.738, 'grad_norm': 5.950815202545946, 'learning_rate': 1.3679251429340717e-06, 'epoch': 0.66} 66%|██████▌ | 8128/12313 [6:04:57<3:17:01, 2.82s/it] 66%|██████▌ | 8129/12313 [6:05:00<3:09:57, 2.72s/it] {'loss': 0.4257, 'grad_norm': 9.51105805660359, 'learning_rate': 1.367338848211201e-06, 'epoch': 0.66} 66%|██████▌ | 8129/12313 [6:05:00<3:09:57, 2.72s/it] 66%|██████▌ | 8130/12313 [6:05:02<3:06:58, 2.68s/it] {'loss': 0.4203, 'grad_norm': 2.9131542456623314, 'learning_rate': 1.3667526318624862e-06, 'epoch': 0.66} 66%|██████▌ | 8130/12313 [6:05:02<3:06:58, 2.68s/it] 66%|██████▌ | 8131/12313 [6:05:05<3:12:45, 2.77s/it] {'loss': 0.6132, 'grad_norm': 6.754300006832807, 'learning_rate': 1.366166493928493e-06, 'epoch': 0.66} 66%|██████▌ | 8131/12313 [6:05:05<3:12:45, 2.77s/it] 66%|██████▌ | 8132/12313 [6:05:08<3:09:19, 2.72s/it] {'loss': 0.4595, 'grad_norm': 11.724662577057464, 'learning_rate': 1.3655804344497775e-06, 'epoch': 0.66} 66%|██████▌ | 8132/12313 [6:05:08<3:09:19, 2.72s/it] 66%|██████▌ 
| 8133/12313 [6:05:10<3:05:10, 2.66s/it] {'loss': 0.4081, 'grad_norm': 5.387659360154549, 'learning_rate': 1.364994453466891e-06, 'epoch': 0.66} 66%|██████▌ | 8133/12313 [6:05:10<3:05:10, 2.66s/it] 66%|██████▌ | 8134/12313 [6:05:13<3:06:36, 2.68s/it] {'loss': 0.4898, 'grad_norm': 5.107977084416256, 'learning_rate': 1.3644085510203813e-06, 'epoch': 0.66} 66%|██████▌ | 8134/12313 [6:05:13<3:06:36, 2.68s/it] 66%|██████▌ | 8135/12313 [6:05:16<3:03:58, 2.64s/it] {'loss': 0.5417, 'grad_norm': 3.8833833190727183, 'learning_rate': 1.363822727150791e-06, 'epoch': 0.66} 66%|██████▌ | 8135/12313 [6:05:16<3:03:58, 2.64s/it] 66%|██████▌ | 8136/12313 [6:05:18<3:03:22, 2.63s/it] {'loss': 0.4968, 'grad_norm': 7.8141538363601795, 'learning_rate': 1.363236981898654e-06, 'epoch': 0.66} 66%|██████▌ | 8136/12313 [6:05:18<3:03:22, 2.63s/it] 66%|██████▌ | 8137/12313 [6:05:21<3:00:19, 2.59s/it] {'loss': 0.5024, 'grad_norm': 3.590897276111095, 'learning_rate': 1.3626513153045024e-06, 'epoch': 0.66} 66%|██████▌ | 8137/12313 [6:05:21<3:00:19, 2.59s/it] 66%|██████▌ | 8138/12313 [6:05:23<3:01:41, 2.61s/it] {'loss': 0.4954, 'grad_norm': 3.612475255145642, 'learning_rate': 1.3620657274088606e-06, 'epoch': 0.66} 66%|██████▌ | 8138/12313 [6:05:23<3:01:41, 2.61s/it] 66%|██████▌ | 8139/12313 [6:05:26<3:03:41, 2.64s/it] {'loss': 0.668, 'grad_norm': 14.567400918143866, 'learning_rate': 1.3614802182522469e-06, 'epoch': 0.66} 66%|██████▌ | 8139/12313 [6:05:26<3:03:41, 2.64s/it] 66%|██████▌ | 8140/12313 [6:05:29<3:05:27, 2.67s/it] {'loss': 0.4346, 'grad_norm': 11.017893005217958, 'learning_rate': 1.3608947878751777e-06, 'epoch': 0.66} 66%|██████▌ | 8140/12313 [6:05:29<3:05:27, 2.67s/it] 66%|██████▌ | 8141/12313 [6:05:31<3:04:18, 2.65s/it] {'loss': 0.5231, 'grad_norm': 3.7950075970163617, 'learning_rate': 1.3603094363181596e-06, 'epoch': 0.66} 66%|██████▌ | 8141/12313 [6:05:31<3:04:18, 2.65s/it] 66%|██████▌ | 8142/12313 [6:05:34<3:09:19, 2.72s/it] {'loss': 0.6696, 'grad_norm': 7.66686320728164, 
'learning_rate': 1.3597241636216965e-06, 'epoch': 0.66} 66%|██████▌ | 8142/12313 [6:05:34<3:09:19, 2.72s/it] 66%|██████▌ | 8143/12313 [6:05:37<3:13:51, 2.79s/it] {'loss': 0.5376, 'grad_norm': 5.608044387753557, 'learning_rate': 1.3591389698262875e-06, 'epoch': 0.66} 66%|██████▌ | 8143/12313 [6:05:37<3:13:51, 2.79s/it] 66%|██████▌ | 8144/12313 [6:05:40<3:16:44, 2.83s/it] {'loss': 0.526, 'grad_norm': 4.03444037697512, 'learning_rate': 1.3585538549724242e-06, 'epoch': 0.66} 66%|██████▌ | 8144/12313 [6:05:40<3:16:44, 2.83s/it] 66%|██████▌ | 8145/12313 [6:05:42<3:07:25, 2.70s/it] {'loss': 0.5056, 'grad_norm': 5.082198135850972, 'learning_rate': 1.3579688191005926e-06, 'epoch': 0.66} 66%|██████▌ | 8145/12313 [6:05:43<3:07:25, 2.70s/it] 66%|██████▌ | 8146/12313 [6:05:45<3:07:17, 2.70s/it] {'loss': 0.6148, 'grad_norm': 3.3444467710171346, 'learning_rate': 1.3573838622512743e-06, 'epoch': 0.66} 66%|██████▌ | 8146/12313 [6:05:45<3:07:17, 2.70s/it] 66%|██████▌ | 8147/12313 [6:05:48<3:05:11, 2.67s/it] {'loss': 0.5947, 'grad_norm': 4.212117525236555, 'learning_rate': 1.3567989844649448e-06, 'epoch': 0.66} 66%|██████▌ | 8147/12313 [6:05:48<3:05:11, 2.67s/it] 66%|██████▌ | 8148/12313 [6:05:50<3:05:27, 2.67s/it] {'loss': 0.5863, 'grad_norm': 5.234823635180376, 'learning_rate': 1.3562141857820765e-06, 'epoch': 0.66} 66%|██████▌ | 8148/12313 [6:05:50<3:05:27, 2.67s/it] 66%|██████▌ | 8149/12313 [6:05:53<3:05:56, 2.68s/it] {'loss': 0.4493, 'grad_norm': 9.716406215450048, 'learning_rate': 1.3556294662431325e-06, 'epoch': 0.66} 66%|██████▌ | 8149/12313 [6:05:53<3:05:56, 2.68s/it] 66%|██████▌ | 8150/12313 [6:05:56<3:12:47, 2.78s/it] {'loss': 0.4364, 'grad_norm': 5.722283978029896, 'learning_rate': 1.3550448258885734e-06, 'epoch': 0.66} 66%|██████▌ | 8150/12313 [6:05:56<3:12:47, 2.78s/it] 66%|██████▌ | 8151/12313 [6:05:59<3:10:32, 2.75s/it] {'loss': 0.742, 'grad_norm': 4.763814608798664, 'learning_rate': 1.3544602647588528e-06, 'epoch': 0.66} 66%|██████▌ | 8151/12313 [6:05:59<3:10:32, 
2.75s/it] 66%|██████▌ | 8152/12313 [6:06:02<3:09:12, 2.73s/it] {'loss': 0.4423, 'grad_norm': 5.80232988441981, 'learning_rate': 1.3538757828944188e-06, 'epoch': 0.66} 66%|██████▌ | 8152/12313 [6:06:02<3:09:12, 2.73s/it] 66%|██████▌ | 8153/12313 [6:06:04<3:11:39, 2.76s/it] {'loss': 0.5588, 'grad_norm': 7.5957193433728145, 'learning_rate': 1.353291380335715e-06, 'epoch': 0.66} 66%|██████▌ | 8153/12313 [6:06:04<3:11:39, 2.76s/it] 66%|██████▌ | 8154/12313 [6:06:07<3:05:24, 2.67s/it] {'loss': 0.5969, 'grad_norm': 4.336543165787254, 'learning_rate': 1.3527070571231786e-06, 'epoch': 0.66} 66%|██████▌ | 8154/12313 [6:06:07<3:05:24, 2.67s/it] 66%|██████▌ | 8155/12313 [6:06:09<3:04:36, 2.66s/it] {'loss': 0.4803, 'grad_norm': 4.034182275364077, 'learning_rate': 1.3521228132972414e-06, 'epoch': 0.66} 66%|██████▌ | 8155/12313 [6:06:09<3:04:36, 2.66s/it] 66%|██████▌ | 8156/12313 [6:06:12<3:02:05, 2.63s/it] {'loss': 0.4521, 'grad_norm': 6.6167843827609865, 'learning_rate': 1.3515386488983317e-06, 'epoch': 0.66} 66%|██████▌ | 8156/12313 [6:06:12<3:02:05, 2.63s/it] 66%|██████▌ | 8157/12313 [6:06:14<2:56:48, 2.55s/it] {'loss': 0.3106, 'grad_norm': 7.161780111206926, 'learning_rate': 1.3509545639668691e-06, 'epoch': 0.66} 66%|██████▌ | 8157/12313 [6:06:14<2:56:48, 2.55s/it] 66%|██████▋ | 8158/12313 [6:06:17<2:58:04, 2.57s/it] {'loss': 0.4119, 'grad_norm': 5.4096275250827945, 'learning_rate': 1.3503705585432687e-06, 'epoch': 0.66} 66%|██████▋ | 8158/12313 [6:06:17<2:58:04, 2.57s/it] 66%|██████▋ | 8159/12313 [6:06:19<2:55:40, 2.54s/it] {'loss': 0.6203, 'grad_norm': 7.137614931386881, 'learning_rate': 1.349786632667942e-06, 'epoch': 0.66} 66%|██████▋ | 8159/12313 [6:06:19<2:55:40, 2.54s/it] 66%|██████▋ | 8160/12313 [6:06:22<3:00:50, 2.61s/it] {'loss': 0.749, 'grad_norm': 4.038451931034832, 'learning_rate': 1.3492027863812924e-06, 'epoch': 0.66} 66%|██████▋ | 8160/12313 [6:06:22<3:00:50, 2.61s/it] 66%|██████▋ | 8161/12313 [6:06:25<3:03:13, 2.65s/it] {'loss': 0.5766, 'grad_norm': 
4.101256157779943, 'learning_rate': 1.3486190197237189e-06, 'epoch': 0.66} 66%|██████▋ | 8161/12313 [6:06:25<3:03:13, 2.65s/it] 66%|██████▋ | 8162/12313 [6:06:28<3:07:45, 2.71s/it] {'loss': 0.3621, 'grad_norm': 3.827717281308639, 'learning_rate': 1.348035332735617e-06, 'epoch': 0.66} 66%|██████▋ | 8162/12313 [6:06:28<3:07:45, 2.71s/it] 66%|██████▋ | 8163/12313 [6:06:31<3:11:24, 2.77s/it] {'loss': 0.6257, 'grad_norm': 3.2797754851269856, 'learning_rate': 1.3474517254573731e-06, 'epoch': 0.66} 66%|██████▋ | 8163/12313 [6:06:31<3:11:24, 2.77s/it] 66%|██████▋ | 8164/12313 [6:06:33<3:08:25, 2.72s/it] {'loss': 0.413, 'grad_norm': 7.6698914171017245, 'learning_rate': 1.3468681979293702e-06, 'epoch': 0.66} 66%|██████▋ | 8164/12313 [6:06:33<3:08:25, 2.72s/it] 66%|██████▋ | 8165/12313 [6:06:36<3:04:37, 2.67s/it] {'loss': 0.3934, 'grad_norm': 6.602624984901768, 'learning_rate': 1.3462847501919843e-06, 'epoch': 0.66} 66%|██████▋ | 8165/12313 [6:06:36<3:04:37, 2.67s/it] 66%|██████▋ | 8166/12313 [6:06:39<3:04:06, 2.66s/it] {'loss': 0.7646, 'grad_norm': 7.1560351882255215, 'learning_rate': 1.3457013822855886e-06, 'epoch': 0.66} 66%|██████▋ | 8166/12313 [6:06:39<3:04:06, 2.66s/it] 66%|██████▋ | 8167/12313 [6:06:41<3:04:54, 2.68s/it] {'loss': 0.408, 'grad_norm': 6.274238416078604, 'learning_rate': 1.345118094250547e-06, 'epoch': 0.66} 66%|██████▋ | 8167/12313 [6:06:41<3:04:54, 2.68s/it] 66%|██████▋ | 8168/12313 [6:06:44<3:05:24, 2.68s/it] {'loss': 0.4495, 'grad_norm': 4.400441017779448, 'learning_rate': 1.3445348861272217e-06, 'epoch': 0.66} 66%|██████▋ | 8168/12313 [6:06:44<3:05:24, 2.68s/it] 66%|██████▋ | 8169/12313 [6:06:47<3:06:41, 2.70s/it] {'loss': 0.3428, 'grad_norm': 4.941239230227576, 'learning_rate': 1.3439517579559675e-06, 'epoch': 0.66} 66%|██████▋ | 8169/12313 [6:06:47<3:06:41, 2.70s/it] 66%|██████▋ | 8170/12313 [6:06:49<3:04:56, 2.68s/it] {'loss': 0.4209, 'grad_norm': 6.964343826754793, 'learning_rate': 1.3433687097771337e-06, 'epoch': 0.66} 66%|██████▋ | 8170/12313 
[6:06:49<3:04:56, 2.68s/it] 66%|██████▋ | 8171/12313 [6:06:52<2:59:33, 2.60s/it] {'loss': 0.5785, 'grad_norm': 5.512709601692857, 'learning_rate': 1.3427857416310626e-06, 'epoch': 0.66} 66%|██████▋ | 8171/12313 [6:06:52<2:59:33, 2.60s/it] 66%|██████▋ | 8172/12313 [6:06:54<2:58:38, 2.59s/it] {'loss': 0.4546, 'grad_norm': 4.294567753627923, 'learning_rate': 1.3422028535580947e-06, 'epoch': 0.66} 66%|██████▋ | 8172/12313 [6:06:54<2:58:38, 2.59s/it] 66%|██████▋ | 8173/12313 [6:06:57<2:58:31, 2.59s/it] {'loss': 0.8187, 'grad_norm': 4.0701024523531135, 'learning_rate': 1.3416200455985607e-06, 'epoch': 0.66} 66%|██████▋ | 8173/12313 [6:06:57<2:58:31, 2.59s/it] 66%|██████▋ | 8174/12313 [6:06:59<2:56:53, 2.56s/it] {'loss': 0.4059, 'grad_norm': 7.655445454850961, 'learning_rate': 1.3410373177927893e-06, 'epoch': 0.66} 66%|██████▋ | 8174/12313 [6:06:59<2:56:53, 2.56s/it] 66%|██████▋ | 8175/12313 [6:07:02<2:58:18, 2.59s/it] {'loss': 0.5913, 'grad_norm': 2.7798324043491838, 'learning_rate': 1.3404546701811022e-06, 'epoch': 0.66} 66%|██████▋ | 8175/12313 [6:07:02<2:58:18, 2.59s/it] 66%|██████▋ | 8176/12313 [6:07:05<2:58:06, 2.58s/it] {'loss': 0.5026, 'grad_norm': 3.646058875181944, 'learning_rate': 1.3398721028038155e-06, 'epoch': 0.66} 66%|██████▋ | 8176/12313 [6:07:05<2:58:06, 2.58s/it] 66%|██████▋ | 8177/12313 [6:07:07<2:58:08, 2.58s/it] {'loss': 0.6419, 'grad_norm': 4.496311517930436, 'learning_rate': 1.3392896157012386e-06, 'epoch': 0.66} 66%|██████▋ | 8177/12313 [6:07:07<2:58:08, 2.58s/it] 66%|██████▋ | 8178/12313 [6:07:10<3:05:10, 2.69s/it] {'loss': 0.5958, 'grad_norm': 17.84153072314891, 'learning_rate': 1.3387072089136776e-06, 'epoch': 0.66} 66%|██████▋ | 8178/12313 [6:07:10<3:05:10, 2.69s/it] 66%|██████▋ | 8179/12313 [6:07:13<2:59:26, 2.60s/it] {'loss': 0.5721, 'grad_norm': 3.8720341467833834, 'learning_rate': 1.3381248824814326e-06, 'epoch': 0.66} 66%|██████▋ | 8179/12313 [6:07:13<2:59:26, 2.60s/it] 66%|██████▋ | 8180/12313 [6:07:15<2:59:42, 2.61s/it] {'loss': 0.44, 
'grad_norm': 4.7847332539123935, 'learning_rate': 1.337542636444795e-06, 'epoch': 0.66} 66%|██████▋ | 8180/12313 [6:07:15<2:59:42, 2.61s/it] 66%|██████▋ | 8181/12313 [6:07:18<2:58:25, 2.59s/it] {'loss': 0.3977, 'grad_norm': 6.021015620083501, 'learning_rate': 1.3369604708440548e-06, 'epoch': 0.66} 66%|██████▋ | 8181/12313 [6:07:18<2:58:25, 2.59s/it] 66%|██████▋ | 8182/12313 [6:07:20<2:58:59, 2.60s/it] {'loss': 0.469, 'grad_norm': 11.600335989741517, 'learning_rate': 1.3363783857194957e-06, 'epoch': 0.66} 66%|██████▋ | 8182/12313 [6:07:20<2:58:59, 2.60s/it] 66%|██████▋ | 8183/12313 [6:07:23<3:05:09, 2.69s/it] {'loss': 0.4263, 'grad_norm': 3.859966985311204, 'learning_rate': 1.3357963811113938e-06, 'epoch': 0.66} 66%|██████▋ | 8183/12313 [6:07:23<3:05:09, 2.69s/it] 66%|██████▋ | 8184/12313 [6:07:26<3:07:50, 2.73s/it] {'loss': 0.6374, 'grad_norm': 18.782716428504518, 'learning_rate': 1.3352144570600203e-06, 'epoch': 0.66} 66%|██████▋ | 8184/12313 [6:07:26<3:07:50, 2.73s/it] 66%|██████▋ | 8185/12313 [6:07:29<3:04:58, 2.69s/it] {'loss': 0.4308, 'grad_norm': 3.387091226384912, 'learning_rate': 1.3346326136056425e-06, 'epoch': 0.66} 66%|██████▋ | 8185/12313 [6:07:29<3:04:58, 2.69s/it] 66%|██████▋ | 8186/12313 [6:07:31<3:01:00, 2.63s/it] {'loss': 0.5107, 'grad_norm': 4.252596910764206, 'learning_rate': 1.3340508507885194e-06, 'epoch': 0.66} 66%|██████▋ | 8186/12313 [6:07:31<3:01:00, 2.63s/it] 66%|██████▋ | 8187/12313 [6:07:34<2:58:20, 2.59s/it] {'loss': 0.4946, 'grad_norm': 7.6038955526056995, 'learning_rate': 1.3334691686489064e-06, 'epoch': 0.66} 66%|██████▋ | 8187/12313 [6:07:34<2:58:20, 2.59s/it] 66%|██████▋ | 8188/12313 [6:07:36<3:02:51, 2.66s/it] {'loss': 0.4381, 'grad_norm': 4.976169303513984, 'learning_rate': 1.3328875672270547e-06, 'epoch': 0.66} 66%|██████▋ | 8188/12313 [6:07:36<3:02:51, 2.66s/it] 67%|██████▋ | 8189/12313 [6:07:39<3:03:14, 2.67s/it] {'loss': 0.6222, 'grad_norm': 4.667554378601018, 'learning_rate': 1.332306046563206e-06, 'epoch': 0.67} 67%|██████▋ 
| 8189/12313 [6:07:39<3:03:14, 2.67s/it] 67%|██████▋ | 8190/12313 [6:07:42<2:58:11, 2.59s/it] {'loss': 0.6358, 'grad_norm': 4.379049488125857, 'learning_rate': 1.3317246066975981e-06, 'epoch': 0.67} 67%|██████▋ | 8190/12313 [6:07:42<2:58:11, 2.59s/it] 67%|██████▋ | 8191/12313 [6:07:45<3:08:34, 2.74s/it] {'loss': 0.4243, 'grad_norm': 3.794618143261633, 'learning_rate': 1.3311432476704655e-06, 'epoch': 0.67} 67%|██████▋ | 8191/12313 [6:07:45<3:08:34, 2.74s/it] 67%|██████▋ | 8192/12313 [6:07:47<3:04:59, 2.69s/it] {'loss': 0.5623, 'grad_norm': 5.072184639847683, 'learning_rate': 1.3305619695220332e-06, 'epoch': 0.67} 67%|██████▋ | 8192/12313 [6:07:47<3:04:59, 2.69s/it] 67%|██████▋ | 8193/12313 [6:07:50<3:06:56, 2.72s/it] {'loss': 0.4993, 'grad_norm': 7.59051423468418, 'learning_rate': 1.3299807722925231e-06, 'epoch': 0.67} 67%|██████▋ | 8193/12313 [6:07:50<3:06:56, 2.72s/it] 67%|██████▋ | 8194/12313 [6:07:53<3:07:25, 2.73s/it] {'loss': 0.4178, 'grad_norm': 8.819021574798183, 'learning_rate': 1.3293996560221526e-06, 'epoch': 0.67} 67%|██████▋ | 8194/12313 [6:07:53<3:07:25, 2.73s/it] 67%|██████▋ | 8195/12313 [6:07:55<3:06:30, 2.72s/it] {'loss': 0.3488, 'grad_norm': 6.287522877063349, 'learning_rate': 1.3288186207511303e-06, 'epoch': 0.67} 67%|██████▋ | 8195/12313 [6:07:55<3:06:30, 2.72s/it] 67%|██████▋ | 8196/12313 [6:07:58<3:09:27, 2.76s/it] {'loss': 0.4812, 'grad_norm': 5.563934142204648, 'learning_rate': 1.3282376665196603e-06, 'epoch': 0.67} 67%|██████▋ | 8196/12313 [6:07:58<3:09:27, 2.76s/it] 67%|██████▋ | 8197/12313 [6:08:01<3:06:40, 2.72s/it] {'loss': 0.4407, 'grad_norm': 3.033804351653404, 'learning_rate': 1.327656793367943e-06, 'epoch': 0.67} 67%|██████▋ | 8197/12313 [6:08:01<3:06:40, 2.72s/it] 67%|██████▋ | 8198/12313 [6:08:03<3:01:20, 2.64s/it] {'loss': 0.4281, 'grad_norm': 3.998217455418668, 'learning_rate': 1.3270760013361713e-06, 'epoch': 0.67} 67%|██████▋ | 8198/12313 [6:08:03<3:01:20, 2.64s/it] 67%|██████▋ | 8199/12313 [6:08:06<3:00:30, 2.63s/it] {'loss': 
0.6847, 'grad_norm': 4.64706121281191, 'learning_rate': 1.3264952904645317e-06, 'epoch': 0.67} 67%|██████▋ | 8199/12313 [6:08:06<3:00:30, 2.63s/it] 67%|██████▋ | 8200/12313 [6:08:09<3:03:13, 2.67s/it] {'loss': 0.4931, 'grad_norm': 6.436231964294674, 'learning_rate': 1.325914660793207e-06, 'epoch': 0.67} 67%|██████▋ | 8200/12313 [6:08:09<3:03:13, 2.67s/it] 67%|██████▋ | 8201/12313 [6:08:12<3:04:01, 2.69s/it] {'loss': 0.5129, 'grad_norm': 3.707584274007003, 'learning_rate': 1.3253341123623756e-06, 'epoch': 0.67} 67%|██████▋ | 8201/12313 [6:08:12<3:04:01, 2.69s/it] 67%|██████▋ | 8202/12313 [6:08:14<3:05:47, 2.71s/it] {'loss': 0.5491, 'grad_norm': 6.262622978288038, 'learning_rate': 1.3247536452122064e-06, 'epoch': 0.67} 67%|██████▋ | 8202/12313 [6:08:14<3:05:47, 2.71s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. 
warnings.warn( 67%|██████▋ | 8203/12313 [6:08:53<15:28:12, 13.55s/it] {'loss': 0.4305, 'grad_norm': 4.982037820278029, 'learning_rate': 1.3241732593828644e-06, 'epoch': 0.67} 67%|██████▋ | 8203/12313 [6:08:53<15:28:12, 13.55s/it] 67%|██████▋ | 8204/12313 [6:08:56<11:45:02, 10.30s/it] {'loss': 0.4896, 'grad_norm': 6.381634581129901, 'learning_rate': 1.3235929549145105e-06, 'epoch': 0.67} 67%|██████▋ | 8204/12313 [6:08:56<11:45:02, 10.30s/it] 67%|██████▋ | 8205/12313 [6:08:58<9:02:44, 7.93s/it] {'loss': 0.5373, 'grad_norm': 7.969785119936784, 'learning_rate': 1.3230127318472972e-06, 'epoch': 0.67} 67%|██████▋ | 8205/12313 [6:08:58<9:02:44, 7.93s/it] 67%|██████▋ | 8206/12313 [6:09:01<7:18:22, 6.40s/it] {'loss': 0.476, 'grad_norm': 6.0137267480338945, 'learning_rate': 1.3224325902213736e-06, 'epoch': 0.67} 67%|██████▋ | 8206/12313 [6:09:01<7:18:22, 6.40s/it] 67%|██████▋ | 8207/12313 [6:09:04<6:00:45, 5.27s/it] {'loss': 0.4782, 'grad_norm': 6.68986461696678, 'learning_rate': 1.3218525300768837e-06, 'epoch': 0.67} 67%|██████▋ | 8207/12313 [6:09:04<6:00:45, 5.27s/it] 67%|██████▋ | 8208/12313 [6:09:06<5:07:36, 4.50s/it] {'loss': 0.4462, 'grad_norm': 6.316034448019237, 'learning_rate': 1.3212725514539635e-06, 'epoch': 0.67} 67%|██████▋ | 8208/12313 [6:09:06<5:07:36, 4.50s/it] 67%|██████▋ | 8209/12313 [6:09:09<4:31:05, 3.96s/it] {'loss': 0.3812, 'grad_norm': 5.228047836489257, 'learning_rate': 1.3206926543927435e-06, 'epoch': 0.67} 67%|██████▋ | 8209/12313 [6:09:09<4:31:05, 3.96s/it] 67%|██████▋ | 8210/12313 [6:09:12<4:04:06, 3.57s/it] {'loss': 0.469, 'grad_norm': 6.369850283074102, 'learning_rate': 1.320112838933351e-06, 'epoch': 0.67} 67%|██████▋ | 8210/12313 [6:09:12<4:04:06, 3.57s/it] 67%|██████▋ | 8211/12313 [6:09:14<3:44:57, 3.29s/it] {'loss': 0.4444, 'grad_norm': 4.859894104230745, 'learning_rate': 1.3195331051159058e-06, 'epoch': 0.67} 67%|██████▋ | 8211/12313 [6:09:14<3:44:57, 3.29s/it] 67%|██████▋ | 8212/12313 [6:09:17<3:33:05, 3.12s/it] {'loss': 0.4554, 
'grad_norm': 4.0624444842962735, 'learning_rate': 1.3189534529805212e-06, 'epoch': 0.67} 67%|██████▋ | 8212/12313 [6:09:17<3:33:05, 3.12s/it] 67%|██████▋ | 8213/12313 [6:09:20<3:29:51, 3.07s/it] {'loss': 0.5201, 'grad_norm': 3.4817004946398287, 'learning_rate': 1.318373882567307e-06, 'epoch': 0.67} 67%|██████▋ | 8213/12313 [6:09:20<3:29:51, 3.07s/it] 67%|██████▋ | 8214/12313 [6:09:23<3:30:39, 3.08s/it] {'loss': 0.4444, 'grad_norm': 4.704982903669908, 'learning_rate': 1.3177943939163677e-06, 'epoch': 0.67} 67%|██████▋ | 8214/12313 [6:09:23<3:30:39, 3.08s/it] 67%|██████▋ | 8215/12313 [6:09:26<3:22:41, 2.97s/it] {'loss': 0.4768, 'grad_norm': 5.794219397597491, 'learning_rate': 1.3172149870677985e-06, 'epoch': 0.67} 67%|██████▋ | 8215/12313 [6:09:26<3:22:41, 2.97s/it] 67%|██████▋ | 8216/12313 [6:09:28<3:12:49, 2.82s/it] {'loss': 0.5479, 'grad_norm': 6.208905832820019, 'learning_rate': 1.3166356620616932e-06, 'epoch': 0.67} 67%|██████▋ | 8216/12313 [6:09:28<3:12:49, 2.82s/it] 67%|██████▋ | 8217/12313 [6:09:31<3:09:13, 2.77s/it] {'loss': 0.4532, 'grad_norm': 8.198766362247154, 'learning_rate': 1.3160564189381376e-06, 'epoch': 0.67} 67%|██████▋ | 8217/12313 [6:09:31<3:09:13, 2.77s/it] 67%|██████▋ | 8218/12313 [6:09:34<3:10:17, 2.79s/it] {'loss': 0.4955, 'grad_norm': 4.331763449279288, 'learning_rate': 1.3154772577372104e-06, 'epoch': 0.67} 67%|██████▋ | 8218/12313 [6:09:34<3:10:17, 2.79s/it] 67%|██████▋ | 8219/12313 [6:09:37<3:11:57, 2.81s/it] {'loss': 0.4523, 'grad_norm': 5.10837125089982, 'learning_rate': 1.3148981784989884e-06, 'epoch': 0.67} 67%|██████▋ | 8219/12313 [6:09:37<3:11:57, 2.81s/it] 67%|██████▋ | 8220/12313 [6:09:39<3:10:05, 2.79s/it] {'loss': 0.6125, 'grad_norm': 3.5163841336361807, 'learning_rate': 1.3143191812635408e-06, 'epoch': 0.67} 67%|██████▋ | 8220/12313 [6:09:39<3:10:05, 2.79s/it] 67%|██████▋ | 8221/12313 [6:09:42<3:05:24, 2.72s/it] {'loss': 0.4425, 'grad_norm': 11.844210364989475, 'learning_rate': 1.3137402660709314e-06, 'epoch': 0.67} 
67%|██████▋ | 8221/12313 [6:09:42<3:05:24, 2.72s/it] 67%|██████▋ | 8222/12313 [6:09:45<3:01:57, 2.67s/it] {'loss': 0.6123, 'grad_norm': 7.996601522252169, 'learning_rate': 1.3131614329612158e-06, 'epoch': 0.67} 67%|██████▋ | 8222/12313 [6:09:45<3:01:57, 2.67s/it] 67%|██████▋ | 8223/12313 [6:09:47<2:57:14, 2.60s/it] {'loss': 0.5059, 'grad_norm': 6.114487257718311, 'learning_rate': 1.3125826819744493e-06, 'epoch': 0.67} 67%|██████▋ | 8223/12313 [6:09:47<2:57:14, 2.60s/it] 67%|██████▋ | 8224/12313 [6:09:51<3:22:04, 2.97s/it] {'loss': 0.4303, 'grad_norm': 4.43142805188028, 'learning_rate': 1.3120040131506767e-06, 'epoch': 0.67} 67%|██████▋ | 8224/12313 [6:09:51<3:22:04, 2.97s/it] 67%|██████▋ | 8225/12313 [6:09:54<3:22:36, 2.97s/it] {'loss': 0.4586, 'grad_norm': 5.687962653675712, 'learning_rate': 1.3114254265299379e-06, 'epoch': 0.67} 67%|██████▋ | 8225/12313 [6:09:54<3:22:36, 2.97s/it] 67%|██████▋ | 8226/12313 [6:09:56<3:11:25, 2.81s/it] {'loss': 0.5445, 'grad_norm': 4.1918238610968475, 'learning_rate': 1.310846922152269e-06, 'epoch': 0.67} 67%|██████▋ | 8226/12313 [6:09:56<3:11:25, 2.81s/it] 67%|██████▋ | 8227/12313 [6:10:00<3:20:39, 2.95s/it] {'loss': 0.5172, 'grad_norm': 5.179655298264121, 'learning_rate': 1.310268500057701e-06, 'epoch': 0.67} 67%|██████▋ | 8227/12313 [6:10:00<3:20:39, 2.95s/it] 67%|██████▋ | 8228/12313 [6:10:02<3:13:18, 2.84s/it] {'loss': 0.5864, 'grad_norm': 7.779574674479231, 'learning_rate': 1.309690160286255e-06, 'epoch': 0.67} 67%|██████▋ | 8228/12313 [6:10:02<3:13:18, 2.84s/it] 67%|██████▋ | 8229/12313 [6:10:05<3:04:34, 2.71s/it] {'loss': 0.548, 'grad_norm': 6.122052156829049, 'learning_rate': 1.3091119028779514e-06, 'epoch': 0.67} 67%|██████▋ | 8229/12313 [6:10:05<3:04:34, 2.71s/it] 67%|██████▋ | 8230/12313 [6:10:07<3:04:08, 2.71s/it] {'loss': 0.4313, 'grad_norm': 4.789126863334509, 'learning_rate': 1.308533727872801e-06, 'epoch': 0.67} 67%|██████▋ | 8230/12313 [6:10:07<3:04:08, 2.71s/it] 67%|██████▋ | 8231/12313 [6:10:10<3:06:58, 2.75s/it] 
{'loss': 0.532, 'grad_norm': 5.766521551851652, 'learning_rate': 1.3079556353108106e-06, 'epoch': 0.67} 67%|██████▋ | 8231/12313 [6:10:10<3:06:58, 2.75s/it] 67%|██████▋ | 8232/12313 [6:10:13<3:02:16, 2.68s/it] {'loss': 0.443, 'grad_norm': 6.05863385781591, 'learning_rate': 1.307377625231981e-06, 'epoch': 0.67} 67%|██████▋ | 8232/12313 [6:10:13<3:02:16, 2.68s/it] 67%|██████▋ | 8233/12313 [6:10:15<3:05:05, 2.72s/it] {'loss': 0.5801, 'grad_norm': 4.181674552131915, 'learning_rate': 1.3067996976763086e-06, 'epoch': 0.67} 67%|██████▋ | 8233/12313 [6:10:15<3:05:05, 2.72s/it] 67%|██████▋ | 8234/12313 [6:10:18<3:03:35, 2.70s/it] {'loss': 0.5704, 'grad_norm': 3.206551391730835, 'learning_rate': 1.3062218526837828e-06, 'epoch': 0.67} 67%|██████▋ | 8234/12313 [6:10:18<3:03:35, 2.70s/it] 67%|██████▋ | 8235/12313 [6:10:21<2:59:10, 2.64s/it] {'loss': 0.4123, 'grad_norm': 10.62932913375947, 'learning_rate': 1.3056440902943856e-06, 'epoch': 0.67} 67%|██████▋ | 8235/12313 [6:10:21<2:59:10, 2.64s/it] 67%|██████▋ | 8236/12313 [6:10:23<2:58:43, 2.63s/it] {'loss': 0.5647, 'grad_norm': 4.544980879348178, 'learning_rate': 1.305066410548097e-06, 'epoch': 0.67} 67%|██████▋ | 8236/12313 [6:10:23<2:58:43, 2.63s/it] 67%|██████▋ | 8237/12313 [6:10:26<2:59:02, 2.64s/it] {'loss': 0.5001, 'grad_norm': 5.011004929662773, 'learning_rate': 1.304488813484889e-06, 'epoch': 0.67} 67%|██████▋ | 8237/12313 [6:10:26<2:59:02, 2.64s/it] 67%|██████▋ | 8238/12313 [6:10:28<2:54:23, 2.57s/it] {'loss': 0.4763, 'grad_norm': 5.610021787019349, 'learning_rate': 1.303911299144727e-06, 'epoch': 0.67} 67%|██████▋ | 8238/12313 [6:10:28<2:54:23, 2.57s/it] 67%|██████▋ | 8239/12313 [6:10:31<2:57:40, 2.62s/it] {'loss': 0.4336, 'grad_norm': 87.36388987621034, 'learning_rate': 1.3033338675675726e-06, 'epoch': 0.67} 67%|██████▋ | 8239/12313 [6:10:31<2:57:40, 2.62s/it] 67%|██████▋ | 8240/12313 [6:10:34<3:02:27, 2.69s/it] {'loss': 0.4406, 'grad_norm': 3.9673278123195566, 'learning_rate': 1.3027565187933828e-06, 'epoch': 0.67} 
67%|██████▋ | 8240/12313 [6:10:34<3:02:27, 2.69s/it] 67%|██████▋ | 8241/12313 [6:10:36<2:58:45, 2.63s/it] {'loss': 0.4505, 'grad_norm': 4.0150982480618165, 'learning_rate': 1.3021792528621041e-06, 'epoch': 0.67} 67%|██████▋ | 8241/12313 [6:10:36<2:58:45, 2.63s/it] 67%|██████▋ | 8242/12313 [6:10:39<3:00:23, 2.66s/it] {'loss': 0.4101, 'grad_norm': 3.8387858911160473, 'learning_rate': 1.3016020698136827e-06, 'epoch': 0.67} 67%|██████▋ | 8242/12313 [6:10:39<3:00:23, 2.66s/it] 67%|██████▋ | 8243/12313 [6:10:42<2:58:56, 2.64s/it] {'loss': 0.5057, 'grad_norm': 3.2719359158581245, 'learning_rate': 1.3010249696880558e-06, 'epoch': 0.67} 67%|██████▋ | 8243/12313 [6:10:42<2:58:56, 2.64s/it] 67%|██████▋ | 8244/12313 [6:10:44<3:00:04, 2.66s/it] {'loss': 0.3984, 'grad_norm': 8.689632620129073, 'learning_rate': 1.3004479525251545e-06, 'epoch': 0.67} 67%|██████▋ | 8244/12313 [6:10:44<3:00:04, 2.66s/it] 67%|██████▋ | 8245/12313 [6:10:47<3:01:31, 2.68s/it] {'loss': 0.48, 'grad_norm': 4.334649919352921, 'learning_rate': 1.2998710183649066e-06, 'epoch': 0.67} 67%|██████▋ | 8245/12313 [6:10:47<3:01:31, 2.68s/it] 67%|██████▋ | 8246/12313 [6:10:51<3:19:34, 2.94s/it] {'loss': 0.4935, 'grad_norm': 3.604389609859541, 'learning_rate': 1.2992941672472332e-06, 'epoch': 0.67} 67%|██████▋ | 8246/12313 [6:10:51<3:19:34, 2.94s/it] 67%|██████▋ | 8247/12313 [6:10:53<3:09:18, 2.79s/it] {'loss': 0.4603, 'grad_norm': 4.775702711045651, 'learning_rate': 1.2987173992120478e-06, 'epoch': 0.67} 67%|██████▋ | 8247/12313 [6:10:53<3:09:18, 2.79s/it] 67%|██████▋ | 8248/12313 [6:10:56<3:04:16, 2.72s/it] {'loss': 0.4972, 'grad_norm': 5.179644996464377, 'learning_rate': 1.2981407142992618e-06, 'epoch': 0.67} 67%|██████▋ | 8248/12313 [6:10:56<3:04:16, 2.72s/it] 67%|██████▋ | 8249/12313 [6:10:58<3:07:25, 2.77s/it] {'loss': 0.5587, 'grad_norm': 4.832968999586265, 'learning_rate': 1.2975641125487777e-06, 'epoch': 0.67} 67%|██████▋ | 8249/12313 [6:10:58<3:07:25, 2.77s/it] 67%|██████▋ | 8250/12313 [6:11:02<3:20:13, 
2.96s/it] {'loss': 0.335, 'grad_norm': 4.3612144381710065, 'learning_rate': 1.2969875940004923e-06, 'epoch': 0.67} 67%|██████▋ | 8250/12313 [6:11:02<3:20:13, 2.96s/it] 67%|██████▋ | 8251/12313 [6:11:04<3:12:56, 2.85s/it] {'loss': 0.6306, 'grad_norm': 6.055517461311078, 'learning_rate': 1.2964111586942996e-06, 'epoch': 0.67} 67%|██████▋ | 8251/12313 [6:11:04<3:12:56, 2.85s/it] 67%|██████▋ | 8252/12313 [6:11:07<3:07:31, 2.77s/it] {'loss': 0.5962, 'grad_norm': 3.996269903879433, 'learning_rate': 1.2958348066700833e-06, 'epoch': 0.67} 67%|██████▋ | 8252/12313 [6:11:07<3:07:31, 2.77s/it] 67%|██████▋ | 8253/12313 [6:11:10<3:04:29, 2.73s/it] {'loss': 0.453, 'grad_norm': 6.099178781598084, 'learning_rate': 1.2952585379677268e-06, 'epoch': 0.67} 67%|██████▋ | 8253/12313 [6:11:10<3:04:29, 2.73s/it] 67%|██████▋ | 8254/12313 [6:11:12<3:02:37, 2.70s/it] {'loss': 0.4299, 'grad_norm': 6.116289571948397, 'learning_rate': 1.2946823526271023e-06, 'epoch': 0.67} 67%|██████▋ | 8254/12313 [6:11:12<3:02:37, 2.70s/it] 67%|██████▋ | 8255/12313 [6:11:15<3:05:27, 2.74s/it] {'loss': 0.5551, 'grad_norm': 5.092428454455709, 'learning_rate': 1.2941062506880811e-06, 'epoch': 0.67} 67%|██████▋ | 8255/12313 [6:11:15<3:05:27, 2.74s/it] 67%|██████▋ | 8256/12313 [6:11:18<2:59:56, 2.66s/it] {'loss': 0.6384, 'grad_norm': 5.682161639107994, 'learning_rate': 1.2935302321905252e-06, 'epoch': 0.67} 67%|██████▋ | 8256/12313 [6:11:18<2:59:56, 2.66s/it] 67%|██████▋ | 8257/12313 [6:11:20<3:03:26, 2.71s/it] {'loss': 0.6603, 'grad_norm': 4.3807459106680575, 'learning_rate': 1.292954297174291e-06, 'epoch': 0.67} 67%|██████▋ | 8257/12313 [6:11:20<3:03:26, 2.71s/it] 67%|██████▋ | 8258/12313 [6:11:23<3:09:32, 2.80s/it] {'loss': 0.4831, 'grad_norm': 3.7416878890359517, 'learning_rate': 1.2923784456792314e-06, 'epoch': 0.67} 67%|██████▋ | 8258/12313 [6:11:23<3:09:32, 2.80s/it] 67%|██████▋ | 8259/12313 [6:11:26<3:12:42, 2.85s/it] {'loss': 0.4317, 'grad_norm': 3.7742295359579874, 'learning_rate': 1.291802677745193e-06, 
'epoch': 0.67} 67%|██████▋ | 8259/12313 [6:11:26<3:12:42, 2.85s/it] 67%|██████▋ | 8260/12313 [6:11:29<3:12:19, 2.85s/it] {'loss': 0.5397, 'grad_norm': 4.136648173116798, 'learning_rate': 1.2912269934120142e-06, 'epoch': 0.67} 67%|██████▋ | 8260/12313 [6:11:29<3:12:19, 2.85s/it] 67%|██████▋ | 8261/12313 [6:11:32<3:07:56, 2.78s/it] {'loss': 0.4942, 'grad_norm': 10.177523748838187, 'learning_rate': 1.2906513927195308e-06, 'epoch': 0.67} 67%|██████▋ | 8261/12313 [6:11:32<3:07:56, 2.78s/it] 67%|██████▋ | 8262/12313 [6:11:35<3:03:47, 2.72s/it] {'loss': 0.3985, 'grad_norm': 5.041065114037551, 'learning_rate': 1.290075875707571e-06, 'epoch': 0.67} 67%|██████▋ | 8262/12313 [6:11:35<3:03:47, 2.72s/it] 67%|██████▋ | 8263/12313 [6:11:37<3:02:52, 2.71s/it] {'loss': 0.4884, 'grad_norm': 6.204016855298799, 'learning_rate': 1.2895004424159557e-06, 'epoch': 0.67} 67%|██████▋ | 8263/12313 [6:11:37<3:02:52, 2.71s/it] 67%|██████▋ | 8264/12313 [6:11:40<3:02:27, 2.70s/it] {'loss': 0.3397, 'grad_norm': 6.451950713200213, 'learning_rate': 1.2889250928845038e-06, 'epoch': 0.67} 67%|██████▋ | 8264/12313 [6:11:40<3:02:27, 2.70s/it] 67%|██████▋ | 8265/12313 [6:11:43<3:01:53, 2.70s/it] {'loss': 0.5267, 'grad_norm': 5.789167573594882, 'learning_rate': 1.2883498271530265e-06, 'epoch': 0.67} 67%|██████▋ | 8265/12313 [6:11:43<3:01:53, 2.70s/it] 67%|██████▋ | 8266/12313 [6:11:45<2:57:21, 2.63s/it] {'loss': 0.5771, 'grad_norm': 3.020900821630963, 'learning_rate': 1.2877746452613277e-06, 'epoch': 0.67} 67%|██████▋ | 8266/12313 [6:11:45<2:57:21, 2.63s/it] 67%|██████▋ | 8267/12313 [6:11:48<3:00:03, 2.67s/it] {'loss': 0.4145, 'grad_norm': 7.136575144733493, 'learning_rate': 1.2871995472492088e-06, 'epoch': 0.67} 67%|██████▋ | 8267/12313 [6:11:48<3:00:03, 2.67s/it] 67%|██████▋ | 8268/12313 [6:11:51<3:03:09, 2.72s/it] {'loss': 0.4899, 'grad_norm': 6.86909884452662, 'learning_rate': 1.2866245331564627e-06, 'epoch': 0.67} 67%|██████▋ | 8268/12313 [6:11:51<3:03:09, 2.72s/it] 67%|██████▋ | 8269/12313 
[6:11:53<3:00:05, 2.67s/it] {'loss': 0.7517, 'grad_norm': 4.392832664582703, 'learning_rate': 1.2860496030228763e-06, 'epoch': 0.67} 67%|██████▋ | 8269/12313 [6:11:53<3:00:05, 2.67s/it] 67%|██████▋ | 8270/12313 [6:11:56<2:58:11, 2.64s/it] {'loss': 0.4853, 'grad_norm': 4.036722831992974, 'learning_rate': 1.2854747568882336e-06, 'epoch': 0.67} 67%|██████▋ | 8270/12313 [6:11:56<2:58:11, 2.64s/it] 67%|██████▋ | 8271/12313 [6:11:59<3:02:26, 2.71s/it] {'loss': 0.3944, 'grad_norm': 4.039548813778698, 'learning_rate': 1.2848999947923089e-06, 'epoch': 0.67} 67%|██████▋ | 8271/12313 [6:11:59<3:02:26, 2.71s/it] 67%|██████▋ | 8272/12313 [6:12:01<2:58:51, 2.66s/it] {'loss': 0.4949, 'grad_norm': 4.0410979346726625, 'learning_rate': 1.2843253167748745e-06, 'epoch': 0.67} 67%|██████▋ | 8272/12313 [6:12:01<2:58:51, 2.66s/it] 67%|██████▋ | 8273/12313 [6:12:04<3:00:25, 2.68s/it] {'loss': 0.7473, 'grad_norm': 4.170268917322463, 'learning_rate': 1.2837507228756934e-06, 'epoch': 0.67} 67%|██████▋ | 8273/12313 [6:12:04<3:00:25, 2.68s/it] 67%|██████▋ | 8274/12313 [6:12:07<3:01:30, 2.70s/it] {'loss': 0.3525, 'grad_norm': 5.339351144770748, 'learning_rate': 1.2831762131345265e-06, 'epoch': 0.67} 67%|██████▋ | 8274/12313 [6:12:07<3:01:30, 2.70s/it] 67%|██████▋ | 8275/12313 [6:12:09<3:02:17, 2.71s/it] {'loss': 0.5092, 'grad_norm': 4.558326943954262, 'learning_rate': 1.2826017875911257e-06, 'epoch': 0.67} 67%|██████▋ | 8275/12313 [6:12:09<3:02:17, 2.71s/it] 67%|██████▋ | 8276/12313 [6:12:12<3:02:03, 2.71s/it] {'loss': 0.3959, 'grad_norm': 6.121469512209196, 'learning_rate': 1.2820274462852373e-06, 'epoch': 0.67} 67%|██████▋ | 8276/12313 [6:12:12<3:02:03, 2.71s/it] 67%|██████▋ | 8277/12313 [6:12:15<2:58:40, 2.66s/it] {'loss': 0.4574, 'grad_norm': 6.247553742372245, 'learning_rate': 1.2814531892566034e-06, 'epoch': 0.67} 67%|██████▋ | 8277/12313 [6:12:15<2:58:40, 2.66s/it] 67%|██████▋ | 8278/12313 [6:12:17<2:58:29, 2.65s/it] {'loss': 0.4817, 'grad_norm': 3.743110767471212, 'learning_rate': 
1.2808790165449609e-06, 'epoch': 0.67} 67%|██████▋ | 8278/12313 [6:12:17<2:58:29, 2.65s/it] 67%|██████▋ | 8279/12313 [6:12:20<2:58:22, 2.65s/it] {'loss': 0.4439, 'grad_norm': 3.847254853806929, 'learning_rate': 1.280304928190037e-06, 'epoch': 0.67} 67%|██████▋ | 8279/12313 [6:12:20<2:58:22, 2.65s/it] 67%|██████▋ | 8280/12313 [6:12:23<3:03:09, 2.72s/it] {'loss': 0.5166, 'grad_norm': 5.111654340531649, 'learning_rate': 1.2797309242315584e-06, 'epoch': 0.67} 67%|██████▋ | 8280/12313 [6:12:23<3:03:09, 2.72s/it] 67%|██████▋ | 8281/12313 [6:12:26<3:03:06, 2.72s/it] {'loss': 0.4379, 'grad_norm': 5.402468457889438, 'learning_rate': 1.2791570047092413e-06, 'epoch': 0.67} 67%|██████▋ | 8281/12313 [6:12:26<3:03:06, 2.72s/it] 67%|██████▋ | 8282/12313 [6:12:28<3:04:15, 2.74s/it] {'loss': 0.4985, 'grad_norm': 4.64148894732247, 'learning_rate': 1.2785831696627975e-06, 'epoch': 0.67} 67%|██████▋ | 8282/12313 [6:12:28<3:04:15, 2.74s/it] 67%|██████▋ | 8283/12313 [6:12:31<3:00:47, 2.69s/it] {'loss': 0.3378, 'grad_norm': 4.977759194515003, 'learning_rate': 1.2780094191319348e-06, 'epoch': 0.67} 67%|██████▋ | 8283/12313 [6:12:31<3:00:47, 2.69s/it] 67%|██████▋ | 8284/12313 [6:12:34<3:01:32, 2.70s/it] {'loss': 0.5315, 'grad_norm': 5.750546485168828, 'learning_rate': 1.2774357531563522e-06, 'epoch': 0.67} 67%|██████▋ | 8284/12313 [6:12:34<3:01:32, 2.70s/it] 67%|██████▋ | 8285/12313 [6:12:36<2:58:08, 2.65s/it] {'loss': 0.611, 'grad_norm': 5.462788945858097, 'learning_rate': 1.276862171775745e-06, 'epoch': 0.67} 67%|██████▋ | 8285/12313 [6:12:36<2:58:08, 2.65s/it] 67%|██████▋ | 8286/12313 [6:12:39<2:57:31, 2.64s/it] {'loss': 0.4435, 'grad_norm': 9.040739646165454, 'learning_rate': 1.2762886750298033e-06, 'epoch': 0.67} 67%|██████▋ | 8286/12313 [6:12:39<2:57:31, 2.64s/it] 67%|██████▋ | 8287/12313 [6:12:41<2:58:31, 2.66s/it] {'loss': 0.585, 'grad_norm': 5.419042990213366, 'learning_rate': 1.275715262958209e-06, 'epoch': 0.67} 67%|██████▋ | 8287/12313 [6:12:41<2:58:31, 2.66s/it] 67%|██████▋ | 
8288/12313 [6:12:44<2:56:51, 2.64s/it] {'loss': 0.4452, 'grad_norm': 8.622423592062997, 'learning_rate': 1.275141935600639e-06, 'epoch': 0.67} 67%|██████▋ | 8288/12313 [6:12:44<2:56:51, 2.64s/it] 67%|██████▋ | 8289/12313 [6:12:47<2:54:46, 2.61s/it] {'loss': 0.5745, 'grad_norm': 3.39665996833734, 'learning_rate': 1.2745686929967632e-06, 'epoch': 0.67} 67%|██████▋ | 8289/12313 [6:12:47<2:54:46, 2.61s/it] 67%|██████▋ | 8290/12313 [6:12:49<2:55:53, 2.62s/it] {'loss': 0.4325, 'grad_norm': 9.640446040935274, 'learning_rate': 1.2739955351862488e-06, 'epoch': 0.67} 67%|██████▋ | 8290/12313 [6:12:49<2:55:53, 2.62s/it] 67%|██████▋ | 8291/12313 [6:12:52<2:59:54, 2.68s/it] {'loss': 0.742, 'grad_norm': 3.375245810027421, 'learning_rate': 1.2734224622087556e-06, 'epoch': 0.67} 67%|██████▋ | 8291/12313 [6:12:52<2:59:54, 2.68s/it] 67%|██████▋ | 8292/12313 [6:12:54<2:52:48, 2.58s/it] {'loss': 0.4282, 'grad_norm': 6.070864539301688, 'learning_rate': 1.2728494741039354e-06, 'epoch': 0.67} 67%|██████▋ | 8292/12313 [6:12:54<2:52:48, 2.58s/it] 67%|██████▋ | 8293/12313 [6:12:57<3:02:57, 2.73s/it] {'loss': 0.5101, 'grad_norm': 3.1473742664954956, 'learning_rate': 1.2722765709114382e-06, 'epoch': 0.67} 67%|██████▋ | 8293/12313 [6:12:57<3:02:57, 2.73s/it] 67%|██████▋ | 8294/12313 [6:13:00<3:02:23, 2.72s/it] {'loss': 0.4927, 'grad_norm': 5.797541760307167, 'learning_rate': 1.2717037526709048e-06, 'epoch': 0.67} 67%|██████▋ | 8294/12313 [6:13:00<3:02:23, 2.72s/it] 67%|██████▋ | 8295/12313 [6:13:03<3:00:42, 2.70s/it] {'loss': 0.5272, 'grad_norm': 3.9968403294119192, 'learning_rate': 1.2711310194219695e-06, 'epoch': 0.67} 67%|██████▋ | 8295/12313 [6:13:03<3:00:42, 2.70s/it] 67%|██████▋ | 8296/12313 [6:13:06<3:07:27, 2.80s/it] {'loss': 0.3865, 'grad_norm': 8.686594608534724, 'learning_rate': 1.2705583712042654e-06, 'epoch': 0.67} 67%|██████▋ | 8296/12313 [6:13:06<3:07:27, 2.80s/it] 67%|██████▋ | 8297/12313 [6:13:09<3:08:05, 2.81s/it] {'loss': 0.6049, 'grad_norm': 5.299840930228158, 
'learning_rate': 1.2699858080574141e-06, 'epoch': 0.67} 67%|██████▋ | 8297/12313 [6:13:09<3:08:05, 2.81s/it] 67%|██████▋ | 8298/12313 [6:13:12<3:08:54, 2.82s/it] {'loss': 0.5769, 'grad_norm': 5.383383957246399, 'learning_rate': 1.2694133300210354e-06, 'epoch': 0.67} 67%|██████▋ | 8298/12313 [6:13:12<3:08:54, 2.82s/it] 67%|██████▋ | 8299/12313 [6:13:14<3:10:52, 2.85s/it] {'loss': 0.5609, 'grad_norm': 5.48276722761316, 'learning_rate': 1.2688409371347422e-06, 'epoch': 0.67} 67%|██████▋ | 8299/12313 [6:13:14<3:10:52, 2.85s/it] 67%|██████▋ | 8300/12313 [6:13:17<3:13:21, 2.89s/it] {'loss': 0.5895, 'grad_norm': 4.890248642899915, 'learning_rate': 1.2682686294381403e-06, 'epoch': 0.67} 67%|██████▋ | 8300/12313 [6:13:17<3:13:21, 2.89s/it] 67%|██████▋ | 8301/12313 [6:13:20<3:06:04, 2.78s/it] {'loss': 0.406, 'grad_norm': 5.286027918182194, 'learning_rate': 1.2676964069708294e-06, 'epoch': 0.67} 67%|██████▋ | 8301/12313 [6:13:20<3:06:04, 2.78s/it] 67%|██████▋ | 8302/12313 [6:13:23<3:01:53, 2.72s/it] {'loss': 0.6805, 'grad_norm': 4.144062037889405, 'learning_rate': 1.2671242697724061e-06, 'epoch': 0.67} 67%|██████▋ | 8302/12313 [6:13:23<3:01:53, 2.72s/it] 67%|██████▋ | 8303/12313 [6:13:25<2:59:30, 2.69s/it] {'loss': 0.4172, 'grad_norm': 3.10497422112379, 'learning_rate': 1.266552217882458e-06, 'epoch': 0.67} 67%|██████▋ | 8303/12313 [6:13:25<2:59:30, 2.69s/it] 67%|██████▋ | 8304/12313 [6:13:28<2:59:22, 2.68s/it] {'loss': 0.3766, 'grad_norm': 4.7159885122704495, 'learning_rate': 1.265980251340568e-06, 'epoch': 0.67} 67%|██████▋ | 8304/12313 [6:13:28<2:59:22, 2.68s/it] 67%|██████▋ | 8305/12313 [6:13:31<2:59:36, 2.69s/it] {'loss': 0.5068, 'grad_norm': 6.029787369464984, 'learning_rate': 1.265408370186315e-06, 'epoch': 0.67} 67%|██████▋ | 8305/12313 [6:13:31<2:59:36, 2.69s/it] 67%|██████▋ | 8306/12313 [6:13:33<3:00:25, 2.70s/it] {'loss': 0.5335, 'grad_norm': 9.477273387771245, 'learning_rate': 1.2648365744592683e-06, 'epoch': 0.67} 67%|██████▋ | 8306/12313 [6:13:33<3:00:25, 
2.70s/it] 67%|██████▋ | 8307/12313 [6:13:36<2:56:43, 2.65s/it] {'loss': 0.3893, 'grad_norm': 5.274931921378115, 'learning_rate': 1.264264864198994e-06, 'epoch': 0.67} 67%|██████▋ | 8307/12313 [6:13:36<2:56:43, 2.65s/it] 67%|██████▋ | 8308/12313 [6:13:39<3:00:38, 2.71s/it] {'loss': 0.3245, 'grad_norm': 6.694925532405975, 'learning_rate': 1.2636932394450502e-06, 'epoch': 0.67} 67%|██████▋ | 8308/12313 [6:13:39<3:00:38, 2.71s/it] 67%|██████▋ | 8309/12313 [6:13:41<3:00:18, 2.70s/it] {'loss': 0.5724, 'grad_norm': 3.4329112146537994, 'learning_rate': 1.2631217002369917e-06, 'epoch': 0.67} 67%|██████▋ | 8309/12313 [6:13:41<3:00:18, 2.70s/it] 67%|██████▋ | 8310/12313 [6:13:44<3:00:52, 2.71s/it] {'loss': 0.6977, 'grad_norm': 5.211887506168933, 'learning_rate': 1.2625502466143646e-06, 'epoch': 0.67} 67%|██████▋ | 8310/12313 [6:13:44<3:00:52, 2.71s/it] 67%|██████▋ | 8311/12313 [6:13:47<2:55:54, 2.64s/it] {'loss': 0.4674, 'grad_norm': 3.3614053784547258, 'learning_rate': 1.2619788786167113e-06, 'epoch': 0.67} 67%|██████▋ | 8311/12313 [6:13:47<2:55:54, 2.64s/it] 68%|██████▊ | 8312/12313 [6:13:49<2:59:04, 2.69s/it] {'loss': 0.4857, 'grad_norm': 3.329679930838226, 'learning_rate': 1.2614075962835688e-06, 'epoch': 0.68} 68%|██████▊ | 8312/12313 [6:13:49<2:59:04, 2.69s/it] 68%|██████▊ | 8313/12313 [6:13:52<2:59:29, 2.69s/it] {'loss': 0.5106, 'grad_norm': 5.737443755358583, 'learning_rate': 1.2608363996544654e-06, 'epoch': 0.68} 68%|██████▊ | 8313/12313 [6:13:52<2:59:29, 2.69s/it] 68%|██████▊ | 8314/12313 [6:13:55<2:59:34, 2.69s/it] {'loss': 0.4443, 'grad_norm': 6.125567742013193, 'learning_rate': 1.2602652887689237e-06, 'epoch': 0.68} 68%|██████▊ | 8314/12313 [6:13:55<2:59:34, 2.69s/it] 68%|██████▊ | 8315/12313 [6:13:57<2:58:53, 2.68s/it] {'loss': 0.4483, 'grad_norm': 9.237383534994807, 'learning_rate': 1.2596942636664638e-06, 'epoch': 0.68} 68%|██████▊ | 8315/12313 [6:13:57<2:58:53, 2.68s/it] 68%|██████▊ | 8316/12313 [6:14:00<3:03:02, 2.75s/it] {'loss': 0.4114, 'grad_norm': 
5.106127682199849, 'learning_rate': 1.2591233243865958e-06, 'epoch': 0.68} 68%|██████▊ | 8316/12313 [6:14:00<3:03:02, 2.75s/it] 68%|██████▊ | 8317/12313 [6:14:03<3:00:08, 2.70s/it] {'loss': 0.5567, 'grad_norm': 6.956467222846337, 'learning_rate': 1.2585524709688268e-06, 'epoch': 0.68} 68%|██████▊ | 8317/12313 [6:14:03<3:00:08, 2.70s/it] 68%|██████▊ | 8318/12313 [6:14:06<3:01:31, 2.73s/it] {'loss': 0.4624, 'grad_norm': 6.165264566742239, 'learning_rate': 1.257981703452657e-06, 'epoch': 0.68} 68%|██████▊ | 8318/12313 [6:14:06<3:01:31, 2.73s/it] 68%|██████▊ | 8319/12313 [6:14:08<3:00:04, 2.71s/it] {'loss': 0.479, 'grad_norm': 3.908854775644603, 'learning_rate': 1.2574110218775804e-06, 'epoch': 0.68} 68%|██████▊ | 8319/12313 [6:14:08<3:00:04, 2.71s/it] 68%|██████▊ | 8320/12313 [6:14:11<2:55:29, 2.64s/it] {'loss': 0.4723, 'grad_norm': 5.299167034989398, 'learning_rate': 1.2568404262830836e-06, 'epoch': 0.68} 68%|██████▊ | 8320/12313 [6:14:11<2:55:29, 2.64s/it] 68%|██████▊ | 8321/12313 [6:14:14<2:57:26, 2.67s/it] {'loss': 0.4358, 'grad_norm': 5.113207491776007, 'learning_rate': 1.256269916708651e-06, 'epoch': 0.68} 68%|██████▊ | 8321/12313 [6:14:14<2:57:26, 2.67s/it] 68%|██████▊ | 8322/12313 [6:14:16<3:00:25, 2.71s/it] {'loss': 0.4825, 'grad_norm': 6.490258586098834, 'learning_rate': 1.2556994931937565e-06, 'epoch': 0.68} 68%|██████▊ | 8322/12313 [6:14:16<3:00:25, 2.71s/it] 68%|██████▊ | 8323/12313 [6:14:19<3:00:50, 2.72s/it] {'loss': 0.4548, 'grad_norm': 5.544338315054307, 'learning_rate': 1.2551291557778721e-06, 'epoch': 0.68} 68%|██████▊ | 8323/12313 [6:14:19<3:00:50, 2.72s/it] 68%|██████▊ | 8324/12313 [6:14:22<3:00:37, 2.72s/it] {'loss': 0.5272, 'grad_norm': 4.16562191240731, 'learning_rate': 1.2545589045004627e-06, 'epoch': 0.68} 68%|██████▊ | 8324/12313 [6:14:22<3:00:37, 2.72s/it] 68%|██████▊ | 8325/12313 [6:14:24<2:56:31, 2.66s/it] {'loss': 0.5807, 'grad_norm': 8.317648547442525, 'learning_rate': 1.2539887394009855e-06, 'epoch': 0.68} 68%|██████▊ | 8325/12313 
[6:14:24<2:56:31, 2.66s/it] 68%|██████▊ | 8326/12313 [6:14:27<2:56:17, 2.65s/it] {'loss': 0.4716, 'grad_norm': 4.863432965134022, 'learning_rate': 1.2534186605188933e-06, 'epoch': 0.68} 68%|██████▊ | 8326/12313 [6:14:27<2:56:17, 2.65s/it] 68%|██████▊ | 8327/12313 [6:14:30<2:57:14, 2.67s/it] {'loss': 0.3962, 'grad_norm': 4.473812223631319, 'learning_rate': 1.2528486678936313e-06, 'epoch': 0.68} 68%|██████▊ | 8327/12313 [6:14:30<2:57:14, 2.67s/it] 68%|██████▊ | 8328/12313 [6:14:32<2:57:28, 2.67s/it] {'loss': 0.4288, 'grad_norm': 7.282308721393281, 'learning_rate': 1.2522787615646421e-06, 'epoch': 0.68} 68%|██████▊ | 8328/12313 [6:14:32<2:57:28, 2.67s/it] 68%|██████▊ | 8329/12313 [6:14:35<2:58:38, 2.69s/it] {'loss': 0.4208, 'grad_norm': 6.136375474933964, 'learning_rate': 1.251708941571358e-06, 'epoch': 0.68} 68%|██████▊ | 8329/12313 [6:14:35<2:58:38, 2.69s/it] 68%|██████▊ | 8330/12313 [6:14:38<2:56:54, 2.66s/it] {'loss': 0.3731, 'grad_norm': 8.250435251721406, 'learning_rate': 1.2511392079532087e-06, 'epoch': 0.68} 68%|██████▊ | 8330/12313 [6:14:38<2:56:54, 2.66s/it] 68%|██████▊ | 8331/12313 [6:14:40<2:54:22, 2.63s/it] {'loss': 0.4431, 'grad_norm': 8.8618136486422, 'learning_rate': 1.2505695607496176e-06, 'epoch': 0.68} 68%|██████▊ | 8331/12313 [6:14:40<2:54:22, 2.63s/it] 68%|██████▊ | 8332/12313 [6:14:43<2:57:21, 2.67s/it] {'loss': 0.437, 'grad_norm': 6.809802423773869, 'learning_rate': 1.2500000000000007e-06, 'epoch': 0.68} 68%|██████▊ | 8332/12313 [6:14:43<2:57:21, 2.67s/it] 68%|██████▊ | 8333/12313 [6:14:46<2:59:02, 2.70s/it] {'loss': 0.6663, 'grad_norm': 4.412176686499365, 'learning_rate': 1.2494305257437669e-06, 'epoch': 0.68} 68%|██████▊ | 8333/12313 [6:14:46<2:59:02, 2.70s/it] 68%|██████▊ | 8334/12313 [6:14:48<2:56:34, 2.66s/it] {'loss': 0.5233, 'grad_norm': 8.259279196472452, 'learning_rate': 1.2488611380203234e-06, 'epoch': 0.68} 68%|██████▊ | 8334/12313 [6:14:48<2:56:34, 2.66s/it] 68%|██████▊ | 8335/12313 [6:14:51<2:51:31, 2.59s/it] {'loss': 0.4985, 
'grad_norm': 5.681125220205126, 'learning_rate': 1.2482918368690666e-06, 'epoch': 0.68} 68%|██████▊ | 8335/12313 [6:14:51<2:51:31, 2.59s/it] 68%|██████▊ | 8336/12313 [6:14:54<2:56:55, 2.67s/it] {'loss': 0.5764, 'grad_norm': 8.25956808119689, 'learning_rate': 1.24772262232939e-06, 'epoch': 0.68} 68%|██████▊ | 8336/12313 [6:14:54<2:56:55, 2.67s/it] 68%|██████▊ | 8337/12313 [6:14:56<2:53:30, 2.62s/it] {'loss': 0.5972, 'grad_norm': 5.608911686082642, 'learning_rate': 1.2471534944406813e-06, 'epoch': 0.68} 68%|██████▊ | 8337/12313 [6:14:56<2:53:30, 2.62s/it] 68%|██████▊ | 8338/12313 [6:14:59<2:57:27, 2.68s/it] {'loss': 0.4957, 'grad_norm': 3.3213708352263054, 'learning_rate': 1.2465844532423201e-06, 'epoch': 0.68} 68%|██████▊ | 8338/12313 [6:14:59<2:57:27, 2.68s/it] 68%|██████▊ | 8339/12313 [6:15:02<3:02:12, 2.75s/it] {'loss': 0.5549, 'grad_norm': 7.167255021326034, 'learning_rate': 1.2460154987736806e-06, 'epoch': 0.68} 68%|██████▊ | 8339/12313 [6:15:02<3:02:12, 2.75s/it] 68%|██████▊ | 8340/12313 [6:15:05<3:01:25, 2.74s/it] {'loss': 0.427, 'grad_norm': 4.442760200949274, 'learning_rate': 1.2454466310741326e-06, 'epoch': 0.68} 68%|██████▊ | 8340/12313 [6:15:05<3:01:25, 2.74s/it] 68%|██████▊ | 8341/12313 [6:15:07<3:02:40, 2.76s/it] {'loss': 0.4285, 'grad_norm': 5.768478872426691, 'learning_rate': 1.244877850183038e-06, 'epoch': 0.68} 68%|██████▊ | 8341/12313 [6:15:07<3:02:40, 2.76s/it] 68%|██████▊ | 8342/12313 [6:15:10<3:00:04, 2.72s/it] {'loss': 0.5469, 'grad_norm': 4.225339833775357, 'learning_rate': 1.2443091561397527e-06, 'epoch': 0.68} 68%|██████▊ | 8342/12313 [6:15:10<3:00:04, 2.72s/it] 68%|██████▊ | 8343/12313 [6:15:13<3:00:03, 2.72s/it] {'loss': 0.4678, 'grad_norm': 6.322105186178602, 'learning_rate': 1.2437405489836282e-06, 'epoch': 0.68} 68%|██████▊ | 8343/12313 [6:15:13<3:00:03, 2.72s/it] 68%|██████▊ | 8344/12313 [6:15:15<2:58:51, 2.70s/it] {'loss': 0.4633, 'grad_norm': 12.80035724783234, 'learning_rate': 1.2431720287540097e-06, 'epoch': 0.68} 68%|██████▊ | 
8344/12313 [6:15:15<2:58:51, 2.70s/it] 68%|██████▊ | 8345/12313 [6:15:18<2:56:08, 2.66s/it] {'loss': 0.5027, 'grad_norm': 6.865099951483512, 'learning_rate': 1.2426035954902356e-06, 'epoch': 0.68} 68%|██████▊ | 8345/12313 [6:15:18<2:56:08, 2.66s/it] 68%|██████▊ | 8346/12313 [6:15:21<2:56:12, 2.67s/it] {'loss': 0.4114, 'grad_norm': 4.2665462756693975, 'learning_rate': 1.2420352492316368e-06, 'epoch': 0.68} 68%|██████▊ | 8346/12313 [6:15:21<2:56:12, 2.67s/it] 68%|██████▊ | 8347/12313 [6:15:23<2:56:59, 2.68s/it] {'loss': 0.4601, 'grad_norm': 4.207313035828698, 'learning_rate': 1.2414669900175423e-06, 'epoch': 0.68} 68%|██████▊ | 8347/12313 [6:15:23<2:56:59, 2.68s/it] 68%|██████▊ | 8348/12313 [6:15:26<2:53:33, 2.63s/it] {'loss': 0.3487, 'grad_norm': 5.802065214743892, 'learning_rate': 1.2408988178872699e-06, 'epoch': 0.68} 68%|██████▊ | 8348/12313 [6:15:26<2:53:33, 2.63s/it] 68%|██████▊ | 8349/12313 [6:15:28<2:51:10, 2.59s/it] {'loss': 0.4018, 'grad_norm': 5.071217203345165, 'learning_rate': 1.240330732880136e-06, 'epoch': 0.68} 68%|██████▊ | 8349/12313 [6:15:28<2:51:10, 2.59s/it] 68%|██████▊ | 8350/12313 [6:15:31<2:49:18, 2.56s/it] {'loss': 0.5006, 'grad_norm': 4.658634987558885, 'learning_rate': 1.2397627350354494e-06, 'epoch': 0.68} 68%|██████▊ | 8350/12313 [6:15:31<2:49:18, 2.56s/it] 68%|██████▊ | 8351/12313 [6:15:33<2:50:05, 2.58s/it] {'loss': 0.3748, 'grad_norm': 5.989526021591514, 'learning_rate': 1.2391948243925119e-06, 'epoch': 0.68} 68%|██████▊ | 8351/12313 [6:15:33<2:50:05, 2.58s/it] 68%|██████▊ | 8352/12313 [6:15:36<2:56:07, 2.67s/it] {'loss': 0.3414, 'grad_norm': 4.543912419314957, 'learning_rate': 1.238627000990619e-06, 'epoch': 0.68} 68%|██████▊ | 8352/12313 [6:15:36<2:56:07, 2.67s/it] 68%|██████▊ | 8353/12313 [6:15:39<2:50:53, 2.59s/it] {'loss': 0.4755, 'grad_norm': 4.935095050500174, 'learning_rate': 1.2380592648690629e-06, 'epoch': 0.68} 68%|██████▊ | 8353/12313 [6:15:39<2:50:53, 2.59s/it] 68%|██████▊ | 8354/12313 [6:15:41<2:51:00, 2.59s/it] {'loss': 
0.4053, 'grad_norm': 3.9622208310484024, 'learning_rate': 1.2374916160671268e-06, 'epoch': 0.68} 68%|██████▊ | 8354/12313 [6:15:41<2:51:00, 2.59s/it] 68%|██████▊ | 8355/12313 [6:15:44<2:54:11, 2.64s/it] {'loss': 0.5236, 'grad_norm': 4.65626323141216, 'learning_rate': 1.2369240546240881e-06, 'epoch': 0.68} 68%|██████▊ | 8355/12313 [6:15:44<2:54:11, 2.64s/it] 68%|██████▊ | 8356/12313 [6:15:47<3:03:06, 2.78s/it] {'loss': 0.4848, 'grad_norm': 5.707169042711556, 'learning_rate': 1.2363565805792202e-06, 'epoch': 0.68} 68%|██████▊ | 8356/12313 [6:15:47<3:03:06, 2.78s/it] 68%|██████▊ | 8357/12313 [6:15:50<3:02:21, 2.77s/it] {'loss': 0.4024, 'grad_norm': 5.695511088930679, 'learning_rate': 1.2357891939717903e-06, 'epoch': 0.68} 68%|██████▊ | 8357/12313 [6:15:50<3:02:21, 2.77s/it] 68%|██████▊ | 8358/12313 [6:15:53<2:58:46, 2.71s/it] {'loss': 0.4305, 'grad_norm': 5.28913545779317, 'learning_rate': 1.2352218948410563e-06, 'epoch': 0.68} 68%|██████▊ | 8358/12313 [6:15:53<2:58:46, 2.71s/it] 68%|██████▊ | 8359/12313 [6:15:55<2:58:41, 2.71s/it] {'loss': 0.5757, 'grad_norm': 3.680524773252283, 'learning_rate': 1.2346546832262743e-06, 'epoch': 0.68} 68%|██████▊ | 8359/12313 [6:15:55<2:58:41, 2.71s/it] 68%|██████▊ | 8360/12313 [6:15:58<2:57:47, 2.70s/it] {'loss': 0.5091, 'grad_norm': 4.937519409498853, 'learning_rate': 1.2340875591666917e-06, 'epoch': 0.68} 68%|██████▊ | 8360/12313 [6:15:58<2:57:47, 2.70s/it] 68%|██████▊ | 8361/12313 [6:16:00<2:52:47, 2.62s/it] {'loss': 0.3707, 'grad_norm': 6.252444692643531, 'learning_rate': 1.2335205227015494e-06, 'epoch': 0.68} 68%|██████▊ | 8361/12313 [6:16:00<2:52:47, 2.62s/it] 68%|██████▊ | 8362/12313 [6:16:03<2:54:06, 2.64s/it] {'loss': 0.4521, 'grad_norm': 6.749516651304727, 'learning_rate': 1.2329535738700838e-06, 'epoch': 0.68} 68%|██████▊ | 8362/12313 [6:16:03<2:54:06, 2.64s/it] 68%|██████▊ | 8363/12313 [6:16:06<2:53:55, 2.64s/it] {'loss': 0.4651, 'grad_norm': 6.554192643461585, 'learning_rate': 1.232386712711526e-06, 'epoch': 0.68} 
68%|██████▊ | 8363/12313 [6:16:06<2:53:55, 2.64s/it] 68%|██████▊ | 8364/12313 [6:16:08<2:51:19, 2.60s/it] {'loss': 0.5727, 'grad_norm': 6.372502999598231, 'learning_rate': 1.2318199392650993e-06, 'epoch': 0.68} 68%|██████▊ | 8364/12313 [6:16:08<2:51:19, 2.60s/it] 68%|██████▊ | 8365/12313 [6:16:11<2:51:26, 2.61s/it] {'loss': 0.3813, 'grad_norm': 7.402271446804732, 'learning_rate': 1.23125325357002e-06, 'epoch': 0.68} 68%|██████▊ | 8365/12313 [6:16:11<2:51:26, 2.61s/it] 68%|██████▊ | 8366/12313 [6:16:13<2:52:31, 2.62s/it] {'loss': 0.3856, 'grad_norm': 4.56010296625159, 'learning_rate': 1.2306866556655016e-06, 'epoch': 0.68} 68%|██████▊ | 8366/12313 [6:16:13<2:52:31, 2.62s/it] 68%|██████▊ | 8367/12313 [6:16:17<3:01:24, 2.76s/it] {'loss': 0.3799, 'grad_norm': 5.070138865153647, 'learning_rate': 1.2301201455907492e-06, 'epoch': 0.68} 68%|██████▊ | 8367/12313 [6:16:17<3:01:24, 2.76s/it] 68%|██████▊ | 8368/12313 [6:16:19<3:00:21, 2.74s/it] {'loss': 0.4336, 'grad_norm': 7.907043621920382, 'learning_rate': 1.2295537233849608e-06, 'epoch': 0.68} 68%|██████▊ | 8368/12313 [6:16:19<3:00:21, 2.74s/it] 68%|██████▊ | 8369/12313 [6:16:22<2:57:07, 2.69s/it] {'loss': 0.4805, 'grad_norm': 4.083402933054175, 'learning_rate': 1.2289873890873311e-06, 'epoch': 0.68} 68%|██████▊ | 8369/12313 [6:16:22<2:57:07, 2.69s/it] 68%|██████▊ | 8370/12313 [6:16:25<2:56:25, 2.68s/it] {'loss': 0.5801, 'grad_norm': 3.3225519411749134, 'learning_rate': 1.2284211427370483e-06, 'epoch': 0.68} 68%|██████▊ | 8370/12313 [6:16:25<2:56:25, 2.68s/it] 68%|██████▊ | 8371/12313 [6:16:27<2:50:24, 2.59s/it] {'loss': 0.4576, 'grad_norm': 5.893743990272938, 'learning_rate': 1.2278549843732915e-06, 'epoch': 0.68} 68%|██████▊ | 8371/12313 [6:16:27<2:50:24, 2.59s/it] 68%|██████▊ | 8372/12313 [6:16:29<2:49:38, 2.58s/it] {'loss': 0.5446, 'grad_norm': 4.351833235603589, 'learning_rate': 1.2272889140352382e-06, 'epoch': 0.68} 68%|██████▊ | 8372/12313 [6:16:29<2:49:38, 2.58s/it] 68%|██████▊ | 8373/12313 [6:16:32<2:49:51, 
2.59s/it] {'loss': 0.4169, 'grad_norm': 6.1661761775892945, 'learning_rate': 1.2267229317620564e-06, 'epoch': 0.68} 68%|██████▊ | 8373/12313 [6:16:32<2:49:51, 2.59s/it] 68%|██████▊ | 8374/12313 [6:16:35<2:48:52, 2.57s/it] {'loss': 0.4341, 'grad_norm': 3.8116371519348404, 'learning_rate': 1.2261570375929077e-06, 'epoch': 0.68} 68%|██████▊ | 8374/12313 [6:16:35<2:48:52, 2.57s/it] 68%|██████▊ | 8375/12313 [6:16:37<2:50:10, 2.59s/it] {'loss': 0.5365, 'grad_norm': 8.305006268325108, 'learning_rate': 1.2255912315669507e-06, 'epoch': 0.68} 68%|██████▊ | 8375/12313 [6:16:37<2:50:10, 2.59s/it] 68%|██████▊ | 8376/12313 [6:16:40<2:56:27, 2.69s/it] {'loss': 0.3122, 'grad_norm': 3.87785735314886, 'learning_rate': 1.2250255137233363e-06, 'epoch': 0.68} 68%|██████▊ | 8376/12313 [6:16:40<2:56:27, 2.69s/it] 68%|██████▊ | 8377/12313 [6:16:43<2:56:55, 2.70s/it] {'loss': 0.4254, 'grad_norm': 5.364139835040124, 'learning_rate': 1.224459884101209e-06, 'epoch': 0.68} 68%|██████▊ | 8377/12313 [6:16:43<2:56:55, 2.70s/it] 68%|██████▊ | 8378/12313 [6:16:45<2:53:35, 2.65s/it] {'loss': 0.5725, 'grad_norm': 5.505941463868022, 'learning_rate': 1.2238943427397059e-06, 'epoch': 0.68} 68%|██████▊ | 8378/12313 [6:16:45<2:53:35, 2.65s/it] 68%|██████▊ | 8379/12313 [6:16:48<2:56:20, 2.69s/it] {'loss': 0.4677, 'grad_norm': 3.2670918728153495, 'learning_rate': 1.2233288896779617e-06, 'epoch': 0.68} 68%|██████▊ | 8379/12313 [6:16:48<2:56:20, 2.69s/it] 68%|██████▊ | 8380/12313 [6:16:51<2:56:52, 2.70s/it] {'loss': 0.4548, 'grad_norm': 5.650997362758408, 'learning_rate': 1.2227635249551014e-06, 'epoch': 0.68} 68%|██████▊ | 8380/12313 [6:16:51<2:56:52, 2.70s/it] 68%|██████▊ | 8381/12313 [6:16:53<2:55:11, 2.67s/it] {'loss': 0.533, 'grad_norm': 4.85029252716426, 'learning_rate': 1.2221982486102446e-06, 'epoch': 0.68} 68%|██████▊ | 8381/12313 [6:16:53<2:55:11, 2.67s/it] 68%|██████▊ | 8382/12313 [6:16:56<2:51:29, 2.62s/it] {'loss': 0.5872, 'grad_norm': 6.520961416314167, 'learning_rate': 1.2216330606825063e-06, 
'epoch': 0.68} 68%|██████▊ | 8382/12313 [6:16:56<2:51:29, 2.62s/it] 68%|██████▊ | 8383/12313 [6:16:59<2:51:34, 2.62s/it] {'loss': 0.463, 'grad_norm': 4.358523321449324, 'learning_rate': 1.2210679612109957e-06, 'epoch': 0.68} 68%|██████▊ | 8383/12313 [6:16:59<2:51:34, 2.62s/it] 68%|██████▊ | 8384/12313 [6:17:01<2:51:01, 2.61s/it] {'loss': 0.4915, 'grad_norm': 3.3319401584526256, 'learning_rate': 1.2205029502348123e-06, 'epoch': 0.68} 68%|██████▊ | 8384/12313 [6:17:01<2:51:01, 2.61s/it] 68%|██████▊ | 8385/12313 [6:17:04<2:50:38, 2.61s/it] {'loss': 0.3504, 'grad_norm': 5.916515103224009, 'learning_rate': 1.2199380277930542e-06, 'epoch': 0.68} 68%|██████▊ | 8385/12313 [6:17:04<2:50:38, 2.61s/it] 68%|██████▊ | 8386/12313 [6:17:06<2:50:30, 2.61s/it] {'loss': 0.5314, 'grad_norm': 3.720584300258901, 'learning_rate': 1.2193731939248098e-06, 'epoch': 0.68} 68%|██████▊ | 8386/12313 [6:17:06<2:50:30, 2.61s/it] 68%|██████▊ | 8387/12313 [6:17:09<2:50:55, 2.61s/it] {'loss': 0.4826, 'grad_norm': 6.706752035654295, 'learning_rate': 1.218808448669162e-06, 'epoch': 0.68} 68%|██████▊ | 8387/12313 [6:17:09<2:50:55, 2.61s/it] 68%|██████▊ | 8388/12313 [6:17:12<2:51:17, 2.62s/it] {'loss': 0.3701, 'grad_norm': 7.136044672639504, 'learning_rate': 1.218243792065189e-06, 'epoch': 0.68} 68%|██████▊ | 8388/12313 [6:17:12<2:51:17, 2.62s/it] 68%|██████▊ | 8389/12313 [6:17:14<2:52:26, 2.64s/it] {'loss': 0.511, 'grad_norm': 15.62272184000503, 'learning_rate': 1.2176792241519628e-06, 'epoch': 0.68} 68%|██████▊ | 8389/12313 [6:17:14<2:52:26, 2.64s/it] 68%|██████▊ | 8390/12313 [6:17:17<2:54:48, 2.67s/it] {'loss': 0.4626, 'grad_norm': 5.7775925878468275, 'learning_rate': 1.2171147449685469e-06, 'epoch': 0.68} 68%|██████▊ | 8390/12313 [6:17:17<2:54:48, 2.67s/it] 68%|██████▊ | 8391/12313 [6:17:20<2:58:30, 2.73s/it] {'loss': 0.3721, 'grad_norm': 8.76813533021635, 'learning_rate': 1.2165503545540017e-06, 'epoch': 0.68} 68%|██████▊ | 8391/12313 [6:17:20<2:58:30, 2.73s/it] 68%|██████▊ | 8392/12313 
[6:17:23<2:58:00, 2.72s/it] {'loss': 0.5439, 'grad_norm': 2.7821535648469307, 'learning_rate': 1.2159860529473796e-06, 'epoch': 0.68} 68%|██████▊ | 8392/12313 [6:17:23<2:58:00, 2.72s/it] 68%|██████▊ | 8393/12313 [6:17:25<2:58:06, 2.73s/it] {'loss': 0.4945, 'grad_norm': 8.296103570527087, 'learning_rate': 1.2154218401877263e-06, 'epoch': 0.68} 68%|██████▊ | 8393/12313 [6:17:25<2:58:06, 2.73s/it] 68%|██████▊ | 8394/12313 [6:17:28<2:57:32, 2.72s/it] {'loss': 0.5059, 'grad_norm': 3.680639071610561, 'learning_rate': 1.214857716314083e-06, 'epoch': 0.68} 68%|██████▊ | 8394/12313 [6:17:28<2:57:32, 2.72s/it] 68%|██████▊ | 8395/12313 [6:17:31<2:56:28, 2.70s/it] {'loss': 0.4787, 'grad_norm': 3.5629979371942997, 'learning_rate': 1.2142936813654848e-06, 'epoch': 0.68} 68%|██████▊ | 8395/12313 [6:17:31<2:56:28, 2.70s/it] 68%|██████▊ | 8396/12313 [6:17:34<3:02:15, 2.79s/it] {'loss': 0.5225, 'grad_norm': 13.02646445886847, 'learning_rate': 1.21372973538096e-06, 'epoch': 0.68} 68%|██████▊ | 8396/12313 [6:17:34<3:02:15, 2.79s/it] 68%|██████▊ | 8397/12313 [6:17:36<3:00:33, 2.77s/it] {'loss': 0.3445, 'grad_norm': 6.585522603186978, 'learning_rate': 1.2131658783995285e-06, 'epoch': 0.68} 68%|██████▊ | 8397/12313 [6:17:36<3:00:33, 2.77s/it] 68%|██████▊ | 8398/12313 [6:17:39<2:54:51, 2.68s/it] {'loss': 0.6751, 'grad_norm': 4.999188751543709, 'learning_rate': 1.212602110460209e-06, 'epoch': 0.68} 68%|██████▊ | 8398/12313 [6:17:39<2:54:51, 2.68s/it] 68%|██████▊ | 8399/12313 [6:17:42<3:06:16, 2.86s/it] {'loss': 0.464, 'grad_norm': 5.137382890011477, 'learning_rate': 1.2120384316020098e-06, 'epoch': 0.68} 68%|██████▊ | 8399/12313 [6:17:42<3:06:16, 2.86s/it] 68%|██████▊ | 8400/12313 [6:17:45<3:10:59, 2.93s/it] {'loss': 0.3924, 'grad_norm': 3.57144580039995, 'learning_rate': 1.2114748418639339e-06, 'epoch': 0.68} 68%|██████▊ | 8400/12313 [6:17:45<3:10:59, 2.93s/it] 68%|██████▊ | 8401/12313 [6:17:48<3:06:04, 2.85s/it] {'loss': 0.5204, 'grad_norm': 6.448424519995378, 'learning_rate': 
1.2109113412849792e-06, 'epoch': 0.68} 68%|██████▊ | 8401/12313 [6:17:48<3:06:04, 2.85s/it] 68%|██████▊ | 8402/12313 [6:17:51<3:02:07, 2.79s/it] {'loss': 0.5435, 'grad_norm': 4.80310339792193, 'learning_rate': 1.2103479299041388e-06, 'epoch': 0.68} 68%|██████▊ | 8402/12313 [6:17:51<3:02:07, 2.79s/it] 68%|██████▊ | 8403/12313 [6:17:53<2:58:21, 2.74s/it] {'loss': 0.514, 'grad_norm': 11.6534555123514, 'learning_rate': 1.209784607760395e-06, 'epoch': 0.68} 68%|██████▊ | 8403/12313 [6:17:53<2:58:21, 2.74s/it] 68%|██████▊ | 8404/12313 [6:17:56<2:55:24, 2.69s/it] {'loss': 0.5202, 'grad_norm': 14.006283811833011, 'learning_rate': 1.209221374892729e-06, 'epoch': 0.68} 68%|██████▊ | 8404/12313 [6:17:56<2:55:24, 2.69s/it] 68%|██████▊ | 8405/12313 [6:17:58<2:53:45, 2.67s/it] {'loss': 0.6359, 'grad_norm': 3.410546012906801, 'learning_rate': 1.2086582313401125e-06, 'epoch': 0.68} 68%|██████▊ | 8405/12313 [6:17:58<2:53:45, 2.67s/it] 68%|██████▊ | 8406/12313 [6:18:01<2:49:50, 2.61s/it] {'loss': 0.6307, 'grad_norm': 4.650396850410136, 'learning_rate': 1.208095177141511e-06, 'epoch': 0.68} 68%|██████▊ | 8406/12313 [6:18:01<2:49:50, 2.61s/it] 68%|██████▊ | 8407/12313 [6:18:04<2:51:59, 2.64s/it] {'loss': 0.5884, 'grad_norm': 4.067175768365543, 'learning_rate': 1.2075322123358857e-06, 'epoch': 0.68} 68%|██████▊ | 8407/12313 [6:18:04<2:51:59, 2.64s/it] 68%|██████▊ | 8408/12313 [6:18:06<2:47:01, 2.57s/it] {'loss': 0.4241, 'grad_norm': 7.834411232708035, 'learning_rate': 1.2069693369621924e-06, 'epoch': 0.68} 68%|██████▊ | 8408/12313 [6:18:06<2:47:01, 2.57s/it] 68%|██████▊ | 8409/12313 [6:18:09<2:49:09, 2.60s/it] {'loss': 0.4524, 'grad_norm': 5.025485826761079, 'learning_rate': 1.2064065510593765e-06, 'epoch': 0.68} 68%|██████▊ | 8409/12313 [6:18:09<2:49:09, 2.60s/it] 68%|██████▊ | 8410/12313 [6:18:11<2:50:22, 2.62s/it] {'loss': 0.6125, 'grad_norm': 7.200545827802387, 'learning_rate': 1.205843854666382e-06, 'epoch': 0.68} 68%|██████▊ | 8410/12313 [6:18:11<2:50:22, 2.62s/it] 68%|██████▊ | 
8411/12313 [6:18:14<2:57:13, 2.73s/it] {'loss': 0.6645, 'grad_norm': 5.911187736116562, 'learning_rate': 1.2052812478221437e-06, 'epoch': 0.68} 68%|██████▊ | 8411/12313 [6:18:14<2:57:13, 2.73s/it] 68%|██████▊ | 8412/12313 [6:18:17<3:00:38, 2.78s/it] {'loss': 0.4591, 'grad_norm': 7.850988389705168, 'learning_rate': 1.2047187305655898e-06, 'epoch': 0.68} 68%|██████▊ | 8412/12313 [6:18:17<3:00:38, 2.78s/it] 68%|██████▊ | 8413/12313 [6:18:20<2:54:21, 2.68s/it] {'loss': 0.6595, 'grad_norm': 4.511296495523972, 'learning_rate': 1.2041563029356454e-06, 'epoch': 0.68} 68%|██████▊ | 8413/12313 [6:18:20<2:54:21, 2.68s/it] 68%|██████▊ | 8414/12313 [6:18:22<2:56:25, 2.71s/it] {'loss': 0.5137, 'grad_norm': 5.3954091188162145, 'learning_rate': 1.203593964971226e-06, 'epoch': 0.68} 68%|██████▊ | 8414/12313 [6:18:23<2:56:25, 2.71s/it] 68%|██████▊ | 8415/12313 [6:18:25<2:53:51, 2.68s/it] {'loss': 0.5769, 'grad_norm': 3.303858814227981, 'learning_rate': 1.2030317167112438e-06, 'epoch': 0.68} 68%|██████▊ | 8415/12313 [6:18:25<2:53:51, 2.68s/it] 68%|██████▊ | 8416/12313 [6:18:28<2:55:38, 2.70s/it] {'loss': 0.4826, 'grad_norm': 3.763109647115813, 'learning_rate': 1.2024695581946016e-06, 'epoch': 0.68} 68%|██████▊ | 8416/12313 [6:18:28<2:55:38, 2.70s/it] 68%|██████▊ | 8417/12313 [6:18:30<2:50:59, 2.63s/it] {'loss': 0.3603, 'grad_norm': 17.92152213202901, 'learning_rate': 1.2019074894602005e-06, 'epoch': 0.68} 68%|██████▊ | 8417/12313 [6:18:30<2:50:59, 2.63s/it] 68%|██████▊ | 8418/12313 [6:18:33<2:52:23, 2.66s/it] {'loss': 0.4264, 'grad_norm': 6.268708930822519, 'learning_rate': 1.2013455105469304e-06, 'epoch': 0.68} 68%|██████▊ | 8418/12313 [6:18:33<2:52:23, 2.66s/it] 68%|██████▊ | 8419/12313 [6:18:36<2:57:16, 2.73s/it] {'loss': 0.5712, 'grad_norm': 4.52113050826669, 'learning_rate': 1.2007836214936773e-06, 'epoch': 0.68} 68%|██████▊ | 8419/12313 [6:18:36<2:57:16, 2.73s/it] 68%|██████▊ | 8420/12313 [6:18:39<2:56:54, 2.73s/it] {'loss': 0.5343, 'grad_norm': 5.948974609805699, 
'learning_rate': 1.2002218223393213e-06, 'epoch': 0.68} 68%|██████▊ | 8420/12313 [6:18:39<2:56:54, 2.73s/it] 68%|██████▊ | 8421/12313 [6:18:41<2:54:33, 2.69s/it] {'loss': 0.6497, 'grad_norm': 5.13305028563167, 'learning_rate': 1.1996601131227376e-06, 'epoch': 0.68} 68%|██████▊ | 8421/12313 [6:18:41<2:54:33, 2.69s/it] 68%|██████▊ | 8422/12313 [6:18:44<2:55:38, 2.71s/it] {'loss': 0.6081, 'grad_norm': 6.071521774254982, 'learning_rate': 1.1990984938827907e-06, 'epoch': 0.68} 68%|██████▊ | 8422/12313 [6:18:44<2:55:38, 2.71s/it] 68%|██████▊ | 8423/12313 [6:18:47<2:54:21, 2.69s/it] {'loss': 0.5464, 'grad_norm': 4.917477348975693, 'learning_rate': 1.1985369646583442e-06, 'epoch': 0.68} 68%|██████▊ | 8423/12313 [6:18:47<2:54:21, 2.69s/it] 68%|██████▊ | 8424/12313 [6:18:50<3:00:05, 2.78s/it] {'loss': 0.4677, 'grad_norm': 4.304996098971284, 'learning_rate': 1.1979755254882519e-06, 'epoch': 0.68} 68%|██████▊ | 8424/12313 [6:18:50<3:00:05, 2.78s/it] 68%|██████▊ | 8425/12313 [6:18:52<2:56:47, 2.73s/it] {'loss': 0.3471, 'grad_norm': 7.848305808110954, 'learning_rate': 1.1974141764113617e-06, 'epoch': 0.68} 68%|██████▊ | 8425/12313 [6:18:52<2:56:47, 2.73s/it] 68%|██████▊ | 8426/12313 [6:18:55<2:58:59, 2.76s/it] {'loss': 0.4901, 'grad_norm': 5.645856868747358, 'learning_rate': 1.1968529174665173e-06, 'epoch': 0.68} 68%|██████▊ | 8426/12313 [6:18:55<2:58:59, 2.76s/it] 68%|██████▊ | 8427/12313 [6:18:58<2:59:07, 2.77s/it] {'loss': 0.5054, 'grad_norm': 5.273055048309659, 'learning_rate': 1.1962917486925532e-06, 'epoch': 0.68} 68%|██████▊ | 8427/12313 [6:18:58<2:59:07, 2.77s/it] 68%|██████▊ | 8428/12313 [6:19:01<3:01:19, 2.80s/it] {'loss': 0.3776, 'grad_norm': 4.090778363630613, 'learning_rate': 1.1957306701283002e-06, 'epoch': 0.68} 68%|██████▊ | 8428/12313 [6:19:01<3:01:19, 2.80s/it] 68%|██████▊ | 8429/12313 [6:19:03<2:59:46, 2.78s/it] {'loss': 0.5705, 'grad_norm': 6.031487669992356, 'learning_rate': 1.1951696818125835e-06, 'epoch': 0.68} 68%|██████▊ | 8429/12313 [6:19:03<2:59:46, 
2.78s/it] 68%|██████▊ | 8430/12313 [6:19:06<2:58:22, 2.76s/it] {'loss': 0.3882, 'grad_norm': 5.241073571056581, 'learning_rate': 1.1946087837842188e-06, 'epoch': 0.68} 68%|██████▊ | 8430/12313 [6:19:06<2:58:22, 2.76s/it] 68%|██████▊ | 8431/12313 [6:19:09<2:58:23, 2.76s/it] {'loss': 0.486, 'grad_norm': 6.683258623229494, 'learning_rate': 1.1940479760820177e-06, 'epoch': 0.68} 68%|██████▊ | 8431/12313 [6:19:09<2:58:23, 2.76s/it] 68%|██████▊ | 8432/12313 [6:19:12<3:01:06, 2.80s/it] {'loss': 0.4291, 'grad_norm': 5.092498342825334, 'learning_rate': 1.1934872587447838e-06, 'epoch': 0.68} 68%|██████▊ | 8432/12313 [6:19:12<3:01:06, 2.80s/it] 68%|██████▊ | 8433/12313 [6:19:15<3:04:33, 2.85s/it] {'loss': 0.5162, 'grad_norm': 2.9158897378436746, 'learning_rate': 1.1929266318113172e-06, 'epoch': 0.68} 68%|██████▊ | 8433/12313 [6:19:15<3:04:33, 2.85s/it] 68%|██████▊ | 8434/12313 [6:19:17<2:59:57, 2.78s/it] {'loss': 0.4291, 'grad_norm': 6.199252123306311, 'learning_rate': 1.192366095320411e-06, 'epoch': 0.68} 68%|██████▊ | 8434/12313 [6:19:17<2:59:57, 2.78s/it] 69%|██████▊ | 8435/12313 [6:19:20<2:58:06, 2.76s/it] {'loss': 0.398, 'grad_norm': 6.833595326483498, 'learning_rate': 1.1918056493108493e-06, 'epoch': 0.69} 69%|██████▊ | 8435/12313 [6:19:20<2:58:06, 2.76s/it] 69%|██████▊ | 8436/12313 [6:19:23<2:55:27, 2.72s/it] {'loss': 0.5889, 'grad_norm': 4.488586364488279, 'learning_rate': 1.1912452938214142e-06, 'epoch': 0.69} 69%|██████▊ | 8436/12313 [6:19:23<2:55:27, 2.72s/it] 69%|██████▊ | 8437/12313 [6:19:26<2:56:14, 2.73s/it] {'loss': 0.5559, 'grad_norm': 7.767681900407471, 'learning_rate': 1.1906850288908783e-06, 'epoch': 0.69} 69%|██████▊ | 8437/12313 [6:19:26<2:56:14, 2.73s/it] 69%|██████▊ | 8438/12313 [6:19:28<2:53:09, 2.68s/it] {'loss': 0.4359, 'grad_norm': 4.340043418686966, 'learning_rate': 1.1901248545580082e-06, 'epoch': 0.69} 69%|██████▊ | 8438/12313 [6:19:28<2:53:09, 2.68s/it] 69%|██████▊ | 8439/12313 [6:19:31<2:50:54, 2.65s/it] {'loss': 0.4433, 'grad_norm': 
4.93465575158808, 'learning_rate': 1.1895647708615665e-06, 'epoch': 0.69} 69%|██████▊ | 8439/12313 [6:19:31<2:50:54, 2.65s/it] 69%|██████▊ | 8440/12313 [6:19:34<3:03:50, 2.85s/it] {'loss': 0.4608, 'grad_norm': 4.964788137277078, 'learning_rate': 1.1890047778403063e-06, 'epoch': 0.69} 69%|██████▊ | 8440/12313 [6:19:34<3:03:50, 2.85s/it] 69%|██████▊ | 8441/12313 [6:19:37<3:00:55, 2.80s/it] {'loss': 0.6192, 'grad_norm': 4.5014832518222905, 'learning_rate': 1.1884448755329772e-06, 'epoch': 0.69} 69%|██████▊ | 8441/12313 [6:19:37<3:00:55, 2.80s/it] 69%|██████▊ | 8442/12313 [6:19:39<2:57:11, 2.75s/it] {'loss': 0.65, 'grad_norm': 3.814073027261063, 'learning_rate': 1.1878850639783224e-06, 'epoch': 0.69} 69%|██████▊ | 8442/12313 [6:19:39<2:57:11, 2.75s/it] 69%|██████▊ | 8443/12313 [6:19:42<2:59:06, 2.78s/it] {'loss': 0.4156, 'grad_norm': 4.698487957966528, 'learning_rate': 1.1873253432150769e-06, 'epoch': 0.69} 69%|██████▊ | 8443/12313 [6:19:42<2:59:06, 2.78s/it] 69%|██████▊ | 8444/12313 [6:19:46<3:15:44, 3.04s/it] {'loss': 0.4989, 'grad_norm': 2.6979949526342173, 'learning_rate': 1.1867657132819693e-06, 'epoch': 0.69} 69%|██████▊ | 8444/12313 [6:19:46<3:15:44, 3.04s/it] 69%|██████▊ | 8445/12313 [6:19:49<3:11:36, 2.97s/it] {'loss': 0.3499, 'grad_norm': 4.663840811404981, 'learning_rate': 1.1862061742177253e-06, 'epoch': 0.69} 69%|██████▊ | 8445/12313 [6:19:49<3:11:36, 2.97s/it] 69%|██████▊ | 8446/12313 [6:19:51<3:00:45, 2.80s/it] {'loss': 0.4283, 'grad_norm': 4.603946365527777, 'learning_rate': 1.1856467260610597e-06, 'epoch': 0.69} 69%|██████▊ | 8446/12313 [6:19:51<3:00:45, 2.80s/it] 69%|██████▊ | 8447/12313 [6:19:54<2:56:47, 2.74s/it] {'loss': 0.4741, 'grad_norm': 6.004017761597916, 'learning_rate': 1.1850873688506847e-06, 'epoch': 0.69} 69%|██████▊ | 8447/12313 [6:19:54<2:56:47, 2.74s/it] 69%|██████▊ | 8448/12313 [6:19:56<2:51:11, 2.66s/it] {'loss': 0.4488, 'grad_norm': 4.91811982386808, 'learning_rate': 1.1845281026253055e-06, 'epoch': 0.69} 69%|██████▊ | 8448/12313 
[6:19:56<2:51:11, 2.66s/it] 69%|██████▊ | 8449/12313 [6:19:59<2:54:34, 2.71s/it] {'loss': 0.4576, 'grad_norm': 4.124631666496995, 'learning_rate': 1.1839689274236197e-06, 'epoch': 0.69} 69%|██████▊ | 8449/12313 [6:19:59<2:54:34, 2.71s/it] 69%|██████▊ | 8450/12313 [6:20:02<2:53:31, 2.70s/it] {'loss': 0.4204, 'grad_norm': 4.967951805934966, 'learning_rate': 1.183409843284319e-06, 'epoch': 0.69} 69%|██████▊ | 8450/12313 [6:20:02<2:53:31, 2.70s/it] 69%|██████▊ | 8451/12313 [6:20:04<2:52:07, 2.67s/it] {'loss': 0.501, 'grad_norm': 3.6946567527289083, 'learning_rate': 1.1828508502460884e-06, 'epoch': 0.69} 69%|██████▊ | 8451/12313 [6:20:04<2:52:07, 2.67s/it] 69%|██████▊ | 8452/12313 [6:20:07<2:54:05, 2.71s/it] {'loss': 0.5587, 'grad_norm': 4.3257591185306214, 'learning_rate': 1.1822919483476089e-06, 'epoch': 0.69} 69%|██████▊ | 8452/12313 [6:20:07<2:54:05, 2.71s/it] 69%|██████▊ | 8453/12313 [6:20:09<2:48:18, 2.62s/it] {'loss': 0.3784, 'grad_norm': 13.53355986373059, 'learning_rate': 1.1817331376275518e-06, 'epoch': 0.69} 69%|██████▊ | 8453/12313 [6:20:09<2:48:18, 2.62s/it] 69%|██████▊ | 8454/12313 [6:20:12<2:46:39, 2.59s/it] {'loss': 0.529, 'grad_norm': 4.686471250690919, 'learning_rate': 1.181174418124585e-06, 'epoch': 0.69} 69%|██████▊ | 8454/12313 [6:20:12<2:46:39, 2.59s/it] 69%|██████▊ | 8455/12313 [6:20:15<2:46:55, 2.60s/it] {'loss': 0.521, 'grad_norm': 13.019567449891584, 'learning_rate': 1.1806157898773694e-06, 'epoch': 0.69} 69%|██████▊ | 8455/12313 [6:20:15<2:46:55, 2.60s/it] 69%|██████▊ | 8456/12313 [6:20:17<2:47:20, 2.60s/it] {'loss': 0.4579, 'grad_norm': 5.024504582924606, 'learning_rate': 1.1800572529245581e-06, 'epoch': 0.69} 69%|██████▊ | 8456/12313 [6:20:17<2:47:20, 2.60s/it] 69%|██████▊ | 8457/12313 [6:20:20<2:49:01, 2.63s/it] {'loss': 0.4946, 'grad_norm': 5.622740484481374, 'learning_rate': 1.1794988073047986e-06, 'epoch': 0.69} 69%|██████▊ | 8457/12313 [6:20:20<2:49:01, 2.63s/it] 69%|██████▊ | 8458/12313 [6:20:23<2:51:13, 2.66s/it] {'loss': 0.454, 
'grad_norm': 3.383472516465761, 'learning_rate': 1.1789404530567338e-06, 'epoch': 0.69} 69%|██████▊ | 8458/12313 [6:20:23<2:51:13, 2.66s/it] 69%|██████▊ | 8459/12313 [6:20:26<2:57:27, 2.76s/it] {'loss': 0.4916, 'grad_norm': 39.06057548040847, 'learning_rate': 1.178382190218997e-06, 'epoch': 0.69} 69%|██████▊ | 8459/12313 [6:20:26<2:57:27, 2.76s/it] 69%|██████▊ | 8460/12313 [6:20:28<2:57:22, 2.76s/it] {'loss': 0.5374, 'grad_norm': 5.13942966732467, 'learning_rate': 1.1778240188302181e-06, 'epoch': 0.69} 69%|██████▊ | 8460/12313 [6:20:28<2:57:22, 2.76s/it] 69%|██████▊ | 8461/12313 [6:20:31<2:49:19, 2.64s/it] {'loss': 0.4787, 'grad_norm': 5.2491128634756805, 'learning_rate': 1.177265938929021e-06, 'epoch': 0.69} 69%|██████▊ | 8461/12313 [6:20:31<2:49:19, 2.64s/it] 69%|██████▊ | 8462/12313 [6:20:33<2:50:43, 2.66s/it] {'loss': 0.4962, 'grad_norm': 3.883090224748362, 'learning_rate': 1.1767079505540198e-06, 'epoch': 0.69} 69%|██████▊ | 8462/12313 [6:20:33<2:50:43, 2.66s/it] 69%|██████▊ | 8463/12313 [6:20:36<2:56:11, 2.75s/it] {'loss': 0.4901, 'grad_norm': 4.459207504966209, 'learning_rate': 1.1761500537438246e-06, 'epoch': 0.69} 69%|██████▊ | 8463/12313 [6:20:36<2:56:11, 2.75s/it] 69%|██████▊ | 8464/12313 [6:20:39<2:55:36, 2.74s/it] {'loss': 0.6049, 'grad_norm': 5.526598548112002, 'learning_rate': 1.1755922485370397e-06, 'epoch': 0.69} 69%|██████▊ | 8464/12313 [6:20:39<2:55:36, 2.74s/it] 69%|██████▊ | 8465/12313 [6:20:42<2:56:56, 2.76s/it] {'loss': 0.6439, 'grad_norm': 3.9962235924628, 'learning_rate': 1.1750345349722611e-06, 'epoch': 0.69} 69%|██████▊ | 8465/12313 [6:20:42<2:56:56, 2.76s/it] 69%|██████▉ | 8466/12313 [6:20:44<2:53:46, 2.71s/it] {'loss': 0.5233, 'grad_norm': 5.430777327111154, 'learning_rate': 1.1744769130880814e-06, 'epoch': 0.69} 69%|██████▉ | 8466/12313 [6:20:44<2:53:46, 2.71s/it] 69%|██████▉ | 8467/12313 [6:20:47<2:54:06, 2.72s/it] {'loss': 0.4564, 'grad_norm': 5.478225172555752, 'learning_rate': 1.1739193829230833e-06, 'epoch': 0.69} 69%|██████▉ | 
8467/12313 [6:20:47<2:54:06, 2.72s/it] 69%|██████▉ | 8468/12313 [6:20:50<2:50:11, 2.66s/it] {'loss': 0.5561, 'grad_norm': 14.175270639083902, 'learning_rate': 1.1733619445158465e-06, 'epoch': 0.69} 69%|██████▉ | 8468/12313 [6:20:50<2:50:11, 2.66s/it] 69%|██████▉ | 8469/12313 [6:20:52<2:51:11, 2.67s/it] {'loss': 0.439, 'grad_norm': 4.726662335713709, 'learning_rate': 1.1728045979049421e-06, 'epoch': 0.69} 69%|██████▉ | 8469/12313 [6:20:52<2:51:11, 2.67s/it] 69%|██████▉ | 8470/12313 [6:20:55<2:50:08, 2.66s/it] {'loss': 0.4297, 'grad_norm': 6.585567352748735, 'learning_rate': 1.1722473431289344e-06, 'epoch': 0.69} 69%|██████▉ | 8470/12313 [6:20:55<2:50:08, 2.66s/it] 69%|██████▉ | 8471/12313 [6:20:58<2:51:02, 2.67s/it] {'loss': 0.4433, 'grad_norm': 7.151511500781442, 'learning_rate': 1.1716901802263845e-06, 'epoch': 0.69} 69%|██████▉ | 8471/12313 [6:20:58<2:51:02, 2.67s/it] 69%|██████▉ | 8472/12313 [6:21:00<2:50:18, 2.66s/it] {'loss': 0.3943, 'grad_norm': 13.543001844242685, 'learning_rate': 1.171133109235843e-06, 'epoch': 0.69} 69%|██████▉ | 8472/12313 [6:21:00<2:50:18, 2.66s/it] 69%|██████▉ | 8473/12313 [6:21:03<2:46:46, 2.61s/it] {'loss': 0.3842, 'grad_norm': 6.360919578390468, 'learning_rate': 1.1705761301958576e-06, 'epoch': 0.69} 69%|██████▉ | 8473/12313 [6:21:03<2:46:46, 2.61s/it] 69%|██████▉ | 8474/12313 [6:21:05<2:47:36, 2.62s/it] {'loss': 0.3475, 'grad_norm': 5.289828859594367, 'learning_rate': 1.170019243144969e-06, 'epoch': 0.69} 69%|██████▉ | 8474/12313 [6:21:05<2:47:36, 2.62s/it] 69%|██████▉ | 8475/12313 [6:21:08<2:46:41, 2.61s/it] {'loss': 0.5738, 'grad_norm': 6.436765945910618, 'learning_rate': 1.16946244812171e-06, 'epoch': 0.69} 69%|██████▉ | 8475/12313 [6:21:08<2:46:41, 2.61s/it] 69%|██████▉ | 8476/12313 [6:21:11<2:45:12, 2.58s/it] {'loss': 0.5015, 'grad_norm': 9.265057636256202, 'learning_rate': 1.1689057451646072e-06, 'epoch': 0.69} 69%|██████▉ | 8476/12313 [6:21:11<2:45:12, 2.58s/it] 69%|██████▉ | 8477/12313 [6:21:13<2:45:18, 2.59s/it] {'loss': 
0.6312, 'grad_norm': 5.62906939860262, 'learning_rate': 1.1683491343121825e-06, 'epoch': 0.69} 69%|██████▉ | 8477/12313 [6:21:13<2:45:18, 2.59s/it] 69%|██████▉ | 8478/12313 [6:21:16<2:51:25, 2.68s/it] {'loss': 0.5705, 'grad_norm': 3.608605687615943, 'learning_rate': 1.1677926156029495e-06, 'epoch': 0.69} 69%|██████▉ | 8478/12313 [6:21:16<2:51:25, 2.68s/it] 69%|██████▉ | 8479/12313 [6:21:19<2:52:30, 2.70s/it] {'loss': 0.481, 'grad_norm': 7.326218578037191, 'learning_rate': 1.1672361890754165e-06, 'epoch': 0.69} 69%|██████▉ | 8479/12313 [6:21:19<2:52:30, 2.70s/it] 69%|██████▉ | 8480/12313 [6:21:21<2:50:25, 2.67s/it] {'loss': 0.5597, 'grad_norm': 3.101458446740605, 'learning_rate': 1.1666798547680871e-06, 'epoch': 0.69} 69%|██████▉ | 8480/12313 [6:21:21<2:50:25, 2.67s/it] 69%|██████▉ | 8481/12313 [6:21:24<2:50:08, 2.66s/it] {'loss': 0.6213, 'grad_norm': 4.57887880856938, 'learning_rate': 1.166123612719455e-06, 'epoch': 0.69} 69%|██████▉ | 8481/12313 [6:21:24<2:50:08, 2.66s/it] 69%|██████▉ | 8482/12313 [6:21:27<2:50:47, 2.67s/it] {'loss': 0.5009, 'grad_norm': 10.086781507059271, 'learning_rate': 1.1655674629680083e-06, 'epoch': 0.69} 69%|██████▉ | 8482/12313 [6:21:27<2:50:47, 2.67s/it] 69%|██████▉ | 8483/12313 [6:21:29<2:46:57, 2.62s/it] {'loss': 0.7435, 'grad_norm': 5.0242271976636514, 'learning_rate': 1.165011405552232e-06, 'epoch': 0.69} 69%|██████▉ | 8483/12313 [6:21:29<2:46:57, 2.62s/it] 69%|██████▉ | 8484/12313 [6:21:32<2:48:08, 2.63s/it] {'loss': 0.4607, 'grad_norm': 7.2774357651661274, 'learning_rate': 1.164455440510601e-06, 'epoch': 0.69} 69%|██████▉ | 8484/12313 [6:21:32<2:48:08, 2.63s/it] 69%|██████▉ | 8485/12313 [6:21:34<2:44:48, 2.58s/it] {'loss': 0.5113, 'grad_norm': 4.136550631258544, 'learning_rate': 1.1638995678815843e-06, 'epoch': 0.69} 69%|██████▉ | 8485/12313 [6:21:34<2:44:48, 2.58s/it] 69%|██████▉ | 8486/12313 [6:21:37<2:45:24, 2.59s/it] {'loss': 0.4338, 'grad_norm': 5.117305452537372, 'learning_rate': 1.1633437877036462e-06, 'epoch': 0.69} 
69%|██████▉ | 8486/12313 [6:21:37<2:45:24, 2.59s/it] 69%|██████▉ | 8487/12313 [6:21:40<2:44:06, 2.57s/it] {'loss': 0.5338, 'grad_norm': 3.714646489723773, 'learning_rate': 1.162788100015245e-06, 'epoch': 0.69} 69%|██████▉ | 8487/12313 [6:21:40<2:44:06, 2.57s/it] 69%|██████▉ | 8488/12313 [6:21:42<2:49:11, 2.65s/it] {'loss': 0.6151, 'grad_norm': 3.9854040499315877, 'learning_rate': 1.1622325048548303e-06, 'epoch': 0.69} 69%|██████▉ | 8488/12313 [6:21:42<2:49:11, 2.65s/it] 69%|██████▉ | 8489/12313 [6:21:45<2:47:10, 2.62s/it] {'loss': 0.5104, 'grad_norm': 4.8147850700984804, 'learning_rate': 1.1616770022608447e-06, 'epoch': 0.69} 69%|██████▉ | 8489/12313 [6:21:45<2:47:10, 2.62s/it] 69%|██████▉ | 8490/12313 [6:21:47<2:43:23, 2.56s/it] {'loss': 0.4588, 'grad_norm': 12.49347709660076, 'learning_rate': 1.161121592271729e-06, 'epoch': 0.69} 69%|██████▉ | 8490/12313 [6:21:47<2:43:23, 2.56s/it] 69%|██████▉ | 8491/12313 [6:21:50<2:52:01, 2.70s/it] {'loss': 0.5458, 'grad_norm': 4.382600842190844, 'learning_rate': 1.1605662749259123e-06, 'epoch': 0.69} 69%|██████▉ | 8491/12313 [6:21:50<2:52:01, 2.70s/it] 69%|██████▉ | 8492/12313 [6:21:53<2:52:46, 2.71s/it] {'loss': 0.5746, 'grad_norm': 3.4100489659901823, 'learning_rate': 1.1600110502618204e-06, 'epoch': 0.69} 69%|██████▉ | 8492/12313 [6:21:53<2:52:46, 2.71s/it] 69%|██████▉ | 8493/12313 [6:21:56<2:52:08, 2.70s/it] {'loss': 0.4043, 'grad_norm': 5.056416537418844, 'learning_rate': 1.1594559183178727e-06, 'epoch': 0.69} 69%|██████▉ | 8493/12313 [6:21:56<2:52:08, 2.70s/it] 69%|██████▉ | 8494/12313 [6:21:59<2:54:04, 2.73s/it] {'loss': 0.5067, 'grad_norm': 4.3441729287251105, 'learning_rate': 1.158900879132481e-06, 'epoch': 0.69} 69%|██████▉ | 8494/12313 [6:21:59<2:54:04, 2.73s/it] 69%|██████▉ | 8495/12313 [6:22:01<2:51:51, 2.70s/it] {'loss': 0.6198, 'grad_norm': 3.6975358584627447, 'learning_rate': 1.1583459327440496e-06, 'epoch': 0.69} 69%|██████▉ | 8495/12313 [6:22:01<2:51:51, 2.70s/it] 69%|██████▉ | 8496/12313 [6:22:04<2:53:26, 
2.73s/it] {'loss': 0.557, 'grad_norm': 5.765909988508904, 'learning_rate': 1.1577910791909802e-06, 'epoch': 0.69} 69%|██████▉ | 8496/12313 [6:22:04<2:53:26, 2.73s/it] 69%|██████▉ | 8497/12313 [6:22:06<2:47:56, 2.64s/it] {'loss': 0.5423, 'grad_norm': 6.346348844345966, 'learning_rate': 1.1572363185116648e-06, 'epoch': 0.69} 69%|██████▉ | 8497/12313 [6:22:06<2:47:56, 2.64s/it] 69%|██████▉ | 8498/12313 [6:22:09<2:47:19, 2.63s/it] {'loss': 0.7113, 'grad_norm': 6.761851681144281, 'learning_rate': 1.1566816507444884e-06, 'epoch': 0.69} 69%|██████▉ | 8498/12313 [6:22:09<2:47:19, 2.63s/it] 69%|██████▉ | 8499/12313 [6:22:11<2:42:24, 2.56s/it] {'loss': 0.4832, 'grad_norm': 4.9827548056260484, 'learning_rate': 1.1561270759278326e-06, 'epoch': 0.69} 69%|██████▉ | 8499/12313 [6:22:11<2:42:24, 2.56s/it] 69%|██████▉ | 8500/12313 [6:22:14<2:44:55, 2.60s/it] {'loss': 0.4773, 'grad_norm': 7.290496908601368, 'learning_rate': 1.1555725941000715e-06, 'epoch': 0.69} 69%|██████▉ | 8500/12313 [6:22:14<2:44:55, 2.60s/it] 69%|██████▉ | 8501/12313 [6:22:17<2:41:06, 2.54s/it] {'loss': 0.4249, 'grad_norm': 4.93347902991029, 'learning_rate': 1.1550182052995706e-06, 'epoch': 0.69} 69%|██████▉ | 8501/12313 [6:22:17<2:41:06, 2.54s/it] 69%|██████▉ | 8502/12313 [6:22:19<2:42:53, 2.56s/it] {'loss': 0.4705, 'grad_norm': 4.857856203684281, 'learning_rate': 1.154463909564693e-06, 'epoch': 0.69} 69%|██████▉ | 8502/12313 [6:22:19<2:42:53, 2.56s/it] 69%|██████▉ | 8503/12313 [6:22:22<2:45:39, 2.61s/it] {'loss': 0.4833, 'grad_norm': 16.432653414970492, 'learning_rate': 1.1539097069337913e-06, 'epoch': 0.69} 69%|██████▉ | 8503/12313 [6:22:22<2:45:39, 2.61s/it] 69%|██████▉ | 8504/12313 [6:22:25<2:47:38, 2.64s/it] {'loss': 0.5461, 'grad_norm': 4.517576424813998, 'learning_rate': 1.1533555974452128e-06, 'epoch': 0.69} 69%|██████▉ | 8504/12313 [6:22:25<2:47:38, 2.64s/it] 69%|██████▉ | 8505/12313 [6:22:27<2:45:43, 2.61s/it] {'loss': 0.44, 'grad_norm': 6.387090563981134, 'learning_rate': 1.1528015811373004e-06, 
'epoch': 0.69} 69%|██████▉ | 8505/12313 [6:22:27<2:45:43, 2.61s/it] 69%|██████▉ | 8506/12313 [6:22:30<2:44:29, 2.59s/it] {'loss': 0.4357, 'grad_norm': 7.168256261694638, 'learning_rate': 1.1522476580483893e-06, 'epoch': 0.69} 69%|██████▉ | 8506/12313 [6:22:30<2:44:29, 2.59s/it] 69%|██████▉ | 8507/12313 [6:22:32<2:41:09, 2.54s/it] {'loss': 0.5764, 'grad_norm': 5.267349807599555, 'learning_rate': 1.1516938282168074e-06, 'epoch': 0.69} 69%|██████▉ | 8507/12313 [6:22:32<2:41:09, 2.54s/it] 69%|██████▉ | 8508/12313 [6:22:35<2:39:16, 2.51s/it] {'loss': 0.4762, 'grad_norm': 5.446388451950695, 'learning_rate': 1.151140091680876e-06, 'epoch': 0.69} 69%|██████▉ | 8508/12313 [6:22:35<2:39:16, 2.51s/it] 69%|██████▉ | 8509/12313 [6:22:37<2:44:58, 2.60s/it] {'loss': 0.4635, 'grad_norm': 7.776144758103636, 'learning_rate': 1.1505864484789122e-06, 'epoch': 0.69} 69%|██████▉ | 8509/12313 [6:22:37<2:44:58, 2.60s/it] 69%|██████▉ | 8510/12313 [6:22:40<2:46:26, 2.63s/it] {'loss': 0.4744, 'grad_norm': 4.652490283523999, 'learning_rate': 1.1500328986492246e-06, 'epoch': 0.69} 69%|██████▉ | 8510/12313 [6:22:40<2:46:26, 2.63s/it] 69%|██████▉ | 8511/12313 [6:22:43<2:49:10, 2.67s/it] {'loss': 0.5802, 'grad_norm': 5.281761321834011, 'learning_rate': 1.149479442230115e-06, 'epoch': 0.69} 69%|██████▉ | 8511/12313 [6:22:43<2:49:10, 2.67s/it] 69%|██████▉ | 8512/12313 [6:22:45<2:49:04, 2.67s/it] {'loss': 0.4812, 'grad_norm': 4.121830278672146, 'learning_rate': 1.1489260792598803e-06, 'epoch': 0.69} 69%|██████▉ | 8512/12313 [6:22:45<2:49:04, 2.67s/it] 69%|██████▉ | 8513/12313 [6:22:48<2:43:39, 2.58s/it] {'loss': 0.4427, 'grad_norm': 4.197838163994761, 'learning_rate': 1.1483728097768116e-06, 'epoch': 0.69} 69%|██████▉ | 8513/12313 [6:22:48<2:43:39, 2.58s/it] 69%|██████▉ | 8514/12313 [6:22:50<2:43:13, 2.58s/it] {'loss': 0.4649, 'grad_norm': 4.757540075293028, 'learning_rate': 1.14781963381919e-06, 'epoch': 0.69} 69%|██████▉ | 8514/12313 [6:22:50<2:43:13, 2.58s/it] 69%|██████▉ | 8515/12313 
[6:22:53<2:50:02, 2.69s/it] {'loss': 0.4501, 'grad_norm': 4.228790147362145, 'learning_rate': 1.1472665514252943e-06, 'epoch': 0.69} 69%|██████▉ | 8515/12313 [6:22:53<2:50:02, 2.69s/it] 69%|██████▉ | 8516/12313 [6:22:56<2:56:21, 2.79s/it] {'loss': 0.4265, 'grad_norm': 3.198795799968073, 'learning_rate': 1.146713562633394e-06, 'epoch': 0.69} 69%|██████▉ | 8516/12313 [6:22:56<2:56:21, 2.79s/it] 69%|██████▉ | 8517/12313 [6:22:59<2:48:58, 2.67s/it] {'loss': 0.5258, 'grad_norm': 3.293645191890074, 'learning_rate': 1.1461606674817518e-06, 'epoch': 0.69} 69%|██████▉ | 8517/12313 [6:22:59<2:48:58, 2.67s/it] 69%|██████▉ | 8518/12313 [6:23:02<2:50:25, 2.69s/it] {'loss': 0.4102, 'grad_norm': 4.377855525825729, 'learning_rate': 1.1456078660086266e-06, 'epoch': 0.69} 69%|██████▉ | 8518/12313 [6:23:02<2:50:25, 2.69s/it] 69%|██████▉ | 8519/12313 [6:23:04<2:50:22, 2.69s/it] {'loss': 0.3486, 'grad_norm': 9.573511509775727, 'learning_rate': 1.1450551582522702e-06, 'epoch': 0.69} 69%|██████▉ | 8519/12313 [6:23:04<2:50:22, 2.69s/it] 69%|██████▉ | 8520/12313 [6:23:07<2:50:42, 2.70s/it] {'loss': 0.4863, 'grad_norm': 3.560198474639802, 'learning_rate': 1.1445025442509258e-06, 'epoch': 0.69} 69%|██████▉ | 8520/12313 [6:23:07<2:50:42, 2.70s/it] 69%|██████▉ | 8521/12313 [6:23:10<2:48:00, 2.66s/it] {'loss': 0.5363, 'grad_norm': 6.303365456673007, 'learning_rate': 1.1439500240428304e-06, 'epoch': 0.69} 69%|██████▉ | 8521/12313 [6:23:10<2:48:00, 2.66s/it] 69%|██████▉ | 8522/12313 [6:23:12<2:53:56, 2.75s/it] {'loss': 0.5908, 'grad_norm': 5.438200231090006, 'learning_rate': 1.1433975976662172e-06, 'epoch': 0.69} 69%|██████▉ | 8522/12313 [6:23:12<2:53:56, 2.75s/it] 69%|██████▉ | 8523/12313 [6:23:15<2:55:29, 2.78s/it] {'loss': 0.5535, 'grad_norm': 7.217503219294745, 'learning_rate': 1.1428452651593102e-06, 'epoch': 0.69} 69%|██████▉ | 8523/12313 [6:23:15<2:55:29, 2.78s/it] 69%|██████▉ | 8524/12313 [6:23:18<2:53:07, 2.74s/it] {'loss': 0.4252, 'grad_norm': 3.113332201559408, 'learning_rate': 
1.142293026560328e-06, 'epoch': 0.69} 69%|██████▉ | 8524/12313 [6:23:18<2:53:07, 2.74s/it] 69%|██████▉ | 8525/12313 [6:23:22<3:09:47, 3.01s/it] {'loss': 0.5357, 'grad_norm': 3.7670638203585396, 'learning_rate': 1.1417408819074835e-06, 'epoch': 0.69} 69%|██████▉ | 8525/12313 [6:23:22<3:09:47, 3.01s/it] 69%|██████▉ | 8526/12313 [6:23:24<3:04:08, 2.92s/it] {'loss': 0.5814, 'grad_norm': 9.156807084218489, 'learning_rate': 1.1411888312389815e-06, 'epoch': 0.69} 69%|██████▉ | 8526/12313 [6:23:24<3:04:08, 2.92s/it] 69%|██████▉ | 8527/12313 [6:23:27<2:55:41, 2.78s/it] {'loss': 0.5775, 'grad_norm': 7.477340025472298, 'learning_rate': 1.1406368745930201e-06, 'epoch': 0.69} 69%|██████▉ | 8527/12313 [6:23:27<2:55:41, 2.78s/it] 69%|██████▉ | 8528/12313 [6:23:30<2:57:01, 2.81s/it] {'loss': 0.4531, 'grad_norm': 5.777723223269901, 'learning_rate': 1.140085012007794e-06, 'epoch': 0.69} 69%|██████▉ | 8528/12313 [6:23:30<2:57:01, 2.81s/it] 69%|██████▉ | 8529/12313 [6:23:32<2:52:02, 2.73s/it] {'loss': 0.5404, 'grad_norm': 17.932544473465622, 'learning_rate': 1.1395332435214873e-06, 'epoch': 0.69} 69%|██████▉ | 8529/12313 [6:23:32<2:52:02, 2.73s/it] 69%|██████▉ | 8530/12313 [6:23:35<2:49:59, 2.70s/it] {'loss': 0.4261, 'grad_norm': 6.420149798086578, 'learning_rate': 1.138981569172279e-06, 'epoch': 0.69} 69%|██████▉ | 8530/12313 [6:23:35<2:49:59, 2.70s/it] 69%|██████▉ | 8531/12313 [6:23:37<2:43:30, 2.59s/it] {'loss': 0.4474, 'grad_norm': 9.184826721296508, 'learning_rate': 1.1384299889983432e-06, 'epoch': 0.69} 69%|██████▉ | 8531/12313 [6:23:37<2:43:30, 2.59s/it] 69%|██████▉ | 8532/12313 [6:23:40<2:44:05, 2.60s/it] {'loss': 0.5644, 'grad_norm': 8.376036553312167, 'learning_rate': 1.1378785030378473e-06, 'epoch': 0.69} 69%|██████▉ | 8532/12313 [6:23:40<2:44:05, 2.60s/it] 69%|██████▉ | 8533/12313 [6:23:43<2:49:13, 2.69s/it] {'loss': 0.3391, 'grad_norm': 4.928837681497556, 'learning_rate': 1.137327111328949e-06, 'epoch': 0.69} 69%|██████▉ | 8533/12313 [6:23:43<2:49:13, 2.69s/it] 
69%|██████▉ | 8534/12313 [6:23:45<2:49:48, 2.70s/it] {'loss': 0.453, 'grad_norm': 4.407213199355116, 'learning_rate': 1.1367758139098037e-06, 'epoch': 0.69} 69%|██████▉ | 8534/12313 [6:23:45<2:49:48, 2.70s/it] 69%|██████▉ | 8535/12313 [6:23:48<2:49:55, 2.70s/it] {'loss': 0.4148, 'grad_norm': 4.884167064220795, 'learning_rate': 1.1362246108185571e-06, 'epoch': 0.69} 69%|██████▉ | 8535/12313 [6:23:48<2:49:55, 2.70s/it] 69%|██████▉ | 8536/12313 [6:23:51<2:45:14, 2.63s/it] {'loss': 0.6555, 'grad_norm': 5.275613682866477, 'learning_rate': 1.135673502093349e-06, 'epoch': 0.69} 69%|██████▉ | 8536/12313 [6:23:51<2:45:14, 2.63s/it] 69%|██████▉ | 8537/12313 [6:23:53<2:43:57, 2.61s/it] {'loss': 0.4664, 'grad_norm': 4.72395706382707, 'learning_rate': 1.1351224877723137e-06, 'epoch': 0.69} 69%|██████▉ | 8537/12313 [6:23:53<2:43:57, 2.61s/it] 69%|██████▉ | 8538/12313 [6:23:56<2:47:48, 2.67s/it] {'loss': 0.4335, 'grad_norm': 6.683252761194182, 'learning_rate': 1.1345715678935802e-06, 'epoch': 0.69} 69%|██████▉ | 8538/12313 [6:23:56<2:47:48, 2.67s/it] 69%|██████▉ | 8539/12313 [6:23:59<2:46:31, 2.65s/it] {'loss': 0.4784, 'grad_norm': 3.6056858456464704, 'learning_rate': 1.1340207424952673e-06, 'epoch': 0.69} 69%|██████▉ | 8539/12313 [6:23:59<2:46:31, 2.65s/it] 69%|██████▉ | 8540/12313 [6:24:01<2:46:50, 2.65s/it] {'loss': 0.5434, 'grad_norm': 7.002949745318934, 'learning_rate': 1.133470011615489e-06, 'epoch': 0.69} 69%|██████▉ | 8540/12313 [6:24:01<2:46:50, 2.65s/it] 69%|██████▉ | 8541/12313 [6:24:04<2:45:28, 2.63s/it] {'loss': 0.5708, 'grad_norm': 6.659133719702008, 'learning_rate': 1.1329193752923543e-06, 'epoch': 0.69} 69%|██████▉ | 8541/12313 [6:24:04<2:45:28, 2.63s/it] 69%|██████▉ | 8542/12313 [6:24:07<2:51:41, 2.73s/it] {'loss': 0.4835, 'grad_norm': 12.35767990560376, 'learning_rate': 1.1323688335639637e-06, 'epoch': 0.69} 69%|██████▉ | 8542/12313 [6:24:07<2:51:41, 2.73s/it] 69%|██████▉ | 8543/12313 [6:24:09<2:48:29, 2.68s/it] {'loss': 0.4171, 'grad_norm': 3.77158348926073, 
'learning_rate': 1.131818386468411e-06, 'epoch': 0.69} 69%|██████▉ | 8543/12313 [6:24:09<2:48:29, 2.68s/it] 69%|██████▉ | 8544/12313 [6:24:12<2:49:04, 2.69s/it] {'loss': 0.4684, 'grad_norm': 5.1758054265529205, 'learning_rate': 1.1312680340437848e-06, 'epoch': 0.69} 69%|██████▉ | 8544/12313 [6:24:12<2:49:04, 2.69s/it] 69%|██████▉ | 8545/12313 [6:24:15<2:47:18, 2.66s/it] {'loss': 0.2615, 'grad_norm': 4.915722528576737, 'learning_rate': 1.130717776328168e-06, 'epoch': 0.69} 69%|██████▉ | 8545/12313 [6:24:15<2:47:18, 2.66s/it] 69%|██████▉ | 8546/12313 [6:24:17<2:43:04, 2.60s/it] {'loss': 0.5899, 'grad_norm': 3.1585274055113914, 'learning_rate': 1.130167613359633e-06, 'epoch': 0.69} 69%|██████▉ | 8546/12313 [6:24:17<2:43:04, 2.60s/it] 69%|██████▉ | 8547/12313 [6:24:20<2:42:55, 2.60s/it] {'loss': 0.3803, 'grad_norm': 4.7926238667126055, 'learning_rate': 1.1296175451762504e-06, 'epoch': 0.69} 69%|██████▉ | 8547/12313 [6:24:20<2:42:55, 2.60s/it] 69%|██████▉ | 8548/12313 [6:24:22<2:45:40, 2.64s/it] {'loss': 0.5899, 'grad_norm': 4.742004176848653, 'learning_rate': 1.129067571816081e-06, 'epoch': 0.69} 69%|██████▉ | 8548/12313 [6:24:22<2:45:40, 2.64s/it] 69%|██████▉ | 8549/12313 [6:24:25<2:47:17, 2.67s/it] {'loss': 0.5, 'grad_norm': 4.740522772470059, 'learning_rate': 1.128517693317179e-06, 'epoch': 0.69} 69%|██████▉ | 8549/12313 [6:24:25<2:47:17, 2.67s/it] 69%|██████▉ | 8550/12313 [6:24:28<2:48:18, 2.68s/it] {'loss': 0.4379, 'grad_norm': 3.0796548456527457, 'learning_rate': 1.1279679097175944e-06, 'epoch': 0.69} 69%|██████▉ | 8550/12313 [6:24:28<2:48:18, 2.68s/it] 69%|██████▉ | 8551/12313 [6:24:31<2:51:34, 2.74s/it] {'loss': 0.4375, 'grad_norm': 4.872361456956494, 'learning_rate': 1.12741822105537e-06, 'epoch': 0.69} 69%|██████▉ | 8551/12313 [6:24:31<2:51:34, 2.74s/it] 69%|██████▉ | 8552/12313 [6:24:33<2:49:23, 2.70s/it] {'loss': 0.41, 'grad_norm': 3.8126089904616625, 'learning_rate': 1.1268686273685391e-06, 'epoch': 0.69} 69%|██████▉ | 8552/12313 [6:24:33<2:49:23, 
2.70s/it] 69%|██████▉ | 8553/12313 [6:24:36<2:54:59, 2.79s/it] {'loss': 0.7237, 'grad_norm': 4.918083532251475, 'learning_rate': 1.1263191286951333e-06, 'epoch': 0.69} 69%|██████▉ | 8553/12313 [6:24:36<2:54:59, 2.79s/it] 69%|██████▉ | 8554/12313 [6:24:39<2:53:15, 2.77s/it] {'loss': 0.5379, 'grad_norm': 4.980898147720527, 'learning_rate': 1.1257697250731735e-06, 'epoch': 0.69} 69%|██████▉ | 8554/12313 [6:24:39<2:53:15, 2.77s/it] 69%|██████▉ | 8555/12313 [6:24:42<3:00:46, 2.89s/it] {'loss': 0.5215, 'grad_norm': 3.3037282568689204, 'learning_rate': 1.1252204165406753e-06, 'epoch': 0.69} 69%|██████▉ | 8555/12313 [6:24:42<3:00:46, 2.89s/it] 69%|██████▉ | 8556/12313 [6:24:45<2:59:14, 2.86s/it] {'loss': 0.5758, 'grad_norm': 4.960611865059338, 'learning_rate': 1.1246712031356486e-06, 'epoch': 0.69} 69%|██████▉ | 8556/12313 [6:24:45<2:59:14, 2.86s/it] 69%|██████▉ | 8557/12313 [6:24:48<2:55:06, 2.80s/it] {'loss': 0.4719, 'grad_norm': 7.319288707757544, 'learning_rate': 1.1241220848960952e-06, 'epoch': 0.69} 69%|██████▉ | 8557/12313 [6:24:48<2:55:06, 2.80s/it] 70%|██████▉ | 8558/12313 [6:24:51<2:58:12, 2.85s/it] {'loss': 0.4936, 'grad_norm': 3.819399015745177, 'learning_rate': 1.1235730618600126e-06, 'epoch': 0.7} 70%|██████▉ | 8558/12313 [6:24:51<2:58:12, 2.85s/it] 70%|██████▉ | 8559/12313 [6:24:53<2:57:58, 2.84s/it] {'loss': 0.5413, 'grad_norm': 3.538376081354946, 'learning_rate': 1.1230241340653888e-06, 'epoch': 0.7} 70%|██████▉ | 8559/12313 [6:24:53<2:57:58, 2.84s/it] 70%|██████▉ | 8560/12313 [6:24:56<2:54:48, 2.79s/it] {'loss': 0.5515, 'grad_norm': 6.249841085120328, 'learning_rate': 1.122475301550208e-06, 'epoch': 0.7} 70%|██████▉ | 8560/12313 [6:24:56<2:54:48, 2.79s/it] 70%|██████▉ | 8561/12313 [6:24:59<2:52:15, 2.75s/it] {'loss': 0.6036, 'grad_norm': 8.110595035089235, 'learning_rate': 1.121926564352446e-06, 'epoch': 0.7} 70%|██████▉ | 8561/12313 [6:24:59<2:52:15, 2.75s/it] 70%|██████▉ | 8562/12313 [6:25:01<2:48:43, 2.70s/it] {'loss': 0.7324, 'grad_norm': 
4.635874973816129, 'learning_rate': 1.1213779225100715e-06, 'epoch': 0.7} 70%|██████▉ | 8562/12313 [6:25:01<2:48:43, 2.70s/it] 70%|██████▉ | 8563/12313 [6:25:04<2:43:46, 2.62s/it] {'loss': 0.52, 'grad_norm': 6.322869921153947, 'learning_rate': 1.1208293760610486e-06, 'epoch': 0.7} 70%|██████▉ | 8563/12313 [6:25:04<2:43:46, 2.62s/it] 70%|██████▉ | 8564/12313 [6:25:07<2:49:48, 2.72s/it] {'loss': 0.3976, 'grad_norm': 7.6067666576253545, 'learning_rate': 1.1202809250433345e-06, 'epoch': 0.7} 70%|██████▉ | 8564/12313 [6:25:07<2:49:48, 2.72s/it] 70%|██████▉ | 8565/12313 [6:25:10<2:52:59, 2.77s/it] {'loss': 0.5236, 'grad_norm': 4.351083583151557, 'learning_rate': 1.1197325694948774e-06, 'epoch': 0.7} 70%|██████▉ | 8565/12313 [6:25:10<2:52:59, 2.77s/it] 70%|██████▉ | 8566/12313 [6:25:12<2:51:30, 2.75s/it] {'loss': 0.4614, 'grad_norm': 5.090734435411529, 'learning_rate': 1.1191843094536225e-06, 'epoch': 0.7} 70%|██████▉ | 8566/12313 [6:25:12<2:51:30, 2.75s/it] 70%|██████▉ | 8567/12313 [6:25:15<2:50:34, 2.73s/it] {'loss': 0.3699, 'grad_norm': 6.001665213669127, 'learning_rate': 1.1186361449575055e-06, 'epoch': 0.7} 70%|██████▉ | 8567/12313 [6:25:15<2:50:34, 2.73s/it] 70%|██████▉ | 8568/12313 [6:25:18<2:49:58, 2.72s/it] {'loss': 0.6122, 'grad_norm': 6.350310371588752, 'learning_rate': 1.1180880760444558e-06, 'epoch': 0.7} 70%|██████▉ | 8568/12313 [6:25:18<2:49:58, 2.72s/it] 70%|██████▉ | 8569/12313 [6:25:21<2:54:53, 2.80s/it] {'loss': 0.4498, 'grad_norm': 4.918916151218767, 'learning_rate': 1.117540102752398e-06, 'epoch': 0.7} 70%|██████▉ | 8569/12313 [6:25:21<2:54:53, 2.80s/it] 70%|██████▉ | 8570/12313 [6:25:23<2:51:21, 2.75s/it] {'loss': 0.3585, 'grad_norm': 4.2337365050726286, 'learning_rate': 1.116992225119248e-06, 'epoch': 0.7} 70%|██████▉ | 8570/12313 [6:25:23<2:51:21, 2.75s/it] 70%|██████▉ | 8571/12313 [6:25:26<2:49:58, 2.73s/it] {'loss': 0.5052, 'grad_norm': 4.164412182555689, 'learning_rate': 1.1164444431829163e-06, 'epoch': 0.7} 70%|██████▉ | 8571/12313 
[6:25:26<2:49:58, 2.73s/it] 70%|██████▉ | 8572/12313 [6:25:29<2:47:13, 2.68s/it] {'loss': 0.4742, 'grad_norm': 4.915037193176119, 'learning_rate': 1.1158967569813079e-06, 'epoch': 0.7} 70%|██████▉ | 8572/12313 [6:25:29<2:47:13, 2.68s/it] 70%|██████▉ | 8573/12313 [6:25:31<2:46:37, 2.67s/it] {'loss': 0.5764, 'grad_norm': 7.126882606950106, 'learning_rate': 1.1153491665523186e-06, 'epoch': 0.7} 70%|██████▉ | 8573/12313 [6:25:31<2:46:37, 2.67s/it] 70%|██████▉ | 8574/12313 [6:25:34<2:49:14, 2.72s/it] {'loss': 0.3955, 'grad_norm': 6.194568824850657, 'learning_rate': 1.1148016719338387e-06, 'epoch': 0.7} 70%|██████▉ | 8574/12313 [6:25:34<2:49:14, 2.72s/it] 70%|██████▉ | 8575/12313 [6:25:37<2:47:12, 2.68s/it] {'loss': 0.2946, 'grad_norm': 4.582251362036855, 'learning_rate': 1.1142542731637513e-06, 'epoch': 0.7} 70%|██████▉ | 8575/12313 [6:25:37<2:47:12, 2.68s/it] 70%|██████▉ | 8576/12313 [6:25:39<2:46:11, 2.67s/it] {'loss': 0.5785, 'grad_norm': 4.565291424432009, 'learning_rate': 1.1137069702799341e-06, 'epoch': 0.7} 70%|██████▉ | 8576/12313 [6:25:39<2:46:11, 2.67s/it] 70%|██████▉ | 8577/12313 [6:25:42<2:45:50, 2.66s/it] {'loss': 0.5502, 'grad_norm': 6.268014709041092, 'learning_rate': 1.1131597633202587e-06, 'epoch': 0.7} 70%|██████▉ | 8577/12313 [6:25:42<2:45:50, 2.66s/it] 70%|██████▉ | 8578/12313 [6:25:45<2:48:57, 2.71s/it] {'loss': 0.4602, 'grad_norm': 4.232016340106684, 'learning_rate': 1.1126126523225869e-06, 'epoch': 0.7} 70%|██████▉ | 8578/12313 [6:25:45<2:48:57, 2.71s/it] 70%|██████▉ | 8579/12313 [6:25:47<2:48:11, 2.70s/it] {'loss': 0.4341, 'grad_norm': 5.324130939463538, 'learning_rate': 1.112065637324778e-06, 'epoch': 0.7} 70%|██████▉ | 8579/12313 [6:25:47<2:48:11, 2.70s/it] 70%|██████▉ | 8580/12313 [6:25:50<2:47:28, 2.69s/it] {'loss': 0.5489, 'grad_norm': 3.6701229318957034, 'learning_rate': 1.1115187183646814e-06, 'epoch': 0.7} 70%|██████▉ | 8580/12313 [6:25:50<2:47:28, 2.69s/it] 70%|██████▉ | 8581/12313 [6:25:53<2:50:27, 2.74s/it] {'loss': 0.5336, 
'grad_norm': 4.1075695508244365, 'learning_rate': 1.1109718954801398e-06, 'epoch': 0.7} 70%|██████▉ | 8581/12313 [6:25:53<2:50:27, 2.74s/it] 70%|██████▉ | 8582/12313 [6:25:55<2:45:13, 2.66s/it] {'loss': 0.5781, 'grad_norm': 4.14721530411858, 'learning_rate': 1.110425168708993e-06, 'epoch': 0.7} 70%|██████▉ | 8582/12313 [6:25:55<2:45:13, 2.66s/it] 70%|██████▉ | 8583/12313 [6:25:58<2:39:58, 2.57s/it] {'loss': 0.427, 'grad_norm': 5.832775679972837, 'learning_rate': 1.1098785380890696e-06, 'epoch': 0.7} 70%|██████▉ | 8583/12313 [6:25:58<2:39:58, 2.57s/it] 70%|██████▉ | 8584/12313 [6:26:01<2:41:53, 2.60s/it] {'loss': 0.4442, 'grad_norm': 3.33845605592306, 'learning_rate': 1.1093320036581936e-06, 'epoch': 0.7} 70%|██████▉ | 8584/12313 [6:26:01<2:41:53, 2.60s/it] 70%|██████▉ | 8585/12313 [6:26:03<2:43:35, 2.63s/it] {'loss': 0.368, 'grad_norm': 5.037481188891807, 'learning_rate': 1.1087855654541843e-06, 'epoch': 0.7} 70%|██████▉ | 8585/12313 [6:26:03<2:43:35, 2.63s/it] 70%|██████▉ | 8586/12313 [6:26:06<2:44:30, 2.65s/it] {'loss': 0.4573, 'grad_norm': 4.499749726051026, 'learning_rate': 1.1082392235148509e-06, 'epoch': 0.7} 70%|██████▉ | 8586/12313 [6:26:06<2:44:30, 2.65s/it] 70%|██████▉ | 8587/12313 [6:26:08<2:42:53, 2.62s/it] {'loss': 0.4741, 'grad_norm': 4.154523490167273, 'learning_rate': 1.1076929778779965e-06, 'epoch': 0.7} 70%|██████▉ | 8587/12313 [6:26:08<2:42:53, 2.62s/it] 70%|██████▉ | 8588/12313 [6:26:11<2:44:30, 2.65s/it] {'loss': 0.5117, 'grad_norm': 6.620859889971063, 'learning_rate': 1.1071468285814201e-06, 'epoch': 0.7} 70%|██████▉ | 8588/12313 [6:26:11<2:44:30, 2.65s/it] 70%|██████▉ | 8589/12313 [6:26:14<2:51:45, 2.77s/it] {'loss': 0.428, 'grad_norm': 5.322737847031269, 'learning_rate': 1.106600775662911e-06, 'epoch': 0.7} 70%|██████▉ | 8589/12313 [6:26:14<2:51:45, 2.77s/it] 70%|██████▉ | 8590/12313 [6:26:17<2:50:22, 2.75s/it] {'loss': 0.5052, 'grad_norm': 6.543366704108589, 'learning_rate': 1.1060548191602535e-06, 'epoch': 0.7} 70%|██████▉ | 8590/12313 
[6:26:17<2:50:22, 2.75s/it] 70%|██████▉ | 8591/12313 [6:26:20<2:50:09, 2.74s/it] {'loss': 0.4051, 'grad_norm': 3.978052554444944, 'learning_rate': 1.105508959111226e-06, 'epoch': 0.7} 70%|██████▉ | 8591/12313 [6:26:20<2:50:09, 2.74s/it] 70%|██████▉ | 8592/12313 [6:26:22<2:49:56, 2.74s/it] {'loss': 0.5263, 'grad_norm': 4.013135595680058, 'learning_rate': 1.1049631955535985e-06, 'epoch': 0.7} 70%|██████▉ | 8592/12313 [6:26:22<2:49:56, 2.74s/it] 70%|██████▉ | 8593/12313 [6:26:25<2:49:21, 2.73s/it] {'loss': 0.4994, 'grad_norm': 7.931439768063147, 'learning_rate': 1.1044175285251348e-06, 'epoch': 0.7} 70%|██████▉ | 8593/12313 [6:26:25<2:49:21, 2.73s/it] 70%|██████▉ | 8594/12313 [6:26:28<2:48:54, 2.73s/it] {'loss': 0.5575, 'grad_norm': 7.9626002926512465, 'learning_rate': 1.1038719580635913e-06, 'epoch': 0.7} 70%|██████▉ | 8594/12313 [6:26:28<2:48:54, 2.73s/it] 70%|██████▉ | 8595/12313 [6:26:30<2:46:57, 2.69s/it] {'loss': 0.492, 'grad_norm': 5.399616750795122, 'learning_rate': 1.103326484206719e-06, 'epoch': 0.7} 70%|██████▉ | 8595/12313 [6:26:30<2:46:57, 2.69s/it] 70%|██████▉ | 8596/12313 [6:26:33<2:48:01, 2.71s/it] {'loss': 0.5324, 'grad_norm': 5.8829565085948765, 'learning_rate': 1.1027811069922634e-06, 'epoch': 0.7} 70%|██████▉ | 8596/12313 [6:26:33<2:48:01, 2.71s/it] 70%|██████▉ | 8597/12313 [6:26:36<2:45:40, 2.68s/it] {'loss': 0.4975, 'grad_norm': 4.044823741608967, 'learning_rate': 1.1022358264579593e-06, 'epoch': 0.7} 70%|██████▉ | 8597/12313 [6:26:36<2:45:40, 2.68s/it] 70%|██████▉ | 8598/12313 [6:26:38<2:44:57, 2.66s/it] {'loss': 0.6072, 'grad_norm': 5.039936723673592, 'learning_rate': 1.1016906426415397e-06, 'epoch': 0.7} 70%|██████▉ | 8598/12313 [6:26:38<2:44:57, 2.66s/it] 70%|██████▉ | 8599/12313 [6:26:41<2:41:41, 2.61s/it] {'loss': 0.4333, 'grad_norm': 12.620558241769995, 'learning_rate': 1.1011455555807272e-06, 'epoch': 0.7} 70%|██████▉ | 8599/12313 [6:26:41<2:41:41, 2.61s/it] 70%|██████▉ | 8600/12313 [6:26:43<2:39:59, 2.59s/it] {'loss': 0.4666, 
'grad_norm': 4.360858885346072, 'learning_rate': 1.1006005653132376e-06, 'epoch': 0.7} 70%|██████▉ | 8600/12313 [6:26:43<2:39:59, 2.59s/it] 70%|██████▉ | 8601/12313 [6:26:46<2:39:40, 2.58s/it] {'loss': 0.5247, 'grad_norm': 4.573558742038945, 'learning_rate': 1.100055671876784e-06, 'epoch': 0.7} 70%|██████▉ | 8601/12313 [6:26:46<2:39:40, 2.58s/it] 70%|██████▉ | 8602/12313 [6:26:49<2:41:03, 2.60s/it] {'loss': 0.4746, 'grad_norm': 5.213935252357303, 'learning_rate': 1.0995108753090677e-06, 'epoch': 0.7} 70%|██████▉ | 8602/12313 [6:26:49<2:41:03, 2.60s/it] 70%|██████▉ | 8603/12313 [6:26:51<2:45:37, 2.68s/it] {'loss': 0.4982, 'grad_norm': 5.309900569340087, 'learning_rate': 1.0989661756477869e-06, 'epoch': 0.7} 70%|██████▉ | 8603/12313 [6:26:51<2:45:37, 2.68s/it] 70%|██████▉ | 8604/12313 [6:26:54<2:46:22, 2.69s/it] {'loss': 0.6216, 'grad_norm': 6.957485365514494, 'learning_rate': 1.0984215729306328e-06, 'epoch': 0.7} 70%|██████▉ | 8604/12313 [6:26:54<2:46:22, 2.69s/it] 70%|██████▉ | 8605/12313 [6:26:57<2:45:55, 2.68s/it] {'loss': 0.5468, 'grad_norm': 4.191127343883717, 'learning_rate': 1.097877067195288e-06, 'epoch': 0.7} 70%|██████▉ | 8605/12313 [6:26:57<2:45:55, 2.68s/it] 70%|██████▉ | 8606/12313 [6:27:00<2:50:26, 2.76s/it] {'loss': 0.5753, 'grad_norm': 4.214027815939813, 'learning_rate': 1.0973326584794286e-06, 'epoch': 0.7} 70%|██████▉ | 8606/12313 [6:27:00<2:50:26, 2.76s/it] 70%|██████▉ | 8607/12313 [6:27:03<2:49:45, 2.75s/it] {'loss': 0.3873, 'grad_norm': 5.627295117665019, 'learning_rate': 1.0967883468207265e-06, 'epoch': 0.7} 70%|██████▉ | 8607/12313 [6:27:03<2:49:45, 2.75s/it] 70%|██████▉ | 8608/12313 [6:27:05<2:45:12, 2.68s/it] {'loss': 0.4377, 'grad_norm': 5.494066620160367, 'learning_rate': 1.0962441322568437e-06, 'epoch': 0.7} 70%|██████▉ | 8608/12313 [6:27:05<2:45:12, 2.68s/it] 70%|██████▉ | 8609/12313 [6:27:08<2:43:45, 2.65s/it] {'loss': 0.4346, 'grad_norm': 4.933052728810609, 'learning_rate': 1.0957000148254387e-06, 'epoch': 0.7} 70%|██████▉ | 8609/12313 
[6:27:08<2:43:45, 2.65s/it] 70%|██████▉ | 8610/12313 [6:27:10<2:43:47, 2.65s/it] {'loss': 0.4196, 'grad_norm': 4.67963560490583, 'learning_rate': 1.0951559945641592e-06, 'epoch': 0.7} 70%|██████▉ | 8610/12313 [6:27:10<2:43:47, 2.65s/it] 70%|██████▉ | 8611/12313 [6:27:13<2:39:52, 2.59s/it] {'loss': 0.5331, 'grad_norm': 4.555476078942722, 'learning_rate': 1.094612071510651e-06, 'epoch': 0.7} 70%|██████▉ | 8611/12313 [6:27:13<2:39:52, 2.59s/it] 70%|██████▉ | 8612/12313 [6:27:15<2:41:41, 2.62s/it] {'loss': 0.3957, 'grad_norm': 4.542359966546809, 'learning_rate': 1.0940682457025498e-06, 'epoch': 0.7} 70%|██████▉ | 8612/12313 [6:27:15<2:41:41, 2.62s/it] 70%|██████▉ | 8613/12313 [6:27:18<2:43:43, 2.66s/it] {'loss': 0.6409, 'grad_norm': 4.4445515035232495, 'learning_rate': 1.0935245171774842e-06, 'epoch': 0.7} 70%|██████▉ | 8613/12313 [6:27:18<2:43:43, 2.66s/it] 70%|██████▉ | 8614/12313 [6:27:21<2:47:53, 2.72s/it] {'loss': 0.5317, 'grad_norm': 5.499967666697894, 'learning_rate': 1.092980885973079e-06, 'epoch': 0.7} 70%|██████▉ | 8614/12313 [6:27:21<2:47:53, 2.72s/it] 70%|██████▉ | 8615/12313 [6:27:24<2:53:19, 2.81s/it] {'loss': 0.3992, 'grad_norm': 4.090307339205739, 'learning_rate': 1.0924373521269492e-06, 'epoch': 0.7} 70%|██████▉ | 8615/12313 [6:27:24<2:53:19, 2.81s/it] 70%|██████▉ | 8616/12313 [6:27:27<2:50:05, 2.76s/it] {'loss': 0.8069, 'grad_norm': 5.094031142055631, 'learning_rate': 1.091893915676705e-06, 'epoch': 0.7} 70%|██████▉ | 8616/12313 [6:27:27<2:50:05, 2.76s/it] 70%|██████▉ | 8617/12313 [6:27:30<2:51:34, 2.79s/it] {'loss': 0.5136, 'grad_norm': 5.102016167710116, 'learning_rate': 1.0913505766599506e-06, 'epoch': 0.7} 70%|██████▉ | 8617/12313 [6:27:30<2:51:34, 2.79s/it] 70%|██████▉ | 8618/12313 [6:27:32<2:49:39, 2.75s/it] {'loss': 0.4567, 'grad_norm': 6.840823358662633, 'learning_rate': 1.090807335114281e-06, 'epoch': 0.7} 70%|██████▉ | 8618/12313 [6:27:32<2:49:39, 2.75s/it] 70%|██████▉ | 8619/12313 [6:27:35<2:48:58, 2.74s/it] {'loss': 0.411, 'grad_norm': 
6.604394180582011, 'learning_rate': 1.0902641910772852e-06, 'epoch': 0.7} 70%|██████▉ | 8619/12313 [6:27:35<2:48:58, 2.74s/it] 70%|███████ | 8620/12313 [6:27:37<2:43:48, 2.66s/it] {'loss': 0.432, 'grad_norm': 5.660730236893839, 'learning_rate': 1.0897211445865472e-06, 'epoch': 0.7} 70%|███████ | 8620/12313 [6:27:37<2:43:48, 2.66s/it] 70%|███████ | 8621/12313 [6:27:40<2:44:32, 2.67s/it] {'loss': 0.3665, 'grad_norm': 4.702707757036113, 'learning_rate': 1.089178195679641e-06, 'epoch': 0.7} 70%|███████ | 8621/12313 [6:27:40<2:44:32, 2.67s/it] 70%|███████ | 8622/12313 [6:27:43<2:40:49, 2.61s/it] {'loss': 0.4387, 'grad_norm': 4.96553057734693, 'learning_rate': 1.0886353443941373e-06, 'epoch': 0.7} 70%|███████ | 8622/12313 [6:27:43<2:40:49, 2.61s/it] 70%|███████ | 8623/12313 [6:27:45<2:40:54, 2.62s/it] {'loss': 0.3632, 'grad_norm': 8.264862786380114, 'learning_rate': 1.088092590767599e-06, 'epoch': 0.7} 70%|███████ | 8623/12313 [6:27:45<2:40:54, 2.62s/it] 70%|███████ | 8624/12313 [6:27:48<2:40:38, 2.61s/it] {'loss': 0.5675, 'grad_norm': 6.893091911108661, 'learning_rate': 1.0875499348375812e-06, 'epoch': 0.7} 70%|███████ | 8624/12313 [6:27:48<2:40:38, 2.61s/it] 70%|███████ | 8625/12313 [6:27:51<2:43:55, 2.67s/it] {'loss': 0.4464, 'grad_norm': 4.387426391179371, 'learning_rate': 1.0870073766416315e-06, 'epoch': 0.7} 70%|███████ | 8625/12313 [6:27:51<2:43:55, 2.67s/it] 70%|███████ | 8626/12313 [6:27:54<2:53:20, 2.82s/it] {'loss': 0.6075, 'grad_norm': 5.027871465737512, 'learning_rate': 1.0864649162172941e-06, 'epoch': 0.7} 70%|███████ | 8626/12313 [6:27:54<2:53:20, 2.82s/it] 70%|███████ | 8627/12313 [6:27:56<2:49:00, 2.75s/it] {'loss': 0.5228, 'grad_norm': 6.143259232572587, 'learning_rate': 1.0859225536021034e-06, 'epoch': 0.7} 70%|███████ | 8627/12313 [6:27:56<2:49:00, 2.75s/it] 70%|███████ | 8628/12313 [6:27:59<2:45:09, 2.69s/it] {'loss': 0.6185, 'grad_norm': 6.464771014837394, 'learning_rate': 1.0853802888335874e-06, 'epoch': 0.7} 70%|███████ | 8628/12313 
[6:27:59<2:45:09, 2.69s/it] 70%|███████ | 8629/12313 [6:28:01<2:41:34, 2.63s/it] {'loss': 0.498, 'grad_norm': 4.205829517254945, 'learning_rate': 1.0848381219492684e-06, 'epoch': 0.7} 70%|███████ | 8629/12313 [6:28:01<2:41:34, 2.63s/it] 70%|███████ | 8630/12313 [6:28:04<2:47:29, 2.73s/it] {'loss': 0.4887, 'grad_norm': 3.9072366994873855, 'learning_rate': 1.0842960529866627e-06, 'epoch': 0.7} 70%|███████ | 8630/12313 [6:28:04<2:47:29, 2.73s/it] 70%|███████ | 8631/12313 [6:28:07<2:46:46, 2.72s/it] {'loss': 0.4346, 'grad_norm': 5.509854679596712, 'learning_rate': 1.0837540819832779e-06, 'epoch': 0.7} 70%|███████ | 8631/12313 [6:28:07<2:46:46, 2.72s/it] 70%|███████ | 8632/12313 [6:28:10<2:50:31, 2.78s/it] {'loss': 0.4646, 'grad_norm': 4.3762550261485, 'learning_rate': 1.0832122089766143e-06, 'epoch': 0.7} 70%|███████ | 8632/12313 [6:28:10<2:50:31, 2.78s/it] 70%|███████ | 8633/12313 [6:28:13<2:45:26, 2.70s/it] {'loss': 0.605, 'grad_norm': 4.21313805277909, 'learning_rate': 1.082670434004168e-06, 'epoch': 0.7} 70%|███████ | 8633/12313 [6:28:13<2:45:26, 2.70s/it] 70%|███████ | 8634/12313 [6:28:15<2:47:48, 2.74s/it] {'loss': 0.5164, 'grad_norm': 5.0290295451057245, 'learning_rate': 1.0821287571034261e-06, 'epoch': 0.7} 70%|███████ | 8634/12313 [6:28:15<2:47:48, 2.74s/it] 70%|███████ | 8635/12313 [6:28:18<2:42:20, 2.65s/it] {'loss': 0.5563, 'grad_norm': 3.852504953053529, 'learning_rate': 1.0815871783118701e-06, 'epoch': 0.7} 70%|███████ | 8635/12313 [6:28:18<2:42:20, 2.65s/it] 70%|███████ | 8636/12313 [6:28:20<2:41:07, 2.63s/it] {'loss': 0.4816, 'grad_norm': 4.4031006382832265, 'learning_rate': 1.0810456976669753e-06, 'epoch': 0.7} 70%|███████ | 8636/12313 [6:28:20<2:41:07, 2.63s/it] 70%|███████ | 8637/12313 [6:28:23<2:38:41, 2.59s/it] {'loss': 0.5762, 'grad_norm': 5.020999898610326, 'learning_rate': 1.0805043152062086e-06, 'epoch': 0.7} 70%|███████ | 8637/12313 [6:28:23<2:38:41, 2.59s/it] 70%|███████ | 8638/12313 [6:28:25<2:38:39, 2.59s/it] {'loss': 0.4189, 'grad_norm': 
12.438678796300241, 'learning_rate': 1.07996303096703e-06, 'epoch': 0.7} 70%|███████ | 8638/12313 [6:28:25<2:38:39, 2.59s/it] 70%|███████ | 8639/12313 [6:28:28<2:41:01, 2.63s/it] {'loss': 0.4803, 'grad_norm': 3.8964414970182455, 'learning_rate': 1.0794218449868948e-06, 'epoch': 0.7} 70%|███████ | 8639/12313 [6:28:28<2:41:01, 2.63s/it] 70%|███████ | 8640/12313 [6:28:31<2:41:06, 2.63s/it] {'loss': 0.4304, 'grad_norm': 6.66037981965594, 'learning_rate': 1.07888075730325e-06, 'epoch': 0.7} 70%|███████ | 8640/12313 [6:28:31<2:41:06, 2.63s/it] 70%|███████ | 8641/12313 [6:28:34<2:42:40, 2.66s/it] {'loss': 0.5567, 'grad_norm': 9.601550182923392, 'learning_rate': 1.0783397679535343e-06, 'epoch': 0.7} 70%|███████ | 8641/12313 [6:28:34<2:42:40, 2.66s/it] 70%|███████ | 8642/12313 [6:28:36<2:42:24, 2.65s/it] {'loss': 0.3856, 'grad_norm': 5.837088176132076, 'learning_rate': 1.077798876975183e-06, 'epoch': 0.7} 70%|███████ | 8642/12313 [6:28:36<2:42:24, 2.65s/it] 70%|███████ | 8643/12313 [6:28:39<2:43:50, 2.68s/it] {'loss': 0.5247, 'grad_norm': 4.127084436289391, 'learning_rate': 1.0772580844056232e-06, 'epoch': 0.7} 70%|███████ | 8643/12313 [6:28:39<2:43:50, 2.68s/it] 70%|███████ | 8644/12313 [6:28:42<2:44:54, 2.70s/it] {'loss': 0.3272, 'grad_norm': 3.613384254118512, 'learning_rate': 1.0767173902822733e-06, 'epoch': 0.7} 70%|███████ | 8644/12313 [6:28:42<2:44:54, 2.70s/it] 70%|███████ | 8645/12313 [6:28:44<2:44:52, 2.70s/it] {'loss': 0.5089, 'grad_norm': 2.5823659536071637, 'learning_rate': 1.0761767946425482e-06, 'epoch': 0.7} 70%|███████ | 8645/12313 [6:28:44<2:44:52, 2.70s/it] 70%|███████ | 8646/12313 [6:28:47<2:39:34, 2.61s/it] {'loss': 0.8247, 'grad_norm': 4.879797467260154, 'learning_rate': 1.0756362975238539e-06, 'epoch': 0.7} 70%|███████ | 8646/12313 [6:28:47<2:39:34, 2.61s/it] 70%|███████ | 8647/12313 [6:28:49<2:38:59, 2.60s/it] {'loss': 0.4224, 'grad_norm': 6.519914984541137, 'learning_rate': 1.0750958989635879e-06, 'epoch': 0.7} 70%|███████ | 8647/12313 
[6:28:49<2:38:59, 2.60s/it] 70%|███████ | 8648/12313 [6:28:52<2:40:57, 2.64s/it] {'loss': 0.571, 'grad_norm': 3.9501597340899903, 'learning_rate': 1.074555598999145e-06, 'epoch': 0.7} 70%|███████ | 8648/12313 [6:28:52<2:40:57, 2.64s/it] 70%|███████ | 8649/12313 [6:28:55<2:41:04, 2.64s/it] {'loss': 0.4328, 'grad_norm': 6.770723225323952, 'learning_rate': 1.0740153976679114e-06, 'epoch': 0.7} 70%|███████ | 8649/12313 [6:28:55<2:41:04, 2.64s/it] 70%|███████ | 8650/12313 [6:28:57<2:42:48, 2.67s/it] {'loss': 0.424, 'grad_norm': 13.76466570186757, 'learning_rate': 1.073475295007265e-06, 'epoch': 0.7} 70%|███████ | 8650/12313 [6:28:57<2:42:48, 2.67s/it] 70%|███████ | 8651/12313 [6:29:00<2:41:04, 2.64s/it] {'loss': 0.5359, 'grad_norm': 5.062420028751364, 'learning_rate': 1.0729352910545779e-06, 'epoch': 0.7} 70%|███████ | 8651/12313 [6:29:00<2:41:04, 2.64s/it] 70%|███████ | 8652/12313 [6:29:03<2:42:44, 2.67s/it] {'loss': 0.4192, 'grad_norm': 8.4588766125872, 'learning_rate': 1.0723953858472167e-06, 'epoch': 0.7} 70%|███████ | 8652/12313 [6:29:03<2:42:44, 2.67s/it] 70%|███████ | 8653/12313 [6:29:06<2:45:55, 2.72s/it] {'loss': 0.4356, 'grad_norm': 4.594691832051315, 'learning_rate': 1.0718555794225385e-06, 'epoch': 0.7} 70%|███████ | 8653/12313 [6:29:06<2:45:55, 2.72s/it] 70%|███████ | 8654/12313 [6:29:08<2:43:55, 2.69s/it] {'loss': 0.4946, 'grad_norm': 5.910635422121678, 'learning_rate': 1.071315871817896e-06, 'epoch': 0.7} 70%|███████ | 8654/12313 [6:29:08<2:43:55, 2.69s/it] 70%|███████ | 8655/12313 [6:29:11<2:43:21, 2.68s/it] {'loss': 0.6541, 'grad_norm': 7.200371807778343, 'learning_rate': 1.0707762630706345e-06, 'epoch': 0.7} 70%|███████ | 8655/12313 [6:29:11<2:43:21, 2.68s/it] 70%|███████ | 8656/12313 [6:29:14<2:43:10, 2.68s/it] {'loss': 0.5585, 'grad_norm': 4.327245345198459, 'learning_rate': 1.0702367532180919e-06, 'epoch': 0.7} 70%|███████ | 8656/12313 [6:29:14<2:43:10, 2.68s/it] 70%|███████ | 8657/12313 [6:29:16<2:41:17, 2.65s/it] {'loss': 0.4336, 'grad_norm': 
5.70898727396328, 'learning_rate': 1.0696973422975978e-06, 'epoch': 0.7} 70%|███████ | 8657/12313 [6:29:16<2:41:17, 2.65s/it] 70%|███████ | 8658/12313 [6:29:19<2:41:12, 2.65s/it] {'loss': 0.7113, 'grad_norm': 3.3122955544251, 'learning_rate': 1.0691580303464791e-06, 'epoch': 0.7} 70%|███████ | 8658/12313 [6:29:19<2:41:12, 2.65s/it] 70%|███████ | 8659/12313 [6:29:21<2:41:52, 2.66s/it] {'loss': 0.5942, 'grad_norm': 6.853811044628905, 'learning_rate': 1.068618817402052e-06, 'epoch': 0.7} 70%|███████ | 8659/12313 [6:29:21<2:41:52, 2.66s/it] 70%|███████ | 8660/12313 [6:29:24<2:43:28, 2.69s/it] {'loss': 0.657, 'grad_norm': 6.076640488626101, 'learning_rate': 1.0680797035016264e-06, 'epoch': 0.7} 70%|███████ | 8660/12313 [6:29:24<2:43:28, 2.69s/it] 70%|███████ | 8661/12313 [6:29:27<2:40:03, 2.63s/it] {'loss': 0.5554, 'grad_norm': 6.2346543123925136, 'learning_rate': 1.0675406886825065e-06, 'epoch': 0.7} 70%|███████ | 8661/12313 [6:29:27<2:40:03, 2.63s/it] 70%|███████ | 8662/12313 [6:29:29<2:39:20, 2.62s/it] {'loss': 0.6035, 'grad_norm': 4.087802951829346, 'learning_rate': 1.0670017729819911e-06, 'epoch': 0.7} 70%|███████ | 8662/12313 [6:29:29<2:39:20, 2.62s/it] 70%|███████ | 8663/12313 [6:29:32<2:38:08, 2.60s/it] {'loss': 0.4762, 'grad_norm': 4.8340326949666, 'learning_rate': 1.066462956437369e-06, 'epoch': 0.7} 70%|███████ | 8663/12313 [6:29:32<2:38:08, 2.60s/it] 70%|███████ | 8664/12313 [6:29:34<2:38:30, 2.61s/it] {'loss': 0.474, 'grad_norm': 5.486314431686029, 'learning_rate': 1.0659242390859224e-06, 'epoch': 0.7} 70%|███████ | 8664/12313 [6:29:34<2:38:30, 2.61s/it] 70%|███████ | 8665/12313 [6:29:37<2:42:59, 2.68s/it] {'loss': 0.4181, 'grad_norm': 5.968816148722602, 'learning_rate': 1.0653856209649297e-06, 'epoch': 0.7} 70%|███████ | 8665/12313 [6:29:37<2:42:59, 2.68s/it] 70%|███████ | 8666/12313 [6:29:40<2:44:45, 2.71s/it] {'loss': 0.5303, 'grad_norm': 4.467813005237297, 'learning_rate': 1.0648471021116584e-06, 'epoch': 0.7} 70%|███████ | 8666/12313 [6:29:40<2:44:45, 
2.71s/it] 70%|███████ | 8667/12313 [6:29:43<2:43:08, 2.68s/it] {'loss': 0.5941, 'grad_norm': 3.4998502366425757, 'learning_rate': 1.0643086825633723e-06, 'epoch': 0.7} 70%|███████ | 8667/12313 [6:29:43<2:43:08, 2.68s/it] 70%|███████ | 8668/12313 [6:29:45<2:39:57, 2.63s/it] {'loss': 0.6115, 'grad_norm': 3.138792298225771, 'learning_rate': 1.0637703623573278e-06, 'epoch': 0.7} 70%|███████ | 8668/12313 [6:29:45<2:39:57, 2.63s/it] 70%|███████ | 8669/12313 [6:29:48<2:46:43, 2.75s/it] {'loss': 0.4483, 'grad_norm': 4.420119383519892, 'learning_rate': 1.0632321415307734e-06, 'epoch': 0.7} 70%|███████ | 8669/12313 [6:29:48<2:46:43, 2.75s/it] 70%|███████ | 8670/12313 [6:29:51<2:46:07, 2.74s/it] {'loss': 0.3817, 'grad_norm': 5.331801055403152, 'learning_rate': 1.0626940201209497e-06, 'epoch': 0.7} 70%|███████ | 8670/12313 [6:29:51<2:46:07, 2.74s/it] 70%|███████ | 8671/12313 [6:29:54<2:43:00, 2.69s/it] {'loss': 0.5817, 'grad_norm': 5.750793644056218, 'learning_rate': 1.062155998165094e-06, 'epoch': 0.7} 70%|███████ | 8671/12313 [6:29:54<2:43:00, 2.69s/it] 70%|███████ | 8672/12313 [6:29:56<2:45:02, 2.72s/it] {'loss': 0.6701, 'grad_norm': 3.4756270753695104, 'learning_rate': 1.0616180757004333e-06, 'epoch': 0.7} 70%|███████ | 8672/12313 [6:29:56<2:45:02, 2.72s/it] 70%|███████ | 8673/12313 [6:29:59<2:45:41, 2.73s/it] {'loss': 0.4016, 'grad_norm': 6.270791519856763, 'learning_rate': 1.0610802527641883e-06, 'epoch': 0.7} 70%|███████ | 8673/12313 [6:29:59<2:45:41, 2.73s/it] 70%|███████ | 8674/12313 [6:30:02<2:43:51, 2.70s/it] {'loss': 0.431, 'grad_norm': 5.4889815502837775, 'learning_rate': 1.0605425293935748e-06, 'epoch': 0.7} 70%|███████ | 8674/12313 [6:30:02<2:43:51, 2.70s/it] 70%|███████ | 8675/12313 [6:30:05<2:48:47, 2.78s/it] {'loss': 0.623, 'grad_norm': 4.255319581514907, 'learning_rate': 1.0600049056258008e-06, 'epoch': 0.7} 70%|███████ | 8675/12313 [6:30:05<2:48:47, 2.78s/it] 70%|███████ | 8676/12313 [6:30:07<2:48:25, 2.78s/it] {'loss': 0.4224, 'grad_norm': 
4.619912538008321, 'learning_rate': 1.0594673814980652e-06, 'epoch': 0.7} 70%|███████ | 8676/12313 [6:30:07<2:48:25, 2.78s/it] 70%|███████ | 8677/12313 [6:30:10<2:50:49, 2.82s/it] {'loss': 0.4773, 'grad_norm': 4.4624099848947445, 'learning_rate': 1.058929957047564e-06, 'epoch': 0.7} 70%|███████ | 8677/12313 [6:30:10<2:50:49, 2.82s/it] 70%|███████ | 8678/12313 [6:30:13<2:49:42, 2.80s/it] {'loss': 0.5004, 'grad_norm': 3.568435548062833, 'learning_rate': 1.0583926323114829e-06, 'epoch': 0.7} 70%|███████ | 8678/12313 [6:30:13<2:49:42, 2.80s/it] 70%|███████ | 8679/12313 [6:30:16<2:48:10, 2.78s/it] {'loss': 0.5205, 'grad_norm': 5.199277112036462, 'learning_rate': 1.057855407327001e-06, 'epoch': 0.7} 70%|███████ | 8679/12313 [6:30:16<2:48:10, 2.78s/it] 70%|███████ | 8680/12313 [6:30:19<2:46:20, 2.75s/it] {'loss': 0.4348, 'grad_norm': 4.806067495687931, 'learning_rate': 1.0573182821312927e-06, 'epoch': 0.7} 70%|███████ | 8680/12313 [6:30:19<2:46:20, 2.75s/it] 71%|███████ | 8681/12313 [6:30:21<2:49:38, 2.80s/it] {'loss': 0.5282, 'grad_norm': 4.581743151795791, 'learning_rate': 1.056781256761525e-06, 'epoch': 0.71} 71%|███████ | 8681/12313 [6:30:21<2:49:38, 2.80s/it] 71%|███████ | 8682/12313 [6:30:24<2:47:56, 2.78s/it] {'loss': 0.3811, 'grad_norm': 4.187102501676517, 'learning_rate': 1.0562443312548558e-06, 'epoch': 0.71} 71%|███████ | 8682/12313 [6:30:24<2:47:56, 2.78s/it] 71%|███████ | 8683/12313 [6:30:27<2:45:39, 2.74s/it] {'loss': 0.4645, 'grad_norm': 10.94900482763677, 'learning_rate': 1.0557075056484373e-06, 'epoch': 0.71} 71%|███████ | 8683/12313 [6:30:27<2:45:39, 2.74s/it] 71%|███████ | 8684/12313 [6:30:30<2:45:16, 2.73s/it] {'loss': 0.515, 'grad_norm': 3.286189799041891, 'learning_rate': 1.0551707799794164e-06, 'epoch': 0.71} 71%|███████ | 8684/12313 [6:30:30<2:45:16, 2.73s/it] 71%|███████ | 8685/12313 [6:30:32<2:44:28, 2.72s/it] {'loss': 0.3609, 'grad_norm': 5.1687388035544535, 'learning_rate': 1.054634154284931e-06, 'epoch': 0.71} 71%|███████ | 8685/12313 
[6:30:32<2:44:28, 2.72s/it] 71%|███████ | 8686/12313 [6:30:35<2:41:49, 2.68s/it] {'loss': 0.5426, 'grad_norm': 6.765257196850052, 'learning_rate': 1.0540976286021115e-06, 'epoch': 0.71} 71%|███████ | 8686/12313 [6:30:35<2:41:49, 2.68s/it] 71%|███████ | 8687/12313 [6:30:38<2:46:54, 2.76s/it] {'loss': 0.4086, 'grad_norm': 5.1893140007432885, 'learning_rate': 1.053561202968084e-06, 'epoch': 0.71} 71%|███████ | 8687/12313 [6:30:38<2:46:54, 2.76s/it] 71%|███████ | 8688/12313 [6:30:40<2:44:14, 2.72s/it] {'loss': 0.4706, 'grad_norm': 4.075572539458918, 'learning_rate': 1.053024877419967e-06, 'epoch': 0.71} 71%|███████ | 8688/12313 [6:30:40<2:44:14, 2.72s/it] 71%|███████ | 8689/12313 [6:30:43<2:44:56, 2.73s/it] {'loss': 0.5246, 'grad_norm': 5.198722303974706, 'learning_rate': 1.0524886519948693e-06, 'epoch': 0.71} 71%|███████ | 8689/12313 [6:30:43<2:44:56, 2.73s/it] 71%|███████ | 8690/12313 [6:30:46<2:40:59, 2.67s/it] {'loss': 0.3365, 'grad_norm': 5.742252264604989, 'learning_rate': 1.0519525267298972e-06, 'epoch': 0.71} 71%|███████ | 8690/12313 [6:30:46<2:40:59, 2.67s/it] 71%|███████ | 8691/12313 [6:30:48<2:38:03, 2.62s/it] {'loss': 0.6085, 'grad_norm': 4.08553333166632, 'learning_rate': 1.0514165016621464e-06, 'epoch': 0.71} 71%|███████ | 8691/12313 [6:30:48<2:38:03, 2.62s/it] 71%|███████ | 8692/12313 [6:30:51<2:42:39, 2.70s/it] {'loss': 0.5561, 'grad_norm': 5.7021732693041995, 'learning_rate': 1.0508805768287061e-06, 'epoch': 0.71} 71%|███████ | 8692/12313 [6:30:51<2:42:39, 2.70s/it] 71%|███████ | 8693/12313 [6:30:54<2:41:36, 2.68s/it] {'loss': 0.4041, 'grad_norm': 11.500144742392935, 'learning_rate': 1.050344752266661e-06, 'epoch': 0.71} 71%|███████ | 8693/12313 [6:30:54<2:41:36, 2.68s/it] 71%|███████ | 8694/12313 [6:30:56<2:42:31, 2.69s/it] {'loss': 0.5148, 'grad_norm': 5.728188208611933, 'learning_rate': 1.0498090280130873e-06, 'epoch': 0.71} 71%|███████ | 8694/12313 [6:30:56<2:42:31, 2.69s/it] 71%|███████ | 8695/12313 [6:31:00<2:58:08, 2.95s/it] {'loss': 0.5706, 
'grad_norm': 3.8194604173660442, 'learning_rate': 1.0492734041050532e-06, 'epoch': 0.71} 71%|███████ | 8695/12313 [6:31:00<2:58:08, 2.95s/it] 71%|███████ | 8696/12313 [6:31:03<3:02:44, 3.03s/it] {'loss': 0.4247, 'grad_norm': 4.238106988426028, 'learning_rate': 1.0487378805796225e-06, 'epoch': 0.71} 71%|███████ | 8696/12313 [6:31:03<3:02:44, 3.03s/it] 71%|███████ | 8697/12313 [6:31:06<2:58:01, 2.95s/it] {'loss': 0.4802, 'grad_norm': 4.981104011286187, 'learning_rate': 1.0482024574738498e-06, 'epoch': 0.71} 71%|███████ | 8697/12313 [6:31:06<2:58:01, 2.95s/it] 71%|███████ | 8698/12313 [6:31:09<2:53:16, 2.88s/it] {'loss': 0.3993, 'grad_norm': 5.353614712971508, 'learning_rate': 1.0476671348247834e-06, 'epoch': 0.71} 71%|███████ | 8698/12313 [6:31:09<2:53:16, 2.88s/it] 71%|███████ | 8699/12313 [6:31:11<2:47:45, 2.79s/it] {'loss': 0.391, 'grad_norm': 5.533743790772518, 'learning_rate': 1.047131912669464e-06, 'epoch': 0.71} 71%|███████ | 8699/12313 [6:31:11<2:47:45, 2.79s/it] 71%|███████ | 8700/12313 [6:31:14<2:46:29, 2.76s/it] {'loss': 0.4343, 'grad_norm': 5.99441767333963, 'learning_rate': 1.0465967910449274e-06, 'epoch': 0.71} 71%|███████ | 8700/12313 [6:31:14<2:46:29, 2.76s/it] 71%|███████ | 8701/12313 [6:31:17<2:43:18, 2.71s/it] {'loss': 0.3472, 'grad_norm': 10.472893307184739, 'learning_rate': 1.046061769988201e-06, 'epoch': 0.71} 71%|███████ | 8701/12313 [6:31:17<2:43:18, 2.71s/it] 71%|███████ | 8702/12313 [6:31:19<2:42:08, 2.69s/it] {'loss': 0.3806, 'grad_norm': 6.614917885036994, 'learning_rate': 1.045526849536305e-06, 'epoch': 0.71} 71%|███████ | 8702/12313 [6:31:19<2:42:08, 2.69s/it] 71%|███████ | 8703/12313 [6:31:22<2:37:56, 2.63s/it] {'loss': 0.5681, 'grad_norm': 6.161723653557723, 'learning_rate': 1.0449920297262542e-06, 'epoch': 0.71} 71%|███████ | 8703/12313 [6:31:22<2:37:56, 2.63s/it] 71%|███████ | 8704/12313 [6:31:24<2:36:44, 2.61s/it] {'loss': 0.4967, 'grad_norm': 4.341342853033297, 'learning_rate': 1.0444573105950543e-06, 'epoch': 0.71} 71%|███████ | 
8704/12313 [6:31:24<2:36:44, 2.61s/it] 71%|███████ | 8705/12313 [6:31:27<2:39:03, 2.65s/it] {'loss': 0.5641, 'grad_norm': 3.964088413604789, 'learning_rate': 1.0439226921797042e-06, 'epoch': 0.71} 71%|███████ | 8705/12313 [6:31:27<2:39:03, 2.65s/it] 71%|███████ | 8706/12313 [6:31:30<2:40:39, 2.67s/it] {'loss': 0.3931, 'grad_norm': 7.436857002923769, 'learning_rate': 1.0433881745171976e-06, 'epoch': 0.71} 71%|███████ | 8706/12313 [6:31:30<2:40:39, 2.67s/it] 71%|███████ | 8707/12313 [6:31:32<2:41:18, 2.68s/it] {'loss': 0.5358, 'grad_norm': 21.38007720377836, 'learning_rate': 1.042853757644521e-06, 'epoch': 0.71} 71%|███████ | 8707/12313 [6:31:32<2:41:18, 2.68s/it] 71%|███████ | 8708/12313 [6:31:35<2:41:16, 2.68s/it] {'loss': 0.394, 'grad_norm': 9.571044595510465, 'learning_rate': 1.0423194415986518e-06, 'epoch': 0.71} 71%|███████ | 8708/12313 [6:31:35<2:41:16, 2.68s/it] 71%|███████ | 8709/12313 [6:31:38<2:46:10, 2.77s/it] {'loss': 0.7711, 'grad_norm': 3.4522432755903045, 'learning_rate': 1.0417852264165637e-06, 'epoch': 0.71} 71%|███████ | 8709/12313 [6:31:38<2:46:10, 2.77s/it] 71%|███████ | 8710/12313 [6:31:41<2:44:46, 2.74s/it] {'loss': 0.5022, 'grad_norm': 4.437337860525526, 'learning_rate': 1.0412511121352201e-06, 'epoch': 0.71} 71%|███████ | 8710/12313 [6:31:41<2:44:46, 2.74s/it] 71%|███████ | 8711/12313 [6:31:43<2:40:19, 2.67s/it] {'loss': 0.3741, 'grad_norm': 4.832102307573993, 'learning_rate': 1.0407170987915786e-06, 'epoch': 0.71} 71%|███████ | 8711/12313 [6:31:43<2:40:19, 2.67s/it] 71%|███████ | 8712/12313 [6:31:46<2:38:44, 2.64s/it] {'loss': 0.4477, 'grad_norm': 4.424143419021889, 'learning_rate': 1.0401831864225915e-06, 'epoch': 0.71} 71%|███████ | 8712/12313 [6:31:46<2:38:44, 2.64s/it] 71%|███████ | 8713/12313 [6:31:48<2:37:31, 2.63s/it] {'loss': 0.6794, 'grad_norm': 4.167265583846372, 'learning_rate': 1.0396493750652008e-06, 'epoch': 0.71} 71%|███████ | 8713/12313 [6:31:48<2:37:31, 2.63s/it] 71%|███████ | 8714/12313 [6:31:51<2:43:30, 2.73s/it] {'loss': 
0.4642, 'grad_norm': 5.174243065313732, 'learning_rate': 1.039115664756345e-06, 'epoch': 0.71} 71%|███████ | 8714/12313 [6:31:51<2:43:30, 2.73s/it] 71%|███████ | 8715/12313 [6:31:54<2:39:54, 2.67s/it] {'loss': 0.5378, 'grad_norm': 4.436027938464365, 'learning_rate': 1.0385820555329543e-06, 'epoch': 0.71} 71%|███████ | 8715/12313 [6:31:54<2:39:54, 2.67s/it] 71%|███████ | 8716/12313 [6:31:57<2:40:05, 2.67s/it] {'loss': 0.4807, 'grad_norm': 6.048625139053751, 'learning_rate': 1.0380485474319507e-06, 'epoch': 0.71} 71%|███████ | 8716/12313 [6:31:57<2:40:05, 2.67s/it] 71%|███████ | 8717/12313 [6:31:59<2:37:51, 2.63s/it] {'loss': 0.4596, 'grad_norm': 4.409054430942545, 'learning_rate': 1.0375151404902507e-06, 'epoch': 0.71} 71%|███████ | 8717/12313 [6:31:59<2:37:51, 2.63s/it] 71%|███████ | 8718/12313 [6:32:02<2:37:26, 2.63s/it] {'loss': 0.396, 'grad_norm': 6.231874069745442, 'learning_rate': 1.0369818347447617e-06, 'epoch': 0.71} 71%|███████ | 8718/12313 [6:32:02<2:37:26, 2.63s/it] 71%|███████ | 8719/12313 [6:32:04<2:38:27, 2.65s/it] {'loss': 0.6371, 'grad_norm': 6.571192019178439, 'learning_rate': 1.0364486302323868e-06, 'epoch': 0.71} 71%|███████ | 8719/12313 [6:32:04<2:38:27, 2.65s/it] 71%|███████ | 8720/12313 [6:32:07<2:40:49, 2.69s/it] {'loss': 0.4857, 'grad_norm': 6.2588184473383555, 'learning_rate': 1.035915526990022e-06, 'epoch': 0.71} 71%|███████ | 8720/12313 [6:32:07<2:40:49, 2.69s/it] 71%|███████ | 8721/12313 [6:32:10<2:43:51, 2.74s/it] {'loss': 0.4409, 'grad_norm': 4.086683961249212, 'learning_rate': 1.0353825250545533e-06, 'epoch': 0.71} 71%|███████ | 8721/12313 [6:32:10<2:43:51, 2.74s/it] 71%|███████ | 8722/12313 [6:32:13<2:39:18, 2.66s/it] {'loss': 0.4143, 'grad_norm': 4.896811289532637, 'learning_rate': 1.0348496244628633e-06, 'epoch': 0.71} 71%|███████ | 8722/12313 [6:32:13<2:39:18, 2.66s/it] 71%|███████ | 8723/12313 [6:32:15<2:37:27, 2.63s/it] {'loss': 0.4197, 'grad_norm': 6.411254993382916, 'learning_rate': 1.0343168252518252e-06, 'epoch': 0.71} 
71%|███████ | 8723/12313 [6:32:15<2:37:27, 2.63s/it] 71%|███████ | 8724/12313 [6:32:18<2:38:45, 2.65s/it] {'loss': 0.6165, 'grad_norm': 6.5354079259733915, 'learning_rate': 1.0337841274583046e-06, 'epoch': 0.71} 71%|███████ | 8724/12313 [6:32:18<2:38:45, 2.65s/it] 71%|███████ | 8725/12313 [6:32:21<2:41:01, 2.69s/it] {'loss': 0.5354, 'grad_norm': 5.167176814080369, 'learning_rate': 1.0332515311191627e-06, 'epoch': 0.71} 71%|███████ | 8725/12313 [6:32:21<2:41:01, 2.69s/it] 71%|███████ | 8726/12313 [6:32:23<2:41:31, 2.70s/it] {'loss': 0.5054, 'grad_norm': 7.4442577463893835, 'learning_rate': 1.032719036271253e-06, 'epoch': 0.71} 71%|███████ | 8726/12313 [6:32:23<2:41:31, 2.70s/it] 71%|███████ | 8727/12313 [6:32:26<2:41:59, 2.71s/it] {'loss': 0.5241, 'grad_norm': 4.039848430781059, 'learning_rate': 1.0321866429514199e-06, 'epoch': 0.71} 71%|███████ | 8727/12313 [6:32:26<2:41:59, 2.71s/it] 71%|███████ | 8728/12313 [6:32:29<2:41:31, 2.70s/it] {'loss': 0.5754, 'grad_norm': 3.2336915820436736, 'learning_rate': 1.0316543511965035e-06, 'epoch': 0.71} 71%|███████ | 8728/12313 [6:32:29<2:41:31, 2.70s/it] 71%|███████ | 8729/12313 [6:32:31<2:41:39, 2.71s/it] {'loss': 0.377, 'grad_norm': 4.7161410228484195, 'learning_rate': 1.031122161043335e-06, 'epoch': 0.71} 71%|███████ | 8729/12313 [6:32:31<2:41:39, 2.71s/it] 71%|███████ | 8730/12313 [6:32:34<2:42:05, 2.71s/it] {'loss': 0.4437, 'grad_norm': 3.6485307474977624, 'learning_rate': 1.030590072528738e-06, 'epoch': 0.71} 71%|███████ | 8730/12313 [6:32:34<2:42:05, 2.71s/it] 71%|███████ | 8731/12313 [6:32:37<2:39:44, 2.68s/it] {'loss': 0.4561, 'grad_norm': 4.868323212713634, 'learning_rate': 1.030058085689532e-06, 'epoch': 0.71} 71%|███████ | 8731/12313 [6:32:37<2:39:44, 2.68s/it] 71%|███████ | 8732/12313 [6:32:39<2:38:16, 2.65s/it] {'loss': 0.413, 'grad_norm': 6.225865954562011, 'learning_rate': 1.0295262005625262e-06, 'epoch': 0.71} 71%|███████ | 8732/12313 [6:32:39<2:38:16, 2.65s/it] 71%|███████ | 8733/12313 [6:32:42<2:40:56, 
2.70s/it] {'loss': 0.5137, 'grad_norm': 4.393355867203619, 'learning_rate': 1.028994417184525e-06, 'epoch': 0.71} 71%|███████ | 8733/12313 [6:32:42<2:40:56, 2.70s/it] 71%|███████ | 8734/12313 [6:32:45<2:42:16, 2.72s/it] {'loss': 0.5499, 'grad_norm': 4.03858492787502, 'learning_rate': 1.0284627355923257e-06, 'epoch': 0.71} 71%|███████ | 8734/12313 [6:32:45<2:42:16, 2.72s/it] 71%|███████ | 8735/12313 [6:32:48<2:43:56, 2.75s/it] {'loss': 0.4145, 'grad_norm': 7.2081095740889225, 'learning_rate': 1.0279311558227174e-06, 'epoch': 0.71} 71%|███████ | 8735/12313 [6:32:48<2:43:56, 2.75s/it] 71%|███████ | 8736/12313 [6:32:50<2:39:36, 2.68s/it] {'loss': 0.4802, 'grad_norm': 4.52684696449661, 'learning_rate': 1.027399677912482e-06, 'epoch': 0.71} 71%|███████ | 8736/12313 [6:32:50<2:39:36, 2.68s/it] 71%|███████ | 8737/12313 [6:32:53<2:39:59, 2.68s/it] {'loss': 0.4124, 'grad_norm': 4.627236682606212, 'learning_rate': 1.0268683018983944e-06, 'epoch': 0.71} 71%|███████ | 8737/12313 [6:32:53<2:39:59, 2.68s/it] 71%|███████ | 8738/12313 [6:32:56<2:39:10, 2.67s/it] {'loss': 0.4463, 'grad_norm': 4.60387331373005, 'learning_rate': 1.026337027817224e-06, 'epoch': 0.71} 71%|███████ | 8738/12313 [6:32:56<2:39:10, 2.67s/it] 71%|███████ | 8739/12313 [6:32:58<2:37:54, 2.65s/it] {'loss': 0.4391, 'grad_norm': 4.049679102453057, 'learning_rate': 1.0258058557057328e-06, 'epoch': 0.71} 71%|███████ | 8739/12313 [6:32:58<2:37:54, 2.65s/it] 71%|███████ | 8740/12313 [6:33:01<2:43:48, 2.75s/it] {'loss': 0.4598, 'grad_norm': 3.932889064119171, 'learning_rate': 1.0252747856006735e-06, 'epoch': 0.71} 71%|███████ | 8740/12313 [6:33:01<2:43:48, 2.75s/it] 71%|███████ | 8741/12313 [6:33:04<2:41:14, 2.71s/it] {'loss': 0.3975, 'grad_norm': 5.445177988596398, 'learning_rate': 1.0247438175387946e-06, 'epoch': 0.71} 71%|███████ | 8741/12313 [6:33:04<2:41:14, 2.71s/it] 71%|███████ | 8742/12313 [6:33:06<2:38:37, 2.67s/it] {'loss': 0.6375, 'grad_norm': 4.336033272668492, 'learning_rate': 1.0242129515568364e-06, 
'epoch': 0.71} 71%|███████ | 8742/12313 [6:33:06<2:38:37, 2.67s/it] 71%|███████ | 8743/12313 [6:33:09<2:35:40, 2.62s/it] {'loss': 0.5762, 'grad_norm': 5.942395150551147, 'learning_rate': 1.0236821876915303e-06, 'epoch': 0.71} 71%|███████ | 8743/12313 [6:33:09<2:35:40, 2.62s/it] 71%|███████ | 8744/12313 [6:33:12<2:43:02, 2.74s/it] {'loss': 0.4297, 'grad_norm': 7.955186448658773, 'learning_rate': 1.0231515259796046e-06, 'epoch': 0.71} 71%|███████ | 8744/12313 [6:33:12<2:43:02, 2.74s/it] 71%|███████ | 8745/12313 [6:33:14<2:39:38, 2.68s/it] {'loss': 0.5287, 'grad_norm': 4.589645135923486, 'learning_rate': 1.022620966457776e-06, 'epoch': 0.71} 71%|███████ | 8745/12313 [6:33:14<2:39:38, 2.68s/it] 71%|███████ | 8746/12313 [6:33:17<2:38:17, 2.66s/it] {'loss': 0.3939, 'grad_norm': 10.369934951756198, 'learning_rate': 1.0220905091627581e-06, 'epoch': 0.71} 71%|███████ | 8746/12313 [6:33:17<2:38:17, 2.66s/it] 71%|███████ | 8747/12313 [6:33:20<2:43:25, 2.75s/it] {'loss': 0.4922, 'grad_norm': 4.830026444654436, 'learning_rate': 1.0215601541312556e-06, 'epoch': 0.71} 71%|███████ | 8747/12313 [6:33:20<2:43:25, 2.75s/it] 71%|███████ | 8748/12313 [6:33:23<2:45:05, 2.78s/it] {'loss': 0.4854, 'grad_norm': 4.89390413686288, 'learning_rate': 1.0210299013999662e-06, 'epoch': 0.71} 71%|███████ | 8748/12313 [6:33:23<2:45:05, 2.78s/it] 71%|███████ | 8749/12313 [6:33:26<2:47:26, 2.82s/it] {'loss': 0.4383, 'grad_norm': 5.6081171633262015, 'learning_rate': 1.0204997510055793e-06, 'epoch': 0.71} 71%|███████ | 8749/12313 [6:33:26<2:47:26, 2.82s/it] 71%|███████ | 8750/12313 [6:33:28<2:40:59, 2.71s/it] {'loss': 0.5754, 'grad_norm': 7.598002540248723, 'learning_rate': 1.0199697029847804e-06, 'epoch': 0.71} 71%|███████ | 8750/12313 [6:33:28<2:40:59, 2.71s/it] 71%|███████ | 8751/12313 [6:33:31<2:37:22, 2.65s/it] {'loss': 0.3061, 'grad_norm': 6.496257353872819, 'learning_rate': 1.0194397573742442e-06, 'epoch': 0.71} 71%|███████ | 8751/12313 [6:33:31<2:37:22, 2.65s/it] 71%|███████ | 8752/12313 
[6:33:33<2:36:36, 2.64s/it] {'loss': 0.4262, 'grad_norm': 7.92240606328125, 'learning_rate': 1.0189099142106421e-06, 'epoch': 0.71} 71%|███████ | 8752/12313 [6:33:33<2:36:36, 2.64s/it] 71%|███████ | 8753/12313 [6:33:36<2:36:08, 2.63s/it] {'loss': 0.4334, 'grad_norm': 27.70458593561067, 'learning_rate': 1.0183801735306342e-06, 'epoch': 0.71} 71%|███████ | 8753/12313 [6:33:36<2:36:08, 2.63s/it] 71%|███████ | 8754/12313 [6:33:38<2:32:56, 2.58s/it] {'loss': 0.5682, 'grad_norm': 4.449530450041586, 'learning_rate': 1.0178505353708779e-06, 'epoch': 0.71} 71%|███████ | 8754/12313 [6:33:38<2:32:56, 2.58s/it] 71%|███████ | 8755/12313 [6:33:41<2:33:15, 2.58s/it] {'loss': 0.5643, 'grad_norm': 5.978122767767535, 'learning_rate': 1.0173209997680203e-06, 'epoch': 0.71} 71%|███████ | 8755/12313 [6:33:41<2:33:15, 2.58s/it] 71%|███████ | 8756/12313 [6:33:44<2:39:24, 2.69s/it] {'loss': 0.47, 'grad_norm': 4.850468611110746, 'learning_rate': 1.0167915667587019e-06, 'epoch': 0.71} 71%|███████ | 8756/12313 [6:33:44<2:39:24, 2.69s/it] 71%|███████ | 8757/12313 [6:33:47<2:41:13, 2.72s/it] {'loss': 0.5099, 'grad_norm': 4.3524338310305835, 'learning_rate': 1.016262236379558e-06, 'epoch': 0.71} 71%|███████ | 8757/12313 [6:33:47<2:41:13, 2.72s/it] 71%|███████ | 8758/12313 [6:33:49<2:36:47, 2.65s/it] {'loss': 0.6591, 'grad_norm': 6.685288002285286, 'learning_rate': 1.015733008667214e-06, 'epoch': 0.71} 71%|███████ | 8758/12313 [6:33:49<2:36:47, 2.65s/it] 71%|███████ | 8759/12313 [6:33:52<2:37:46, 2.66s/it] {'loss': 0.5526, 'grad_norm': 7.509285220548554, 'learning_rate': 1.0152038836582903e-06, 'epoch': 0.71} 71%|███████ | 8759/12313 [6:33:52<2:37:46, 2.66s/it] 71%|███████ | 8760/12313 [6:33:55<2:36:33, 2.64s/it] {'loss': 0.4786, 'grad_norm': 3.6292963710683366, 'learning_rate': 1.0146748613894005e-06, 'epoch': 0.71} 71%|███████ | 8760/12313 [6:33:55<2:36:33, 2.64s/it] 71%|███████ | 8761/12313 [6:33:57<2:38:32, 2.68s/it] {'loss': 0.409, 'grad_norm': 4.730030440457719, 'learning_rate': 
1.0141459418971496e-06, 'epoch': 0.71} 71%|███████ | 8761/12313 [6:33:57<2:38:32, 2.68s/it] 71%|███████ | 8762/12313 [6:34:00<2:38:42, 2.68s/it] {'loss': 0.5859, 'grad_norm': 3.5510572572185977, 'learning_rate': 1.0136171252181348e-06, 'epoch': 0.71} 71%|███████ | 8762/12313 [6:34:00<2:38:42, 2.68s/it] 71%|███████ | 8763/12313 [6:34:03<2:37:52, 2.67s/it] {'loss': 0.4996, 'grad_norm': 7.350261219519113, 'learning_rate': 1.0130884113889491e-06, 'epoch': 0.71} 71%|███████ | 8763/12313 [6:34:03<2:37:52, 2.67s/it] 71%|███████ | 8764/12313 [6:34:05<2:35:14, 2.62s/it] {'loss': 0.395, 'grad_norm': 4.1865654744906635, 'learning_rate': 1.0125598004461752e-06, 'epoch': 0.71} 71%|███████ | 8764/12313 [6:34:05<2:35:14, 2.62s/it] 71%|███████ | 8765/12313 [6:34:08<2:34:15, 2.61s/it] {'loss': 0.4005, 'grad_norm': 8.196706389670227, 'learning_rate': 1.012031292426391e-06, 'epoch': 0.71} 71%|███████ | 8765/12313 [6:34:08<2:34:15, 2.61s/it] 71%|███████ | 8766/12313 [6:34:10<2:36:28, 2.65s/it] {'loss': 0.4292, 'grad_norm': 5.9136835481706855, 'learning_rate': 1.011502887366167e-06, 'epoch': 0.71} 71%|███████ | 8766/12313 [6:34:10<2:36:28, 2.65s/it] 71%|███████ | 8767/12313 [6:34:14<2:44:15, 2.78s/it] {'loss': 0.4259, 'grad_norm': 5.09214562271127, 'learning_rate': 1.0109745853020655e-06, 'epoch': 0.71} 71%|███████ | 8767/12313 [6:34:14<2:44:15, 2.78s/it] 71%|███████ | 8768/12313 [6:34:16<2:40:24, 2.71s/it] {'loss': 0.3907, 'grad_norm': 8.352854444325088, 'learning_rate': 1.0104463862706414e-06, 'epoch': 0.71} 71%|███████ | 8768/12313 [6:34:16<2:40:24, 2.71s/it] 71%|███████ | 8769/12313 [6:34:19<2:39:56, 2.71s/it] {'loss': 0.5434, 'grad_norm': 4.539279498681904, 'learning_rate': 1.0099182903084448e-06, 'epoch': 0.71} 71%|███████ | 8769/12313 [6:34:19<2:39:56, 2.71s/it] 71%|███████ | 8770/12313 [6:34:22<2:40:29, 2.72s/it] {'loss': 0.4994, 'grad_norm': 6.108837746410746, 'learning_rate': 1.0093902974520165e-06, 'epoch': 0.71} 71%|███████ | 8770/12313 [6:34:22<2:40:29, 2.72s/it] 
71%|███████ | 8771/12313 [6:34:24<2:39:55, 2.71s/it] {'loss': 0.4736, 'grad_norm': 6.827549453211144, 'learning_rate': 1.0088624077378897e-06, 'epoch': 0.71} 71%|███████ | 8771/12313 [6:34:24<2:39:55, 2.71s/it] 71%|███████ | 8772/12313 [6:34:27<2:36:58, 2.66s/it] {'loss': 0.6442, 'grad_norm': 5.0626039477295315, 'learning_rate': 1.0083346212025923e-06, 'epoch': 0.71} 71%|███████ | 8772/12313 [6:34:27<2:36:58, 2.66s/it] 71%|███████ | 8773/12313 [6:34:29<2:32:10, 2.58s/it] {'loss': 0.579, 'grad_norm': 4.104258023545044, 'learning_rate': 1.0078069378826458e-06, 'epoch': 0.71} 71%|███████ | 8773/12313 [6:34:29<2:32:10, 2.58s/it] 71%|███████▏ | 8774/12313 [6:34:32<2:29:45, 2.54s/it] {'loss': 0.5327, 'grad_norm': 3.7706839195991297, 'learning_rate': 1.0072793578145618e-06, 'epoch': 0.71} 71%|███████▏ | 8774/12313 [6:34:32<2:29:45, 2.54s/it] 71%|███████▏ | 8775/12313 [6:34:34<2:26:54, 2.49s/it] {'loss': 0.5282, 'grad_norm': 7.155111652149057, 'learning_rate': 1.0067518810348453e-06, 'epoch': 0.71} 71%|███████▏ | 8775/12313 [6:34:34<2:26:54, 2.49s/it] 71%|███████▏ | 8776/12313 [6:34:37<2:29:31, 2.54s/it] {'loss': 0.5045, 'grad_norm': 7.637395079270046, 'learning_rate': 1.0062245075799966e-06, 'epoch': 0.71} 71%|███████▏ | 8776/12313 [6:34:37<2:29:31, 2.54s/it] 71%|███████▏ | 8777/12313 [6:34:39<2:34:51, 2.63s/it] {'loss': 0.6264, 'grad_norm': 6.413393628245962, 'learning_rate': 1.0056972374865054e-06, 'epoch': 0.71} 71%|███████▏ | 8777/12313 [6:34:39<2:34:51, 2.63s/it] 71%|███████▏ | 8778/12313 [6:34:42<2:38:57, 2.70s/it] {'loss': 0.7116, 'grad_norm': 5.7064526408846135, 'learning_rate': 1.0051700707908569e-06, 'epoch': 0.71} 71%|███████▏ | 8778/12313 [6:34:42<2:38:57, 2.70s/it] 71%|███████▏ | 8779/12313 [6:34:45<2:39:10, 2.70s/it] {'loss': 0.6247, 'grad_norm': 5.902226546762958, 'learning_rate': 1.0046430075295287e-06, 'epoch': 0.71} 71%|███████▏ | 8779/12313 [6:34:45<2:39:10, 2.70s/it] 71%|███████▏ | 8780/12313 [6:34:48<2:35:48, 2.65s/it] {'loss': 0.4892, 'grad_norm': 
5.831122076718099, 'learning_rate': 1.0041160477389909e-06, 'epoch': 0.71} 71%|███████▏ | 8780/12313 [6:34:48<2:35:48, 2.65s/it] 71%|███████▏ | 8781/12313 [6:34:50<2:36:31, 2.66s/it] {'loss': 0.5625, 'grad_norm': 5.215468338547022, 'learning_rate': 1.0035891914557044e-06, 'epoch': 0.71} 71%|███████▏ | 8781/12313 [6:34:50<2:36:31, 2.66s/it] 71%|███████▏ | 8782/12313 [6:34:53<2:38:15, 2.69s/it] {'loss': 0.4898, 'grad_norm': 7.447326637422664, 'learning_rate': 1.0030624387161273e-06, 'epoch': 0.71} 71%|███████▏ | 8782/12313 [6:34:53<2:38:15, 2.69s/it] 71%|███████▏ | 8783/12313 [6:34:56<2:37:40, 2.68s/it] {'loss': 0.5263, 'grad_norm': 4.541675365615177, 'learning_rate': 1.002535789556707e-06, 'epoch': 0.71} 71%|███████▏ | 8783/12313 [6:34:56<2:37:40, 2.68s/it] 71%|███████▏ | 8784/12313 [6:34:58<2:36:04, 2.65s/it] {'loss': 0.57, 'grad_norm': 5.008687031433551, 'learning_rate': 1.0020092440138833e-06, 'epoch': 0.71} 71%|███████▏ | 8784/12313 [6:34:58<2:36:04, 2.65s/it] 71%|███████▏ | 8785/12313 [6:35:01<2:36:31, 2.66s/it] {'loss': 0.3873, 'grad_norm': 4.218282129108528, 'learning_rate': 1.0014828021240932e-06, 'epoch': 0.71} 71%|███████▏ | 8785/12313 [6:35:01<2:36:31, 2.66s/it] 71%|███████▏ | 8786/12313 [6:35:03<2:34:47, 2.63s/it] {'loss': 0.3644, 'grad_norm': 7.9910976481847875, 'learning_rate': 1.0009564639237627e-06, 'epoch': 0.71} 71%|███████▏ | 8786/12313 [6:35:03<2:34:47, 2.63s/it] 71%|███████▏ | 8787/12313 [6:35:06<2:36:02, 2.66s/it] {'loss': 0.4983, 'grad_norm': 4.348847062904255, 'learning_rate': 1.0004302294493104e-06, 'epoch': 0.71} 71%|███████▏ | 8787/12313 [6:35:06<2:36:02, 2.66s/it] 71%|███████▏ | 8788/12313 [6:35:09<2:34:25, 2.63s/it] {'loss': 0.4464, 'grad_norm': 6.4121947565646815, 'learning_rate': 9.999040987371505e-07, 'epoch': 0.71} 71%|███████▏ | 8788/12313 [6:35:09<2:34:25, 2.63s/it] 71%|███████▏ | 8789/12313 [6:35:12<2:37:30, 2.68s/it] {'loss': 0.3603, 'grad_norm': 4.829469885568656, 'learning_rate': 9.993780718236882e-07, 'epoch': 0.71} 
71%|███████▏ | 8789/12313 [6:35:12<2:37:30, 2.68s/it] 71%|███████▏ | 8790/12313 [6:35:14<2:35:51, 2.65s/it] {'loss': 0.5219, 'grad_norm': 3.668444891519201, 'learning_rate': 9.988521487453203e-07, 'epoch': 0.71} 71%|███████▏ | 8790/12313 [6:35:14<2:35:51, 2.65s/it] 71%|███████▏ | 8791/12313 [6:35:17<2:36:54, 2.67s/it] {'loss': 0.48, 'grad_norm': 4.4085220444412245, 'learning_rate': 9.98326329538439e-07, 'epoch': 0.71} 71%|███████▏ | 8791/12313 [6:35:17<2:36:54, 2.67s/it] 71%|███████▏ | 8792/12313 [6:35:20<2:38:08, 2.69s/it] {'loss': 0.6356, 'grad_norm': 6.942288338011259, 'learning_rate': 9.978006142394292e-07, 'epoch': 0.71} 71%|███████▏ | 8792/12313 [6:35:20<2:38:08, 2.69s/it] 71%|███████▏ | 8793/12313 [6:35:22<2:41:08, 2.75s/it] {'loss': 0.4649, 'grad_norm': 3.9011079567183353, 'learning_rate': 9.972750028846665e-07, 'epoch': 0.71} 71%|███████▏ | 8793/12313 [6:35:22<2:41:08, 2.75s/it] 71%|███████▏ | 8794/12313 [6:35:25<2:38:15, 2.70s/it] {'loss': 0.5606, 'grad_norm': 5.691178624558273, 'learning_rate': 9.967494955105197e-07, 'epoch': 0.71} 71%|███████▏ | 8794/12313 [6:35:25<2:38:15, 2.70s/it] 71%|███████▏ | 8795/12313 [6:35:28<2:34:16, 2.63s/it] {'loss': 0.4898, 'grad_norm': 4.779174640506254, 'learning_rate': 9.962240921533528e-07, 'epoch': 0.71} 71%|███████▏ | 8795/12313 [6:35:28<2:34:16, 2.63s/it] 71%|███████▏ | 8796/12313 [6:35:30<2:35:45, 2.66s/it] {'loss': 0.4236, 'grad_norm': 5.8645704612877445, 'learning_rate': 9.956987928495193e-07, 'epoch': 0.71} 71%|███████▏ | 8796/12313 [6:35:30<2:35:45, 2.66s/it] 71%|███████▏ | 8797/12313 [6:35:33<2:36:09, 2.66s/it] {'loss': 0.4852, 'grad_norm': 4.402062224686781, 'learning_rate': 9.951735976353677e-07, 'epoch': 0.71} 71%|███████▏ | 8797/12313 [6:35:33<2:36:09, 2.66s/it] 71%|███████▏ | 8798/12313 [6:35:35<2:33:35, 2.62s/it] {'loss': 0.357, 'grad_norm': 5.0026387695604875, 'learning_rate': 9.946485065472402e-07, 'epoch': 0.71} 71%|███████▏ | 8798/12313 [6:35:35<2:33:35, 2.62s/it] 71%|███████▏ | 8799/12313 
[6:35:38<2:32:26, 2.60s/it] {'loss': 0.5172, 'grad_norm': 7.498003830377879, 'learning_rate': 9.941235196214687e-07, 'epoch': 0.71} 71%|███████▏ | 8799/12313 [6:35:38<2:32:26, 2.60s/it] 71%|███████▏ | 8800/12313 [6:35:41<2:36:27, 2.67s/it] {'loss': 0.4808, 'grad_norm': 3.3478530645853675, 'learning_rate': 9.935986368943796e-07, 'epoch': 0.71} 71%|███████▏ | 8800/12313 [6:35:41<2:36:27, 2.67s/it] 71%|███████▏ | 8801/12313 [6:35:44<2:36:00, 2.67s/it] {'loss': 0.5855, 'grad_norm': 6.779585993646702, 'learning_rate': 9.930738584022925e-07, 'epoch': 0.71} 71%|███████▏ | 8801/12313 [6:35:44<2:36:00, 2.67s/it] 71%|███████▏ | 8802/12313 [6:35:46<2:38:35, 2.71s/it] {'loss': 0.4775, 'grad_norm': 4.424567878048419, 'learning_rate': 9.925491841815197e-07, 'epoch': 0.71} 71%|███████▏ | 8802/12313 [6:35:46<2:38:35, 2.71s/it] 71%|███████▏ | 8803/12313 [6:35:49<2:36:35, 2.68s/it] {'loss': 0.611, 'grad_norm': 4.168929836677361, 'learning_rate': 9.92024614268364e-07, 'epoch': 0.71} 71%|███████▏ | 8803/12313 [6:35:49<2:36:35, 2.68s/it] 72%|███████▏ | 8804/12313 [6:35:51<2:34:50, 2.65s/it] {'loss': 0.6159, 'grad_norm': 4.398671369510578, 'learning_rate': 9.915001486991243e-07, 'epoch': 0.72} 72%|███████▏ | 8804/12313 [6:35:52<2:34:50, 2.65s/it] 72%|███████▏ | 8805/12313 [6:35:54<2:34:06, 2.64s/it] {'loss': 0.4175, 'grad_norm': 4.041777750287028, 'learning_rate': 9.909757875100914e-07, 'epoch': 0.72} 72%|███████▏ | 8805/12313 [6:35:54<2:34:06, 2.64s/it] 72%|███████▏ | 8806/12313 [6:35:57<2:32:57, 2.62s/it] {'loss': 0.5913, 'grad_norm': 7.847660489045761, 'learning_rate': 9.904515307375478e-07, 'epoch': 0.72} 72%|███████▏ | 8806/12313 [6:35:57<2:32:57, 2.62s/it] 72%|███████▏ | 8807/12313 [6:35:59<2:29:11, 2.55s/it] {'loss': 0.3363, 'grad_norm': 4.250630000320927, 'learning_rate': 9.899273784177681e-07, 'epoch': 0.72} 72%|███████▏ | 8807/12313 [6:35:59<2:29:11, 2.55s/it] 72%|███████▏ | 8808/12313 [6:36:02<2:27:50, 2.53s/it] {'loss': 0.4725, 'grad_norm': 8.197363212404545, 
'learning_rate': 9.894033305870229e-07, 'epoch': 0.72} 72%|███████▏ | 8808/12313 [6:36:02<2:27:50, 2.53s/it] 72%|███████▏ | 8809/12313 [6:36:04<2:25:40, 2.49s/it] {'loss': 0.4924, 'grad_norm': 3.91482587309766, 'learning_rate': 9.888793872815716e-07, 'epoch': 0.72} 72%|███████▏ | 8809/12313 [6:36:04<2:25:40, 2.49s/it] 72%|███████▏ | 8810/12313 [6:36:07<2:30:15, 2.57s/it] {'loss': 0.6437, 'grad_norm': 4.933671181822034, 'learning_rate': 9.883555485376688e-07, 'epoch': 0.72} 72%|███████▏ | 8810/12313 [6:36:07<2:30:15, 2.57s/it] 72%|███████▏ | 8811/12313 [6:36:10<2:34:40, 2.65s/it] {'loss': 0.3706, 'grad_norm': 5.499179017080881, 'learning_rate': 9.878318143915633e-07, 'epoch': 0.72} 72%|███████▏ | 8811/12313 [6:36:10<2:34:40, 2.65s/it] 72%|███████▏ | 8812/12313 [6:36:12<2:33:12, 2.63s/it] {'loss': 0.4138, 'grad_norm': 7.743811775235963, 'learning_rate': 9.873081848794926e-07, 'epoch': 0.72} 72%|███████▏ | 8812/12313 [6:36:12<2:33:12, 2.63s/it] 72%|███████▏ | 8813/12313 [6:36:15<2:31:41, 2.60s/it] {'loss': 0.5267, 'grad_norm': 4.332868158341539, 'learning_rate': 9.867846600376892e-07, 'epoch': 0.72} 72%|███████▏ | 8813/12313 [6:36:15<2:31:41, 2.60s/it] 72%|███████▏ | 8814/12313 [6:36:17<2:31:38, 2.60s/it] {'loss': 0.6121, 'grad_norm': 4.072077480048703, 'learning_rate': 9.862612399023797e-07, 'epoch': 0.72} 72%|███████▏ | 8814/12313 [6:36:17<2:31:38, 2.60s/it] 72%|███████▏ | 8815/12313 [6:36:20<2:33:12, 2.63s/it] {'loss': 0.4842, 'grad_norm': 6.275203078249558, 'learning_rate': 9.85737924509781e-07, 'epoch': 0.72} 72%|███████▏ | 8815/12313 [6:36:20<2:33:12, 2.63s/it] 72%|███████▏ | 8816/12313 [6:36:23<2:32:54, 2.62s/it] {'loss': 0.3927, 'grad_norm': 6.229912521186827, 'learning_rate': 9.852147138961026e-07, 'epoch': 0.72} 72%|███████▏ | 8816/12313 [6:36:23<2:32:54, 2.62s/it] 72%|███████▏ | 8817/12313 [6:36:25<2:34:33, 2.65s/it] {'loss': 0.4515, 'grad_norm': 5.84959994620228, 'learning_rate': 9.846916080975493e-07, 'epoch': 0.72} 72%|███████▏ | 8817/12313 
[6:36:25<2:34:33, 2.65s/it] 72%|███████▏ | 8818/12313 [6:36:28<2:34:54, 2.66s/it] {'loss': 0.4579, 'grad_norm': 8.565273476837879, 'learning_rate': 9.841686071503178e-07, 'epoch': 0.72} 72%|███████▏ | 8818/12313 [6:36:28<2:34:54, 2.66s/it] 72%|███████▏ | 8819/12313 [6:36:31<2:39:09, 2.73s/it] {'loss': 0.7484, 'grad_norm': 5.7709811538417854, 'learning_rate': 9.836457110905956e-07, 'epoch': 0.72} 72%|███████▏ | 8819/12313 [6:36:31<2:39:09, 2.73s/it] 72%|███████▏ | 8820/12313 [6:36:34<2:38:06, 2.72s/it] {'loss': 0.3563, 'grad_norm': 5.3290551767188035, 'learning_rate': 9.831229199545659e-07, 'epoch': 0.72} 72%|███████▏ | 8820/12313 [6:36:34<2:38:06, 2.72s/it] 72%|███████▏ | 8821/12313 [6:36:36<2:38:55, 2.73s/it] {'loss': 0.5667, 'grad_norm': 6.90902242367057, 'learning_rate': 9.82600233778402e-07, 'epoch': 0.72} 72%|███████▏ | 8821/12313 [6:36:36<2:38:55, 2.73s/it] 72%|███████▏ | 8822/12313 [6:36:39<2:36:54, 2.70s/it] {'loss': 0.4209, 'grad_norm': 5.0511931716125105, 'learning_rate': 9.820776525982703e-07, 'epoch': 0.72} 72%|███████▏ | 8822/12313 [6:36:39<2:36:54, 2.70s/it] 72%|███████▏ | 8823/12313 [6:36:41<2:34:26, 2.66s/it] {'loss': 0.4668, 'grad_norm': 4.8369460147576095, 'learning_rate': 9.815551764503317e-07, 'epoch': 0.72} 72%|███████▏ | 8823/12313 [6:36:41<2:34:26, 2.66s/it] 72%|███████▏ | 8824/12313 [6:36:44<2:36:38, 2.69s/it] {'loss': 0.8039, 'grad_norm': 4.486543984473168, 'learning_rate': 9.810328053707394e-07, 'epoch': 0.72} 72%|███████▏ | 8824/12313 [6:36:44<2:36:38, 2.69s/it] 72%|███████▏ | 8825/12313 [6:36:47<2:35:17, 2.67s/it] {'loss': 0.471, 'grad_norm': 4.014813431510159, 'learning_rate': 9.805105393956378e-07, 'epoch': 0.72} 72%|███████▏ | 8825/12313 [6:36:47<2:35:17, 2.67s/it] 72%|███████▏ | 8826/12313 [6:36:50<2:35:20, 2.67s/it] {'loss': 0.4735, 'grad_norm': 7.519778869225568, 'learning_rate': 9.799883785611647e-07, 'epoch': 0.72} 72%|███████▏ | 8826/12313 [6:36:50<2:35:20, 2.67s/it] 72%|███████▏ | 8827/12313 [6:36:52<2:33:14, 2.64s/it] {'loss': 
0.4381, 'grad_norm': 5.5537814970796315, 'learning_rate': 9.794663229034518e-07, 'epoch': 0.72} 72%|███████▏ | 8827/12313 [6:36:52<2:33:14, 2.64s/it] 72%|███████▏ | 8828/12313 [6:36:55<2:35:12, 2.67s/it] {'loss': 0.4718, 'grad_norm': 19.344716592490663, 'learning_rate': 9.78944372458622e-07, 'epoch': 0.72} 72%|███████▏ | 8828/12313 [6:36:55<2:35:12, 2.67s/it] 72%|███████▏ | 8829/12313 [6:36:58<2:35:57, 2.69s/it] {'loss': 0.5988, 'grad_norm': 5.884182711703903, 'learning_rate': 9.784225272627908e-07, 'epoch': 0.72} 72%|███████▏ | 8829/12313 [6:36:58<2:35:57, 2.69s/it] 72%|███████▏ | 8830/12313 [6:37:00<2:39:26, 2.75s/it] {'loss': 0.3812, 'grad_norm': 4.891986811172294, 'learning_rate': 9.77900787352068e-07, 'epoch': 0.72} 72%|███████▏ | 8830/12313 [6:37:00<2:39:26, 2.75s/it] 72%|███████▏ | 8831/12313 [6:37:03<2:37:04, 2.71s/it] {'loss': 0.6522, 'grad_norm': 10.276801237301445, 'learning_rate': 9.773791527625557e-07, 'epoch': 0.72} 72%|███████▏ | 8831/12313 [6:37:03<2:37:04, 2.71s/it] 72%|███████▏ | 8832/12313 [6:37:06<2:33:49, 2.65s/it] {'loss': 0.5199, 'grad_norm': 6.1468764794545505, 'learning_rate': 9.76857623530347e-07, 'epoch': 0.72} 72%|███████▏ | 8832/12313 [6:37:06<2:33:49, 2.65s/it] 72%|███████▏ | 8833/12313 [6:37:08<2:34:57, 2.67s/it] {'loss': 0.5275, 'grad_norm': 3.8035905037114097, 'learning_rate': 9.763361996915302e-07, 'epoch': 0.72} 72%|███████▏ | 8833/12313 [6:37:08<2:34:57, 2.67s/it] 72%|███████▏ | 8834/12313 [6:37:11<2:33:50, 2.65s/it] {'loss': 0.5616, 'grad_norm': 6.897927489006048, 'learning_rate': 9.75814881282185e-07, 'epoch': 0.72} 72%|███████▏ | 8834/12313 [6:37:11<2:33:50, 2.65s/it] 72%|███████▏ | 8835/12313 [6:37:14<2:32:50, 2.64s/it] {'loss': 0.4339, 'grad_norm': 3.453903520221075, 'learning_rate': 9.752936683383822e-07, 'epoch': 0.72} 72%|███████▏ | 8835/12313 [6:37:14<2:32:50, 2.64s/it] 72%|███████▏ | 8836/12313 [6:37:16<2:29:03, 2.57s/it] {'loss': 0.5881, 'grad_norm': 4.99189702123876, 'learning_rate': 9.747725608961881e-07, 'epoch': 
0.72} 72%|███████▏ | 8836/12313 [6:37:16<2:29:03, 2.57s/it] 72%|███████▏ | 8837/12313 [6:37:19<2:31:10, 2.61s/it] {'loss': 0.4863, 'grad_norm': 5.00035982299125, 'learning_rate': 9.742515589916615e-07, 'epoch': 0.72} 72%|███████▏ | 8837/12313 [6:37:19<2:31:10, 2.61s/it] 72%|███████▏ | 8838/12313 [6:37:21<2:31:27, 2.62s/it] {'loss': 0.5093, 'grad_norm': 4.906966447423704, 'learning_rate': 9.737306626608514e-07, 'epoch': 0.72} 72%|███████▏ | 8838/12313 [6:37:21<2:31:27, 2.62s/it] 72%|███████▏ | 8839/12313 [6:37:24<2:33:22, 2.65s/it] {'loss': 0.5323, 'grad_norm': 5.968171309664626, 'learning_rate': 9.732098719398025e-07, 'epoch': 0.72} 72%|███████▏ | 8839/12313 [6:37:24<2:33:22, 2.65s/it] 72%|███████▏ | 8840/12313 [6:37:27<2:31:16, 2.61s/it] {'loss': 0.4468, 'grad_norm': 4.720388953914238, 'learning_rate': 9.726891868645502e-07, 'epoch': 0.72} 72%|███████▏ | 8840/12313 [6:37:27<2:31:16, 2.61s/it] 72%|███████▏ | 8841/12313 [6:37:29<2:29:29, 2.58s/it] {'loss': 0.5385, 'grad_norm': 3.842805699155025, 'learning_rate': 9.721686074711228e-07, 'epoch': 0.72} 72%|███████▏ | 8841/12313 [6:37:29<2:29:29, 2.58s/it] 72%|███████▏ | 8842/12313 [6:37:31<2:26:14, 2.53s/it] {'loss': 0.4456, 'grad_norm': 9.473267130229601, 'learning_rate': 9.716481337955411e-07, 'epoch': 0.72} 72%|███████▏ | 8842/12313 [6:37:31<2:26:14, 2.53s/it] 72%|███████▏ | 8843/12313 [6:37:34<2:28:56, 2.58s/it] {'loss': 0.6716, 'grad_norm': 5.317653998758775, 'learning_rate': 9.711277658738197e-07, 'epoch': 0.72} 72%|███████▏ | 8843/12313 [6:37:34<2:28:56, 2.58s/it] 72%|███████▏ | 8844/12313 [6:37:37<2:28:08, 2.56s/it] {'loss': 0.553, 'grad_norm': 4.085741117429908, 'learning_rate': 9.706075037419666e-07, 'epoch': 0.72} 72%|███████▏ | 8844/12313 [6:37:37<2:28:08, 2.56s/it] 72%|███████▏ | 8845/12313 [6:37:39<2:29:36, 2.59s/it] {'loss': 0.4472, 'grad_norm': 21.53419743501485, 'learning_rate': 9.700873474359786e-07, 'epoch': 0.72} 72%|███████▏ | 8845/12313 [6:37:39<2:29:36, 2.59s/it] 72%|███████▏ | 8846/12313 
[6:37:42<2:32:19, 2.64s/it] {'loss': 0.4094, 'grad_norm': 6.994061607271889, 'learning_rate': 9.695672969918508e-07, 'epoch': 0.72} 72%|███████▏ | 8846/12313 [6:37:42<2:32:19, 2.64s/it] 72%|███████▏ | 8847/12313 [6:37:45<2:31:05, 2.62s/it] {'loss': 0.4225, 'grad_norm': 4.554212460035874, 'learning_rate': 9.69047352445566e-07, 'epoch': 0.72} 72%|███████▏ | 8847/12313 [6:37:45<2:31:05, 2.62s/it] 72%|███████▏ | 8848/12313 [6:37:47<2:33:12, 2.65s/it] {'loss': 0.4877, 'grad_norm': 7.821491253661923, 'learning_rate': 9.68527513833101e-07, 'epoch': 0.72} 72%|███████▏ | 8848/12313 [6:37:47<2:33:12, 2.65s/it] 72%|███████▏ | 8849/12313 [6:37:50<2:33:22, 2.66s/it] {'loss': 0.3561, 'grad_norm': 3.1283429984031783, 'learning_rate': 9.68007781190427e-07, 'epoch': 0.72} 72%|███████▏ | 8849/12313 [6:37:50<2:33:22, 2.66s/it] 72%|███████▏ | 8850/12313 [6:37:53<2:31:40, 2.63s/it] {'loss': 0.5745, 'grad_norm': 5.628658133939152, 'learning_rate': 9.674881545535073e-07, 'epoch': 0.72} 72%|███████▏ | 8850/12313 [6:37:53<2:31:40, 2.63s/it] 72%|███████▏ | 8851/12313 [6:37:55<2:32:15, 2.64s/it] {'loss': 0.6452, 'grad_norm': 11.726251049043716, 'learning_rate': 9.669686339582959e-07, 'epoch': 0.72} 72%|███████▏ | 8851/12313 [6:37:55<2:32:15, 2.64s/it] 72%|███████▏ | 8852/12313 [6:37:58<2:33:28, 2.66s/it] {'loss': 0.559, 'grad_norm': 4.111507695592982, 'learning_rate': 9.664492194407425e-07, 'epoch': 0.72} 72%|███████▏ | 8852/12313 [6:37:58<2:33:28, 2.66s/it] 72%|███████▏ | 8853/12313 [6:38:01<2:31:51, 2.63s/it] {'loss': 0.4079, 'grad_norm': 6.507482959837951, 'learning_rate': 9.659299110367868e-07, 'epoch': 0.72} 72%|███████▏ | 8853/12313 [6:38:01<2:31:51, 2.63s/it] 72%|███████▏ | 8854/12313 [6:38:03<2:29:46, 2.60s/it] {'loss': 0.6939, 'grad_norm': 6.32505374062163, 'learning_rate': 9.654107087823613e-07, 'epoch': 0.72} 72%|███████▏ | 8854/12313 [6:38:03<2:29:46, 2.60s/it] 72%|███████▏ | 8855/12313 [6:38:06<2:32:21, 2.64s/it] {'loss': 0.6168, 'grad_norm': 7.74320454513088, 'learning_rate': 
9.64891612713393e-07, 'epoch': 0.72} 72%|███████▏ | 8855/12313 [6:38:06<2:32:21, 2.64s/it] 72%|███████▏ | 8856/12313 [6:38:09<2:39:29, 2.77s/it] {'loss': 0.4821, 'grad_norm': 9.529588740791155, 'learning_rate': 9.643726228658017e-07, 'epoch': 0.72} 72%|███████▏ | 8856/12313 [6:38:09<2:39:29, 2.77s/it] 72%|███████▏ | 8857/12313 [6:38:12<2:39:09, 2.76s/it] {'loss': 0.4415, 'grad_norm': 5.534280027657455, 'learning_rate': 9.638537392754968e-07, 'epoch': 0.72} 72%|███████▏ | 8857/12313 [6:38:12<2:39:09, 2.76s/it] 72%|███████▏ | 8858/12313 [6:38:14<2:39:47, 2.77s/it] {'loss': 0.4648, 'grad_norm': 5.287392525426979, 'learning_rate': 9.63334961978384e-07, 'epoch': 0.72} 72%|███████▏ | 8858/12313 [6:38:14<2:39:47, 2.77s/it] 72%|███████▏ | 8859/12313 [6:38:17<2:35:42, 2.70s/it] {'loss': 0.4528, 'grad_norm': 5.411950885514694, 'learning_rate': 9.628162910103595e-07, 'epoch': 0.72} 72%|███████▏ | 8859/12313 [6:38:17<2:35:42, 2.70s/it] 72%|███████▏ | 8860/12313 [6:38:20<2:33:03, 2.66s/it] {'loss': 0.5357, 'grad_norm': 5.234971772019722, 'learning_rate': 9.62297726407312e-07, 'epoch': 0.72} 72%|███████▏ | 8860/12313 [6:38:20<2:33:03, 2.66s/it] 72%|███████▏ | 8861/12313 [6:38:22<2:29:02, 2.59s/it] {'loss': 0.4252, 'grad_norm': 3.5786127468446147, 'learning_rate': 9.617792682051228e-07, 'epoch': 0.72} 72%|███████▏ | 8861/12313 [6:38:22<2:29:02, 2.59s/it] 72%|███████▏ | 8862/12313 [6:38:25<2:28:48, 2.59s/it] {'loss': 0.4677, 'grad_norm': 4.004340303264486, 'learning_rate': 9.612609164396672e-07, 'epoch': 0.72} 72%|███████▏ | 8862/12313 [6:38:25<2:28:48, 2.59s/it] 72%|███████▏ | 8863/12313 [6:38:28<2:35:27, 2.70s/it] {'loss': 0.4476, 'grad_norm': 5.758524558297638, 'learning_rate': 9.607426711468135e-07, 'epoch': 0.72} 72%|███████▏ | 8863/12313 [6:38:28<2:35:27, 2.70s/it] 72%|███████▏ | 8864/12313 [6:38:30<2:34:44, 2.69s/it] {'loss': 0.4546, 'grad_norm': 8.535570347156773, 'learning_rate': 9.602245323624195e-07, 'epoch': 0.72} 72%|███████▏ | 8864/12313 [6:38:30<2:34:44, 2.69s/it] 
72%|███████▏ | 8865/12313 [6:38:33<2:32:03, 2.65s/it] {'loss': 0.4607, 'grad_norm': 4.001330283022169, 'learning_rate': 9.597065001223397e-07, 'epoch': 0.72} 72%|███████▏ | 8865/12313 [6:38:33<2:32:03, 2.65s/it] 72%|███████▏ | 8866/12313 [6:38:36<2:37:28, 2.74s/it] {'loss': 0.3572, 'grad_norm': 4.377728948837983, 'learning_rate': 9.591885744624183e-07, 'epoch': 0.72} 72%|███████▏ | 8866/12313 [6:38:36<2:37:28, 2.74s/it] 72%|███████▏ | 8867/12313 [6:38:38<2:35:33, 2.71s/it] {'loss': 0.4552, 'grad_norm': 3.8588903316964442, 'learning_rate': 9.586707554184918e-07, 'epoch': 0.72} 72%|███████▏ | 8867/12313 [6:38:38<2:35:33, 2.71s/it] 72%|███████▏ | 8868/12313 [6:38:41<2:36:21, 2.72s/it] {'loss': 0.4036, 'grad_norm': 6.457542883089068, 'learning_rate': 9.581530430263919e-07, 'epoch': 0.72} 72%|███████▏ | 8868/12313 [6:38:41<2:36:21, 2.72s/it] 72%|███████▏ | 8869/12313 [6:38:44<2:33:16, 2.67s/it] {'loss': 0.2751, 'grad_norm': 6.505717474355659, 'learning_rate': 9.57635437321942e-07, 'epoch': 0.72} 72%|███████▏ | 8869/12313 [6:38:44<2:33:16, 2.67s/it] 72%|███████▏ | 8870/12313 [6:38:46<2:29:54, 2.61s/it] {'loss': 0.4116, 'grad_norm': 7.655308168808172, 'learning_rate': 9.571179383409561e-07, 'epoch': 0.72} 72%|███████▏ | 8870/12313 [6:38:46<2:29:54, 2.61s/it] 72%|███████▏ | 8871/12313 [6:38:49<2:33:12, 2.67s/it] {'loss': 0.4976, 'grad_norm': 3.1031621847917634, 'learning_rate': 9.566005461192444e-07, 'epoch': 0.72} 72%|███████▏ | 8871/12313 [6:38:49<2:33:12, 2.67s/it] 72%|███████▏ | 8872/12313 [6:38:52<2:38:27, 2.76s/it] {'loss': 0.4635, 'grad_norm': 6.088083430309052, 'learning_rate': 9.560832606926064e-07, 'epoch': 0.72} 72%|███████▏ | 8872/12313 [6:38:52<2:38:27, 2.76s/it] 72%|███████▏ | 8873/12313 [6:38:54<2:34:34, 2.70s/it] {'loss': 0.5112, 'grad_norm': 6.004449008237206, 'learning_rate': 9.55566082096835e-07, 'epoch': 0.72} 72%|███████▏ | 8873/12313 [6:38:54<2:34:34, 2.70s/it] 72%|███████▏ | 8874/12313 [6:38:57<2:37:06, 2.74s/it] {'loss': 0.3386, 'grad_norm': 
9.024846945792124, 'learning_rate': 9.550490103677176e-07, 'epoch': 0.72} 72%|███████▏ | 8874/12313 [6:38:57<2:37:06, 2.74s/it] 72%|███████▏ | 8875/12313 [6:39:00<2:32:51, 2.67s/it] {'loss': 0.4822, 'grad_norm': 4.2699585973289915, 'learning_rate': 9.54532045541031e-07, 'epoch': 0.72} 72%|███████▏ | 8875/12313 [6:39:00<2:32:51, 2.67s/it] 72%|███████▏ | 8876/12313 [6:39:03<2:36:11, 2.73s/it] {'loss': 0.5532, 'grad_norm': 4.631080829221704, 'learning_rate': 9.54015187652548e-07, 'epoch': 0.72} 72%|███████▏ | 8876/12313 [6:39:03<2:36:11, 2.73s/it] 72%|███████▏ | 8877/12313 [6:39:05<2:35:59, 2.72s/it] {'loss': 0.5322, 'grad_norm': 6.8091971004552585, 'learning_rate': 9.534984367380329e-07, 'epoch': 0.72} 72%|███████▏ | 8877/12313 [6:39:05<2:35:59, 2.72s/it] 72%|███████▏ | 8878/12313 [6:39:08<2:42:21, 2.84s/it] {'loss': 0.523, 'grad_norm': 3.589150990709112, 'learning_rate': 9.529817928332411e-07, 'epoch': 0.72} 72%|███████▏ | 8878/12313 [6:39:08<2:42:21, 2.84s/it] 72%|███████▏ | 8879/12313 [6:39:11<2:37:08, 2.75s/it] {'loss': 0.3827, 'grad_norm': 8.421640744360598, 'learning_rate': 9.524652559739217e-07, 'epoch': 0.72} 72%|███████▏ | 8879/12313 [6:39:11<2:37:08, 2.75s/it] 72%|███████▏ | 8880/12313 [6:39:14<2:35:40, 2.72s/it] {'loss': 0.4339, 'grad_norm': 5.462634830370036, 'learning_rate': 9.519488261958157e-07, 'epoch': 0.72} 72%|███████▏ | 8880/12313 [6:39:14<2:35:40, 2.72s/it] 72%|███████▏ | 8881/12313 [6:39:16<2:33:02, 2.68s/it] {'loss': 0.5582, 'grad_norm': 6.652542616373296, 'learning_rate': 9.514325035346577e-07, 'epoch': 0.72} 72%|███████▏ | 8881/12313 [6:39:16<2:33:02, 2.68s/it] 72%|███████▏ | 8882/12313 [6:39:19<2:40:15, 2.80s/it] {'loss': 0.4287, 'grad_norm': 11.329982918292163, 'learning_rate': 9.509162880261757e-07, 'epoch': 0.72} 72%|███████▏ | 8882/12313 [6:39:19<2:40:15, 2.80s/it] 72%|███████▏ | 8883/12313 [6:39:22<2:38:15, 2.77s/it] {'loss': 0.3969, 'grad_norm': 4.346970981846385, 'learning_rate': 9.504001797060875e-07, 'epoch': 0.72} 72%|███████▏ | 
8883/12313 [6:39:22<2:38:15, 2.77s/it] 72%|███████▏ | 8884/12313 [6:39:25<2:38:22, 2.77s/it] {'loss': 0.3896, 'grad_norm': 10.533676550458823, 'learning_rate': 9.498841786101065e-07, 'epoch': 0.72} 72%|███████▏ | 8884/12313 [6:39:25<2:38:22, 2.77s/it] 72%|███████▏ | 8885/12313 [6:39:28<2:37:53, 2.76s/it] {'loss': 0.4312, 'grad_norm': 6.522166801681026, 'learning_rate': 9.493682847739363e-07, 'epoch': 0.72} 72%|███████▏ | 8885/12313 [6:39:28<2:37:53, 2.76s/it] 72%|███████▏ | 8886/12313 [6:39:30<2:35:44, 2.73s/it] {'loss': 0.4908, 'grad_norm': 15.683352653559364, 'learning_rate': 9.488524982332734e-07, 'epoch': 0.72} 72%|███████▏ | 8886/12313 [6:39:30<2:35:44, 2.73s/it] 72%|███████▏ | 8887/12313 [6:39:33<2:33:12, 2.68s/it] {'loss': 0.604, 'grad_norm': 4.885082924071923, 'learning_rate': 9.483368190238093e-07, 'epoch': 0.72} 72%|███████▏ | 8887/12313 [6:39:33<2:33:12, 2.68s/it] 72%|███████▏ | 8888/12313 [6:39:35<2:30:45, 2.64s/it] {'loss': 0.6215, 'grad_norm': 6.215187035651405, 'learning_rate': 9.478212471812242e-07, 'epoch': 0.72} 72%|███████▏ | 8888/12313 [6:39:35<2:30:45, 2.64s/it] 72%|███████▏ | 8889/12313 [6:39:38<2:29:01, 2.61s/it] {'loss': 0.46, 'grad_norm': 6.199409295607022, 'learning_rate': 9.473057827411941e-07, 'epoch': 0.72} 72%|███████▏ | 8889/12313 [6:39:38<2:29:01, 2.61s/it] 72%|███████▏ | 8890/12313 [6:39:41<2:34:48, 2.71s/it] {'loss': 0.4037, 'grad_norm': 5.210688836874202, 'learning_rate': 9.467904257393873e-07, 'epoch': 0.72} 72%|███████▏ | 8890/12313 [6:39:41<2:34:48, 2.71s/it] 72%|███████▏ | 8891/12313 [6:39:43<2:30:37, 2.64s/it] {'loss': 0.5964, 'grad_norm': 5.580114042556584, 'learning_rate': 9.462751762114625e-07, 'epoch': 0.72} 72%|███████▏ | 8891/12313 [6:39:43<2:30:37, 2.64s/it] 72%|███████▏ | 8892/12313 [6:39:46<2:27:02, 2.58s/it] {'loss': 0.5682, 'grad_norm': 5.621455969727306, 'learning_rate': 9.45760034193072e-07, 'epoch': 0.72} 72%|███████▏ | 8892/12313 [6:39:46<2:27:02, 2.58s/it] 72%|███████▏ | 8893/12313 [6:39:48<2:29:37, 2.63s/it] 
{'loss': 0.3895, 'grad_norm': 11.91702063843931, 'learning_rate': 9.45244999719862e-07, 'epoch': 0.72} 72%|███████▏ | 8893/12313 [6:39:48<2:29:37, 2.63s/it] 72%|███████▏ | 8894/12313 [6:39:51<2:28:50, 2.61s/it] {'loss': 0.3848, 'grad_norm': 4.231827622581398, 'learning_rate': 9.447300728274689e-07, 'epoch': 0.72} 72%|███████▏ | 8894/12313 [6:39:51<2:28:50, 2.61s/it] 72%|███████▏ | 8895/12313 [6:39:53<2:26:48, 2.58s/it] {'loss': 0.4818, 'grad_norm': 4.914560874009442, 'learning_rate': 9.442152535515245e-07, 'epoch': 0.72} 72%|███████▏ | 8895/12313 [6:39:53<2:26:48, 2.58s/it] 72%|███████▏ | 8896/12313 [6:39:56<2:27:32, 2.59s/it] {'loss': 0.5125, 'grad_norm': 3.3941269423081386, 'learning_rate': 9.437005419276496e-07, 'epoch': 0.72} 72%|███████▏ | 8896/12313 [6:39:56<2:27:32, 2.59s/it] 72%|███████▏ | 8897/12313 [6:39:59<2:29:19, 2.62s/it] {'loss': 0.5125, 'grad_norm': 7.1114894868029515, 'learning_rate': 9.431859379914615e-07, 'epoch': 0.72} 72%|███████▏ | 8897/12313 [6:39:59<2:29:19, 2.62s/it] 72%|███████▏ | 8898/12313 [6:40:01<2:25:57, 2.56s/it] {'loss': 0.4268, 'grad_norm': 7.590278377377921, 'learning_rate': 9.426714417785673e-07, 'epoch': 0.72} 72%|███████▏ | 8898/12313 [6:40:01<2:25:57, 2.56s/it] 72%|███████▏ | 8899/12313 [6:40:04<2:25:05, 2.55s/it] {'loss': 0.6289, 'grad_norm': 8.118555760980742, 'learning_rate': 9.421570533245663e-07, 'epoch': 0.72} 72%|███████▏ | 8899/12313 [6:40:04<2:25:05, 2.55s/it] 72%|███████▏ | 8900/12313 [6:40:06<2:25:23, 2.56s/it] {'loss': 0.4768, 'grad_norm': 4.8344990060798025, 'learning_rate': 9.416427726650535e-07, 'epoch': 0.72} 72%|███████▏ | 8900/12313 [6:40:06<2:25:23, 2.56s/it] 72%|███████▏ | 8901/12313 [6:40:09<2:27:55, 2.60s/it] {'loss': 0.588, 'grad_norm': 3.812357829292625, 'learning_rate': 9.411285998356124e-07, 'epoch': 0.72} 72%|███████▏ | 8901/12313 [6:40:09<2:27:55, 2.60s/it] 72%|███████▏ | 8902/12313 [6:40:12<2:36:42, 2.76s/it] {'loss': 0.6037, 'grad_norm': 4.3624783009325805, 'learning_rate': 9.406145348718218e-07, 
'epoch': 0.72} 72%|███████▏ | 8902/12313 [6:40:12<2:36:42, 2.76s/it] 72%|███████▏ | 8903/12313 [6:40:15<2:36:10, 2.75s/it] {'loss': 0.5546, 'grad_norm': 7.735840837751049, 'learning_rate': 9.401005778092537e-07, 'epoch': 0.72} 72%|███████▏ | 8903/12313 [6:40:15<2:36:10, 2.75s/it] 72%|███████▏ | 8904/12313 [6:40:18<2:34:40, 2.72s/it] {'loss': 0.4084, 'grad_norm': 4.2176648353846184, 'learning_rate': 9.395867286834695e-07, 'epoch': 0.72} 72%|███████▏ | 8904/12313 [6:40:18<2:34:40, 2.72s/it] 72%|███████▏ | 8905/12313 [6:40:20<2:33:43, 2.71s/it] {'loss': 0.3893, 'grad_norm': 10.500833125072006, 'learning_rate': 9.390729875300247e-07, 'epoch': 0.72} 72%|███████▏ | 8905/12313 [6:40:20<2:33:43, 2.71s/it] 72%|███████▏ | 8906/12313 [6:40:23<2:29:47, 2.64s/it] {'loss': 0.48, 'grad_norm': 5.323683099575961, 'learning_rate': 9.38559354384469e-07, 'epoch': 0.72} 72%|███████▏ | 8906/12313 [6:40:23<2:29:47, 2.64s/it] 72%|███████▏ | 8907/12313 [6:40:26<2:33:51, 2.71s/it] {'loss': 0.6171, 'grad_norm': 4.623752481785489, 'learning_rate': 9.38045829282341e-07, 'epoch': 0.72} 72%|███████▏ | 8907/12313 [6:40:26<2:33:51, 2.71s/it] 72%|███████▏ | 8908/12313 [6:40:28<2:33:02, 2.70s/it] {'loss': 0.5059, 'grad_norm': 7.855386690391926, 'learning_rate': 9.375324122591753e-07, 'epoch': 0.72} 72%|███████▏ | 8908/12313 [6:40:28<2:33:02, 2.70s/it] 72%|███████▏ | 8909/12313 [6:40:31<2:33:16, 2.70s/it] {'loss': 0.5279, 'grad_norm': 4.500533084765437, 'learning_rate': 9.370191033504982e-07, 'epoch': 0.72} 72%|███████▏ | 8909/12313 [6:40:31<2:33:16, 2.70s/it] 72%|███████▏ | 8910/12313 [6:40:34<2:33:36, 2.71s/it] {'loss': 0.4437, 'grad_norm': 7.958925600183833, 'learning_rate': 9.365059025918274e-07, 'epoch': 0.72} 72%|███████▏ | 8910/12313 [6:40:34<2:33:36, 2.71s/it] 72%|███████▏ | 8911/12313 [6:40:36<2:33:25, 2.71s/it] {'loss': 0.5215, 'grad_norm': 3.95735207574078, 'learning_rate': 9.359928100186724e-07, 'epoch': 0.72} 72%|███████▏ | 8911/12313 [6:40:36<2:33:25, 2.71s/it] 72%|███████▏ | 8912/12313 
[6:40:39<2:33:26, 2.71s/it] {'loss': 0.474, 'grad_norm': 11.501156203158045, 'learning_rate': 9.354798256665384e-07, 'epoch': 0.72} 72%|███████▏ | 8912/12313 [6:40:39<2:33:26, 2.71s/it] 72%|███████▏ | 8913/12313 [6:40:42<2:35:37, 2.75s/it] {'loss': 0.3921, 'grad_norm': 4.487504865241946, 'learning_rate': 9.349669495709208e-07, 'epoch': 0.72} 72%|███████▏ | 8913/12313 [6:40:42<2:35:37, 2.75s/it] 72%|███████▏ | 8914/12313 [6:40:45<2:37:31, 2.78s/it] {'loss': 0.5789, 'grad_norm': 8.160345585558959, 'learning_rate': 9.344541817673061e-07, 'epoch': 0.72} 72%|███████▏ | 8914/12313 [6:40:45<2:37:31, 2.78s/it] 72%|███████▏ | 8915/12313 [6:40:48<2:37:20, 2.78s/it] {'loss': 0.5123, 'grad_norm': 5.808899053157705, 'learning_rate': 9.339415222911766e-07, 'epoch': 0.72} 72%|███████▏ | 8915/12313 [6:40:48<2:37:20, 2.78s/it] 72%|███████▏ | 8916/12313 [6:40:50<2:36:33, 2.77s/it] {'loss': 0.7287, 'grad_norm': 3.2770410237558165, 'learning_rate': 9.334289711780062e-07, 'epoch': 0.72} 72%|███████▏ | 8916/12313 [6:40:50<2:36:33, 2.77s/it] 72%|███████▏ | 8917/12313 [6:40:53<2:39:08, 2.81s/it] {'loss': 0.4535, 'grad_norm': 6.106039871460762, 'learning_rate': 9.329165284632602e-07, 'epoch': 0.72} 72%|███████▏ | 8917/12313 [6:40:53<2:39:08, 2.81s/it] 72%|███████▏ | 8918/12313 [6:40:56<2:42:47, 2.88s/it] {'loss': 0.4845, 'grad_norm': 4.5914206885164, 'learning_rate': 9.324041941823961e-07, 'epoch': 0.72} 72%|███████▏ | 8918/12313 [6:40:56<2:42:47, 2.88s/it] 72%|███████▏ | 8919/12313 [6:40:59<2:42:25, 2.87s/it] {'loss': 0.5425, 'grad_norm': 8.623594737117811, 'learning_rate': 9.318919683708661e-07, 'epoch': 0.72} 72%|███████▏ | 8919/12313 [6:40:59<2:42:25, 2.87s/it] 72%|███████▏ | 8920/12313 [6:41:02<2:39:12, 2.82s/it] {'loss': 0.507, 'grad_norm': 5.503813897391588, 'learning_rate': 9.313798510641117e-07, 'epoch': 0.72} 72%|███████▏ | 8920/12313 [6:41:02<2:39:12, 2.82s/it] 72%|███████▏ | 8921/12313 [6:41:05<2:38:34, 2.81s/it] {'loss': 0.4499, 'grad_norm': 27.071015075508583, 
'learning_rate': 9.308678422975701e-07, 'epoch': 0.72} 72%|███████▏ | 8921/12313 [6:41:05<2:38:34, 2.81s/it] 72%|███████▏ | 8922/12313 [6:41:07<2:34:25, 2.73s/it] {'loss': 0.4108, 'grad_norm': 5.264809337126668, 'learning_rate': 9.303559421066699e-07, 'epoch': 0.72} 72%|███████▏ | 8922/12313 [6:41:07<2:34:25, 2.73s/it] 72%|███████▏ | 8923/12313 [6:41:10<2:30:17, 2.66s/it] {'loss': 0.459, 'grad_norm': 4.739116238844419, 'learning_rate': 9.298441505268316e-07, 'epoch': 0.72} 72%|███████▏ | 8923/12313 [6:41:10<2:30:17, 2.66s/it] 72%|███████▏ | 8924/12313 [6:41:12<2:31:39, 2.69s/it] {'loss': 0.4787, 'grad_norm': 7.57503395698494, 'learning_rate': 9.29332467593467e-07, 'epoch': 0.72} 72%|███████▏ | 8924/12313 [6:41:12<2:31:39, 2.69s/it] 72%|███████▏ | 8925/12313 [6:41:15<2:32:52, 2.71s/it] {'loss': 0.4384, 'grad_norm': 7.320416604446665, 'learning_rate': 9.28820893341984e-07, 'epoch': 0.72} 72%|███████▏ | 8925/12313 [6:41:15<2:32:52, 2.71s/it] 72%|███████▏ | 8926/12313 [6:41:18<2:30:20, 2.66s/it] {'loss': 0.5397, 'grad_norm': 8.330410195117771, 'learning_rate': 9.28309427807779e-07, 'epoch': 0.72} 72%|███████▏ | 8926/12313 [6:41:18<2:30:20, 2.66s/it] 73%|███████▎ | 8927/12313 [6:41:20<2:29:23, 2.65s/it] {'loss': 0.5418, 'grad_norm': 8.653115313732869, 'learning_rate': 9.277980710262432e-07, 'epoch': 0.73} 73%|███████▎ | 8927/12313 [6:41:20<2:29:23, 2.65s/it] 73%|███████▎ | 8928/12313 [6:41:23<2:28:25, 2.63s/it] {'loss': 0.4397, 'grad_norm': 5.712763483431208, 'learning_rate': 9.272868230327614e-07, 'epoch': 0.73} 73%|███████▎ | 8928/12313 [6:41:23<2:28:25, 2.63s/it] 73%|███████▎ | 8929/12313 [6:41:26<2:29:52, 2.66s/it] {'loss': 0.619, 'grad_norm': 7.20946070132443, 'learning_rate': 9.267756838627079e-07, 'epoch': 0.73} 73%|███████▎ | 8929/12313 [6:41:26<2:29:52, 2.66s/it] 73%|███████▎ | 8930/12313 [6:41:28<2:28:28, 2.63s/it] {'loss': 0.5411, 'grad_norm': 3.836932313032167, 'learning_rate': 9.262646535514499e-07, 'epoch': 0.73} 73%|███████▎ | 8930/12313 [6:41:28<2:28:28, 
2.63s/it] 73%|███████▎ | 8931/12313 [6:41:31<2:29:06, 2.65s/it] {'loss': 0.5804, 'grad_norm': 7.397925686014042, 'learning_rate': 9.257537321343499e-07, 'epoch': 0.73} 73%|███████▎ | 8931/12313 [6:41:31<2:29:06, 2.65s/it] 73%|███████▎ | 8932/12313 [6:41:33<2:28:09, 2.63s/it] {'loss': 0.5053, 'grad_norm': 7.414052669733408, 'learning_rate': 9.252429196467603e-07, 'epoch': 0.73} 73%|███████▎ | 8932/12313 [6:41:33<2:28:09, 2.63s/it] 73%|███████▎ | 8933/12313 [6:41:36<2:29:06, 2.65s/it] {'loss': 0.4038, 'grad_norm': 6.780735171626088, 'learning_rate': 9.247322161240252e-07, 'epoch': 0.73} 73%|███████▎ | 8933/12313 [6:41:36<2:29:06, 2.65s/it] 73%|███████▎ | 8934/12313 [6:41:39<2:29:00, 2.65s/it] {'loss': 0.6769, 'grad_norm': 4.08445905007205, 'learning_rate': 9.242216216014838e-07, 'epoch': 0.73} 73%|███████▎ | 8934/12313 [6:41:39<2:29:00, 2.65s/it] 73%|███████▎ | 8935/12313 [6:41:41<2:30:12, 2.67s/it] {'loss': 0.3345, 'grad_norm': 5.6918275065058275, 'learning_rate': 9.237111361144674e-07, 'epoch': 0.73} 73%|███████▎ | 8935/12313 [6:41:41<2:30:12, 2.67s/it] 73%|███████▎ | 8936/12313 [6:41:44<2:28:14, 2.63s/it] {'loss': 0.5265, 'grad_norm': 5.320211088421992, 'learning_rate': 9.232007596982978e-07, 'epoch': 0.73} 73%|███████▎ | 8936/12313 [6:41:44<2:28:14, 2.63s/it] 73%|███████▎ | 8937/12313 [6:41:47<2:30:40, 2.68s/it] {'loss': 0.5231, 'grad_norm': 5.5679637549611165, 'learning_rate': 9.226904923882901e-07, 'epoch': 0.73} 73%|███████▎ | 8937/12313 [6:41:47<2:30:40, 2.68s/it] 73%|███████▎ | 8938/12313 [6:41:49<2:29:31, 2.66s/it] {'loss': 0.4251, 'grad_norm': 9.0059520761274, 'learning_rate': 9.22180334219753e-07, 'epoch': 0.73} 73%|███████▎ | 8938/12313 [6:41:49<2:29:31, 2.66s/it] 73%|███████▎ | 8939/12313 [6:41:52<2:28:49, 2.65s/it] {'loss': 0.5836, 'grad_norm': 7.189840208919893, 'learning_rate': 9.216702852279857e-07, 'epoch': 0.73} 73%|███████▎ | 8939/12313 [6:41:52<2:28:49, 2.65s/it] 73%|███████▎ | 8940/12313 [6:41:55<2:30:00, 2.67s/it] {'loss': 0.5427, 'grad_norm': 
8.24631932538056, 'learning_rate': 9.211603454482812e-07, 'epoch': 0.73} 73%|███████▎ | 8940/12313 [6:41:55<2:30:00, 2.67s/it] 73%|███████▎ | 8941/12313 [6:41:58<2:31:12, 2.69s/it] {'loss': 0.5369, 'grad_norm': 11.91801174885722, 'learning_rate': 9.206505149159259e-07, 'epoch': 0.73} 73%|███████▎ | 8941/12313 [6:41:58<2:31:12, 2.69s/it] 73%|███████▎ | 8942/12313 [6:42:01<2:40:26, 2.86s/it] {'loss': 0.6227, 'grad_norm': 4.788846104575723, 'learning_rate': 9.201407936661963e-07, 'epoch': 0.73} 73%|███████▎ | 8942/12313 [6:42:01<2:40:26, 2.86s/it] 73%|███████▎ | 8943/12313 [6:42:03<2:35:59, 2.78s/it] {'loss': 0.4761, 'grad_norm': 4.572117460437812, 'learning_rate': 9.196311817343618e-07, 'epoch': 0.73} 73%|███████▎ | 8943/12313 [6:42:03<2:35:59, 2.78s/it] 73%|███████▎ | 8944/12313 [6:42:06<2:34:14, 2.75s/it] {'loss': 0.494, 'grad_norm': 6.432401948520424, 'learning_rate': 9.191216791556864e-07, 'epoch': 0.73} 73%|███████▎ | 8944/12313 [6:42:06<2:34:14, 2.75s/it] 73%|███████▎ | 8945/12313 [6:42:09<2:33:03, 2.73s/it] {'loss': 0.4724, 'grad_norm': 3.666688290179581, 'learning_rate': 9.18612285965424e-07, 'epoch': 0.73} 73%|███████▎ | 8945/12313 [6:42:09<2:33:03, 2.73s/it] 73%|███████▎ | 8946/12313 [6:42:12<2:36:35, 2.79s/it] {'loss': 0.6309, 'grad_norm': 3.558045855054741, 'learning_rate': 9.18103002198821e-07, 'epoch': 0.73} 73%|███████▎ | 8946/12313 [6:42:12<2:36:35, 2.79s/it] 73%|███████▎ | 8947/12313 [6:42:14<2:36:55, 2.80s/it] {'loss': 0.5463, 'grad_norm': 5.292568438307435, 'learning_rate': 9.175938278911184e-07, 'epoch': 0.73} 73%|███████▎ | 8947/12313 [6:42:14<2:36:55, 2.80s/it] 73%|███████▎ | 8948/12313 [6:42:17<2:33:50, 2.74s/it] {'loss': 0.4128, 'grad_norm': 8.253942436158207, 'learning_rate': 9.170847630775489e-07, 'epoch': 0.73} 73%|███████▎ | 8948/12313 [6:42:17<2:33:50, 2.74s/it] 73%|███████▎ | 8949/12313 [6:42:20<2:31:23, 2.70s/it] {'loss': 0.5094, 'grad_norm': 12.693589172331533, 'learning_rate': 9.165758077933365e-07, 'epoch': 0.73} 73%|███████▎ | 
8949/12313 [6:42:20<2:31:23, 2.70s/it] 73%|███████▎ | 8950/12313 [6:42:22<2:30:56, 2.69s/it] {'loss': 0.5676, 'grad_norm': 4.3211523742949876, 'learning_rate': 9.160669620736973e-07, 'epoch': 0.73} 73%|███████▎ | 8950/12313 [6:42:22<2:30:56, 2.69s/it] 73%|███████▎ | 8951/12313 [6:42:25<2:36:19, 2.79s/it] {'loss': 0.5242, 'grad_norm': 11.192017775213207, 'learning_rate': 9.15558225953842e-07, 'epoch': 0.73} 73%|███████▎ | 8951/12313 [6:42:25<2:36:19, 2.79s/it] 73%|███████▎ | 8952/12313 [6:42:28<2:34:14, 2.75s/it] {'loss': 0.4027, 'grad_norm': 25.015456026993192, 'learning_rate': 9.150495994689712e-07, 'epoch': 0.73} 73%|███████▎ | 8952/12313 [6:42:28<2:34:14, 2.75s/it] 73%|███████▎ | 8953/12313 [6:42:31<2:32:36, 2.73s/it] {'loss': 0.4545, 'grad_norm': 5.152075474730206, 'learning_rate': 9.145410826542797e-07, 'epoch': 0.73} 73%|███████▎ | 8953/12313 [6:42:31<2:32:36, 2.73s/it] 73%|███████▎ | 8954/12313 [6:42:34<2:36:00, 2.79s/it] {'loss': 0.4933, 'grad_norm': 6.684848268615437, 'learning_rate': 9.140326755449555e-07, 'epoch': 0.73} 73%|███████▎ | 8954/12313 [6:42:34<2:36:00, 2.79s/it] 73%|███████▎ | 8955/12313 [6:42:36<2:31:24, 2.71s/it] {'loss': 0.4792, 'grad_norm': 5.088644961893354, 'learning_rate': 9.135243781761763e-07, 'epoch': 0.73} 73%|███████▎ | 8955/12313 [6:42:36<2:31:24, 2.71s/it] 73%|███████▎ | 8956/12313 [6:42:39<2:30:35, 2.69s/it] {'loss': 0.5671, 'grad_norm': 3.810037495766052, 'learning_rate': 9.130161905831131e-07, 'epoch': 0.73} 73%|███████▎ | 8956/12313 [6:42:39<2:30:35, 2.69s/it] 73%|███████▎ | 8957/12313 [6:42:42<2:31:22, 2.71s/it] {'loss': 0.4428, 'grad_norm': 8.122117526021382, 'learning_rate': 9.125081128009314e-07, 'epoch': 0.73} 73%|███████▎ | 8957/12313 [6:42:42<2:31:22, 2.71s/it] 73%|███████▎ | 8958/12313 [6:42:44<2:30:30, 2.69s/it] {'loss': 0.4813, 'grad_norm': 6.6487819565990325, 'learning_rate': 9.120001448647867e-07, 'epoch': 0.73} 73%|███████▎ | 8958/12313 [6:42:44<2:30:30, 2.69s/it] 73%|███████▎ | 8959/12313 [6:42:47<2:30:17, 
2.69s/it] {'loss': 0.4352, 'grad_norm': 6.992838856608024, 'learning_rate': 9.114922868098267e-07, 'epoch': 0.73} 73%|███████▎ | 8959/12313 [6:42:47<2:30:17, 2.69s/it] 73%|███████▎ | 8960/12313 [6:42:49<2:28:55, 2.66s/it] {'loss': 0.5618, 'grad_norm': 4.483263285402361, 'learning_rate': 9.109845386711932e-07, 'epoch': 0.73} 73%|███████▎ | 8960/12313 [6:42:49<2:28:55, 2.66s/it] 73%|███████▎ | 8961/12313 [6:42:52<2:27:45, 2.64s/it] {'loss': 0.5705, 'grad_norm': 4.2451531883975235, 'learning_rate': 9.104769004840208e-07, 'epoch': 0.73} 73%|███████▎ | 8961/12313 [6:42:52<2:27:45, 2.64s/it] 73%|███████▎ | 8962/12313 [6:42:55<2:30:47, 2.70s/it] {'loss': 0.3645, 'grad_norm': 3.5682877453830186, 'learning_rate': 9.099693722834336e-07, 'epoch': 0.73} 73%|███████▎ | 8962/12313 [6:42:55<2:30:47, 2.70s/it] 73%|███████▎ | 8963/12313 [6:42:58<2:29:39, 2.68s/it] {'loss': 0.6062, 'grad_norm': 4.835617927031836, 'learning_rate': 9.094619541045516e-07, 'epoch': 0.73} 73%|███████▎ | 8963/12313 [6:42:58<2:29:39, 2.68s/it] 73%|███████▎ | 8964/12313 [6:43:00<2:26:02, 2.62s/it] {'loss': 0.441, 'grad_norm': 4.723308760739116, 'learning_rate': 9.089546459824846e-07, 'epoch': 0.73} 73%|███████▎ | 8964/12313 [6:43:00<2:26:02, 2.62s/it] 73%|███████▎ | 8965/12313 [6:43:03<2:32:38, 2.74s/it] {'loss': 0.4843, 'grad_norm': 5.594812767897018, 'learning_rate': 9.084474479523347e-07, 'epoch': 0.73} 73%|███████▎ | 8965/12313 [6:43:03<2:32:38, 2.74s/it] 73%|███████▎ | 8966/12313 [6:43:06<2:31:02, 2.71s/it] {'loss': 0.4151, 'grad_norm': 5.966924171312079, 'learning_rate': 9.079403600491982e-07, 'epoch': 0.73} 73%|███████▎ | 8966/12313 [6:43:06<2:31:02, 2.71s/it] 73%|███████▎ | 8967/12313 [6:43:08<2:32:25, 2.73s/it] {'loss': 0.5172, 'grad_norm': 4.336193418701047, 'learning_rate': 9.074333823081638e-07, 'epoch': 0.73} 73%|███████▎ | 8967/12313 [6:43:08<2:32:25, 2.73s/it] 73%|███████▎ | 8968/12313 [6:43:11<2:29:28, 2.68s/it] {'loss': 0.4559, 'grad_norm': 6.063173388232329, 'learning_rate': 
9.069265147643109e-07, 'epoch': 0.73} 73%|███████▎ | 8968/12313 [6:43:11<2:29:28, 2.68s/it] 73%|███████▎ | 8969/12313 [6:43:14<2:29:34, 2.68s/it] {'loss': 0.5929, 'grad_norm': 4.628275139282452, 'learning_rate': 9.064197574527112e-07, 'epoch': 0.73} 73%|███████▎ | 8969/12313 [6:43:14<2:29:34, 2.68s/it] 73%|███████▎ | 8970/12313 [6:43:17<2:34:21, 2.77s/it] {'loss': 0.6199, 'grad_norm': 3.7566474590811447, 'learning_rate': 9.059131104084309e-07, 'epoch': 0.73} 73%|███████▎ | 8970/12313 [6:43:17<2:34:21, 2.77s/it] 73%|███████▎ | 8971/12313 [6:43:19<2:28:37, 2.67s/it] {'loss': 0.519, 'grad_norm': 5.357753313412571, 'learning_rate': 9.054065736665268e-07, 'epoch': 0.73} 73%|███████▎ | 8971/12313 [6:43:19<2:28:37, 2.67s/it] 73%|███████▎ | 8972/12313 [6:43:22<2:28:58, 2.68s/it] {'loss': 0.6267, 'grad_norm': 5.601299513674894, 'learning_rate': 9.049001472620481e-07, 'epoch': 0.73} 73%|███████▎ | 8972/12313 [6:43:22<2:28:58, 2.68s/it] 73%|███████▎ | 8973/12313 [6:43:25<2:29:20, 2.68s/it] {'loss': 0.4893, 'grad_norm': 6.85651492063141, 'learning_rate': 9.043938312300368e-07, 'epoch': 0.73} 73%|███████▎ | 8973/12313 [6:43:25<2:29:20, 2.68s/it] 73%|███████▎ | 8974/12313 [6:43:27<2:29:01, 2.68s/it] {'loss': 0.3582, 'grad_norm': 6.764798797003058, 'learning_rate': 9.038876256055288e-07, 'epoch': 0.73} 73%|███████▎ | 8974/12313 [6:43:27<2:29:01, 2.68s/it] 73%|███████▎ | 8975/12313 [6:43:30<2:29:00, 2.68s/it] {'loss': 0.412, 'grad_norm': 6.409476783815426, 'learning_rate': 9.033815304235488e-07, 'epoch': 0.73} 73%|███████▎ | 8975/12313 [6:43:30<2:29:00, 2.68s/it] 73%|███████▎ | 8976/12313 [6:43:32<2:24:43, 2.60s/it] {'loss': 0.3983, 'grad_norm': 18.649132110530104, 'learning_rate': 9.028755457191179e-07, 'epoch': 0.73} 73%|███████▎ | 8976/12313 [6:43:32<2:24:43, 2.60s/it] 73%|███████▎ | 8977/12313 [6:43:35<2:24:25, 2.60s/it] {'loss': 0.3894, 'grad_norm': 6.513807711538606, 'learning_rate': 9.023696715272468e-07, 'epoch': 0.73} 73%|███████▎ | 8977/12313 [6:43:35<2:24:25, 2.60s/it] 
73%|███████▎ | 8978/12313 [6:43:38<2:25:15, 2.61s/it] {'loss': 0.4124, 'grad_norm': 6.299055569156435, 'learning_rate': 9.018639078829378e-07, 'epoch': 0.73} 73%|███████▎ | 8978/12313 [6:43:38<2:25:15, 2.61s/it] 73%|███████▎ | 8979/12313 [6:43:40<2:26:42, 2.64s/it] {'loss': 0.568, 'grad_norm': 3.514477375819592, 'learning_rate': 9.013582548211885e-07, 'epoch': 0.73} 73%|███████▎ | 8979/12313 [6:43:40<2:26:42, 2.64s/it] 73%|███████▎ | 8980/12313 [6:43:43<2:26:33, 2.64s/it] {'loss': 0.4293, 'grad_norm': 3.3460958135923318, 'learning_rate': 9.008527123769883e-07, 'epoch': 0.73} 73%|███████▎ | 8980/12313 [6:43:43<2:26:33, 2.64s/it] 73%|███████▎ | 8981/12313 [6:43:46<2:28:54, 2.68s/it] {'loss': 0.3699, 'grad_norm': 4.436979913495132, 'learning_rate': 9.003472805853161e-07, 'epoch': 0.73} 73%|███████▎ | 8981/12313 [6:43:46<2:28:54, 2.68s/it] 73%|███████▎ | 8982/12313 [6:43:48<2:29:00, 2.68s/it] {'loss': 0.5529, 'grad_norm': 5.276006150098665, 'learning_rate': 8.998419594811467e-07, 'epoch': 0.73} 73%|███████▎ | 8982/12313 [6:43:48<2:29:00, 2.68s/it] 73%|███████▎ | 8983/12313 [6:43:51<2:24:09, 2.60s/it] {'loss': 0.4417, 'grad_norm': 4.525908461414848, 'learning_rate': 8.993367490994451e-07, 'epoch': 0.73} 73%|███████▎ | 8983/12313 [6:43:51<2:24:09, 2.60s/it] 73%|███████▎ | 8984/12313 [6:43:54<2:30:23, 2.71s/it] {'loss': 0.4299, 'grad_norm': 6.180441338511432, 'learning_rate': 8.988316494751683e-07, 'epoch': 0.73} 73%|███████▎ | 8984/12313 [6:43:54<2:30:23, 2.71s/it] 73%|███████▎ | 8985/12313 [6:43:56<2:29:40, 2.70s/it] {'loss': 0.5197, 'grad_norm': 3.4284667198540983, 'learning_rate': 8.983266606432672e-07, 'epoch': 0.73} 73%|███████▎ | 8985/12313 [6:43:56<2:29:40, 2.70s/it] 73%|███████▎ | 8986/12313 [6:43:59<2:27:45, 2.66s/it] {'loss': 0.5724, 'grad_norm': 4.57248481351034, 'learning_rate': 8.978217826386853e-07, 'epoch': 0.73} 73%|███████▎ | 8986/12313 [6:43:59<2:27:45, 2.66s/it] 73%|███████▎ | 8987/12313 [6:44:02<2:27:20, 2.66s/it] {'loss': 0.5855, 'grad_norm': 
3.3815176907013984, 'learning_rate': 8.973170154963567e-07, 'epoch': 0.73} 73%|███████▎ | 8987/12313 [6:44:02<2:27:20, 2.66s/it] 73%|███████▎ | 8988/12313 [6:44:04<2:24:14, 2.60s/it] {'loss': 0.5791, 'grad_norm': 4.911910066001664, 'learning_rate': 8.968123592512076e-07, 'epoch': 0.73} 73%|███████▎ | 8988/12313 [6:44:04<2:24:14, 2.60s/it] 73%|███████▎ | 8989/12313 [6:44:07<2:25:35, 2.63s/it] {'loss': 0.4979, 'grad_norm': 6.250038591037473, 'learning_rate': 8.963078139381595e-07, 'epoch': 0.73} 73%|███████▎ | 8989/12313 [6:44:07<2:25:35, 2.63s/it] 73%|███████▎ | 8990/12313 [6:44:09<2:26:53, 2.65s/it] {'loss': 0.4881, 'grad_norm': 4.472456064967235, 'learning_rate': 8.958033795921231e-07, 'epoch': 0.73} 73%|███████▎ | 8990/12313 [6:44:09<2:26:53, 2.65s/it] 73%|███████▎ | 8991/12313 [6:44:12<2:26:01, 2.64s/it] {'loss': 0.3618, 'grad_norm': 6.163691073085615, 'learning_rate': 8.952990562480021e-07, 'epoch': 0.73} 73%|███████▎ | 8991/12313 [6:44:12<2:26:01, 2.64s/it] 73%|███████▎ | 8992/12313 [6:44:15<2:28:36, 2.69s/it] {'loss': 0.6202, 'grad_norm': 6.473412100775378, 'learning_rate': 8.947948439406934e-07, 'epoch': 0.73} 73%|███████▎ | 8992/12313 [6:44:15<2:28:36, 2.69s/it] 73%|███████▎ | 8993/12313 [6:44:18<2:31:27, 2.74s/it] {'loss': 0.4327, 'grad_norm': 6.060384183610972, 'learning_rate': 8.94290742705087e-07, 'epoch': 0.73} 73%|███████▎ | 8993/12313 [6:44:18<2:31:27, 2.74s/it] 73%|███████▎ | 8994/12313 [6:44:21<2:32:38, 2.76s/it] {'loss': 0.6409, 'grad_norm': 7.291283308794519, 'learning_rate': 8.937867525760622e-07, 'epoch': 0.73} 73%|███████▎ | 8994/12313 [6:44:21<2:32:38, 2.76s/it] 73%|███████▎ | 8995/12313 [6:44:23<2:33:59, 2.78s/it] {'loss': 0.6233, 'grad_norm': 4.822306787639834, 'learning_rate': 8.932828735884944e-07, 'epoch': 0.73} 73%|███████▎ | 8995/12313 [6:44:23<2:33:59, 2.78s/it] 73%|███████▎ | 8996/12313 [6:44:26<2:28:53, 2.69s/it] {'loss': 0.4074, 'grad_norm': 4.5942352939506055, 'learning_rate': 8.927791057772481e-07, 'epoch': 0.73} 73%|███████▎ | 
8996/12313 [6:44:26<2:28:53, 2.69s/it] 73%|███████▎ | 8997/12313 [6:44:29<2:29:25, 2.70s/it] {'loss': 0.4292, 'grad_norm': 5.662519945223263, 'learning_rate': 8.922754491771807e-07, 'epoch': 0.73} 73%|███████▎ | 8997/12313 [6:44:29<2:29:25, 2.70s/it] 73%|███████▎ | 8998/12313 [6:44:31<2:26:50, 2.66s/it] {'loss': 0.5461, 'grad_norm': 5.270491463874796, 'learning_rate': 8.917719038231437e-07, 'epoch': 0.73} 73%|███████▎ | 8998/12313 [6:44:31<2:26:50, 2.66s/it] 73%|███████▎ | 8999/12313 [6:44:34<2:35:30, 2.82s/it] {'loss': 0.4651, 'grad_norm': 4.428969136605445, 'learning_rate': 8.912684697499801e-07, 'epoch': 0.73} 73%|███████▎ | 8999/12313 [6:44:34<2:35:30, 2.82s/it] 73%|███████▎ | 9000/12313 [6:44:37<2:33:15, 2.78s/it] {'loss': 0.4143, 'grad_norm': 6.961674772191147, 'learning_rate': 8.907651469925236e-07, 'epoch': 0.73} 73%|███████▎ | 9000/12313 [6:44:37<2:33:15, 2.78s/it] 73%|███████▎ | 9001/12313 [6:44:40<2:32:35, 2.76s/it] {'loss': 0.4823, 'grad_norm': 4.726617406201288, 'learning_rate': 8.902619355856032e-07, 'epoch': 0.73} 73%|███████▎ | 9001/12313 [6:44:40<2:32:35, 2.76s/it] 73%|███████▎ | 9002/12313 [6:44:42<2:30:49, 2.73s/it] {'loss': 0.4849, 'grad_norm': 5.037522415851766, 'learning_rate': 8.897588355640371e-07, 'epoch': 0.73} 73%|███████▎ | 9002/12313 [6:44:42<2:30:49, 2.73s/it] 73%|███████▎ | 9003/12313 [6:44:45<2:29:14, 2.71s/it] {'loss': 0.4989, 'grad_norm': 4.702839968633091, 'learning_rate': 8.892558469626375e-07, 'epoch': 0.73} 73%|███████▎ | 9003/12313 [6:44:45<2:29:14, 2.71s/it] 73%|███████▎ | 9004/12313 [6:44:48<2:29:24, 2.71s/it] {'loss': 0.4176, 'grad_norm': 4.121133343260713, 'learning_rate': 8.887529698162079e-07, 'epoch': 0.73} 73%|███████▎ | 9004/12313 [6:44:48<2:29:24, 2.71s/it] 73%|███████▎ | 9005/12313 [6:44:50<2:27:20, 2.67s/it] {'loss': 0.4902, 'grad_norm': 5.799397892585551, 'learning_rate': 8.882502041595454e-07, 'epoch': 0.73} 73%|███████▎ | 9005/12313 [6:44:50<2:27:20, 2.67s/it] 73%|███████▎ | 9006/12313 [6:44:53<2:23:05, 
2.60s/it] {'loss': 0.5977, 'grad_norm': 10.129599048058408, 'learning_rate': 8.877475500274393e-07, 'epoch': 0.73} 73%|███████▎ | 9006/12313 [6:44:53<2:23:05, 2.60s/it] 73%|███████▎ | 9007/12313 [6:44:55<2:22:22, 2.58s/it] {'loss': 0.4533, 'grad_norm': 6.254376447594514, 'learning_rate': 8.872450074546696e-07, 'epoch': 0.73} 73%|███████▎ | 9007/12313 [6:44:55<2:22:22, 2.58s/it] 73%|███████▎ | 9008/12313 [6:44:58<2:19:35, 2.53s/it] {'loss': 0.4203, 'grad_norm': 7.497553888308231, 'learning_rate': 8.867425764760104e-07, 'epoch': 0.73} 73%|███████▎ | 9008/12313 [6:44:58<2:19:35, 2.53s/it] 73%|███████▎ | 9009/12313 [6:45:00<2:20:31, 2.55s/it] {'loss': 0.5435, 'grad_norm': 7.377753478494686, 'learning_rate': 8.862402571262272e-07, 'epoch': 0.73} 73%|███████▎ | 9009/12313 [6:45:00<2:20:31, 2.55s/it] 73%|███████▎ | 9010/12313 [6:45:03<2:20:59, 2.56s/it] {'loss': 0.6699, 'grad_norm': 3.964499330804677, 'learning_rate': 8.857380494400764e-07, 'epoch': 0.73} 73%|███████▎ | 9010/12313 [6:45:03<2:20:59, 2.56s/it] 73%|███████▎ | 9011/12313 [6:45:06<2:21:58, 2.58s/it] {'loss': 0.4287, 'grad_norm': 4.556955979542769, 'learning_rate': 8.852359534523091e-07, 'epoch': 0.73} 73%|███████▎ | 9011/12313 [6:45:06<2:21:58, 2.58s/it] 73%|███████▎ | 9012/12313 [6:45:08<2:23:48, 2.61s/it] {'loss': 0.6336, 'grad_norm': 5.283260885622843, 'learning_rate': 8.847339691976689e-07, 'epoch': 0.73} 73%|███████▎ | 9012/12313 [6:45:08<2:23:48, 2.61s/it] 73%|███████▎ | 9013/12313 [6:45:11<2:24:48, 2.63s/it] {'loss': 0.5459, 'grad_norm': 8.239303453615642, 'learning_rate': 8.842320967108886e-07, 'epoch': 0.73} 73%|███████▎ | 9013/12313 [6:45:11<2:24:48, 2.63s/it] 73%|███████▎ | 9014/12313 [6:45:14<2:26:07, 2.66s/it] {'loss': 0.4328, 'grad_norm': 9.181392373672194, 'learning_rate': 8.837303360266966e-07, 'epoch': 0.73} 73%|███████▎ | 9014/12313 [6:45:14<2:26:07, 2.66s/it] 73%|███████▎ | 9015/12313 [6:45:16<2:24:45, 2.63s/it] {'loss': 0.4695, 'grad_norm': 9.020007347941661, 'learning_rate': 
8.832286871798113e-07, 'epoch': 0.73} 73%|███████▎ | 9015/12313 [6:45:16<2:24:45, 2.63s/it] 73%|███████▎ | 9016/12313 [6:45:19<2:26:16, 2.66s/it] {'loss': 0.4455, 'grad_norm': 11.695346831333792, 'learning_rate': 8.827271502049434e-07, 'epoch': 0.73} 73%|███████▎ | 9016/12313 [6:45:19<2:26:16, 2.66s/it] 73%|███████▎ | 9017/12313 [6:45:22<2:29:35, 2.72s/it] {'loss': 0.4783, 'grad_norm': 5.166457906877146, 'learning_rate': 8.822257251367983e-07, 'epoch': 0.73} 73%|███████▎ | 9017/12313 [6:45:22<2:29:35, 2.72s/it] 73%|███████▎ | 9018/12313 [6:45:24<2:27:40, 2.69s/it] {'loss': 0.4167, 'grad_norm': 3.9452559803067575, 'learning_rate': 8.817244120100702e-07, 'epoch': 0.73} 73%|███████▎ | 9018/12313 [6:45:24<2:27:40, 2.69s/it] 73%|███████▎ | 9019/12313 [6:45:27<2:24:41, 2.64s/it] {'loss': 0.3934, 'grad_norm': 5.030514569697522, 'learning_rate': 8.812232108594482e-07, 'epoch': 0.73} 73%|███████▎ | 9019/12313 [6:45:27<2:24:41, 2.64s/it] 73%|███████▎ | 9020/12313 [6:45:30<2:29:24, 2.72s/it] {'loss': 0.5891, 'grad_norm': 10.567800558944477, 'learning_rate': 8.807221217196135e-07, 'epoch': 0.73} 73%|███████▎ | 9020/12313 [6:45:30<2:29:24, 2.72s/it] 73%|███████▎ | 9021/12313 [6:45:32<2:27:50, 2.69s/it] {'loss': 0.5597, 'grad_norm': 7.700337486933208, 'learning_rate': 8.802211446252379e-07, 'epoch': 0.73} 73%|███████▎ | 9021/12313 [6:45:32<2:27:50, 2.69s/it] 73%|███████▎ | 9022/12313 [6:45:35<2:24:29, 2.63s/it] {'loss': 0.5172, 'grad_norm': 11.713465789679601, 'learning_rate': 8.797202796109869e-07, 'epoch': 0.73} 73%|███████▎ | 9022/12313 [6:45:35<2:24:29, 2.63s/it] 73%|███████▎ | 9023/12313 [6:45:37<2:22:01, 2.59s/it] {'loss': 0.4569, 'grad_norm': 7.145291394301389, 'learning_rate': 8.792195267115163e-07, 'epoch': 0.73} 73%|███████▎ | 9023/12313 [6:45:37<2:22:01, 2.59s/it] 73%|███████▎ | 9024/12313 [6:45:40<2:18:53, 2.53s/it] {'loss': 0.6056, 'grad_norm': 4.598031815516408, 'learning_rate': 8.787188859614768e-07, 'epoch': 0.73} 73%|███████▎ | 9024/12313 [6:45:40<2:18:53, 
2.53s/it] 73%|███████▎ | 9025/12313 [6:45:42<2:20:10, 2.56s/it] {'loss': 0.4652, 'grad_norm': 5.275503679314213, 'learning_rate': 8.782183573955105e-07, 'epoch': 0.73} 73%|███████▎ | 9025/12313 [6:45:42<2:20:10, 2.56s/it] 73%|███████▎ | 9026/12313 [6:45:45<2:22:22, 2.60s/it] {'loss': 0.6455, 'grad_norm': 6.295243362632932, 'learning_rate': 8.777179410482498e-07, 'epoch': 0.73} 73%|███████▎ | 9026/12313 [6:45:45<2:22:22, 2.60s/it] 73%|███████▎ | 9027/12313 [6:45:48<2:29:13, 2.72s/it] {'loss': 0.3385, 'grad_norm': 4.4474547984782395, 'learning_rate': 8.772176369543229e-07, 'epoch': 0.73} 73%|███████▎ | 9027/12313 [6:45:48<2:29:13, 2.72s/it] 73%|███████▎ | 9028/12313 [6:45:51<2:29:19, 2.73s/it] {'loss': 0.5206, 'grad_norm': 3.172334169434078, 'learning_rate': 8.767174451483468e-07, 'epoch': 0.73} 73%|███████▎ | 9028/12313 [6:45:51<2:29:19, 2.73s/it] 73%|███████▎ | 9029/12313 [6:45:54<2:29:20, 2.73s/it] {'loss': 0.6199, 'grad_norm': 4.6456662676555815, 'learning_rate': 8.762173656649317e-07, 'epoch': 0.73} 73%|███████▎ | 9029/12313 [6:45:54<2:29:20, 2.73s/it] 73%|███████▎ | 9030/12313 [6:45:57<2:31:51, 2.78s/it] {'loss': 0.4957, 'grad_norm': 7.265881502649945, 'learning_rate': 8.757173985386819e-07, 'epoch': 0.73} 73%|███████▎ | 9030/12313 [6:45:57<2:31:51, 2.78s/it] 73%|███████▎ | 9031/12313 [6:45:59<2:28:50, 2.72s/it] {'loss': 0.5664, 'grad_norm': 3.4982261171160833, 'learning_rate': 8.752175438041908e-07, 'epoch': 0.73} 73%|███████▎ | 9031/12313 [6:45:59<2:28:50, 2.72s/it] 73%|███████▎ | 9032/12313 [6:46:02<2:24:49, 2.65s/it] {'loss': 0.4881, 'grad_norm': 3.6827798694601754, 'learning_rate': 8.747178014960467e-07, 'epoch': 0.73} 73%|███████▎ | 9032/12313 [6:46:02<2:24:49, 2.65s/it] 73%|███████▎ | 9033/12313 [6:46:04<2:24:13, 2.64s/it] {'loss': 0.3276, 'grad_norm': 7.960332905031216, 'learning_rate': 8.742181716488302e-07, 'epoch': 0.73} 73%|███████▎ | 9033/12313 [6:46:04<2:24:13, 2.64s/it] 73%|███████▎ | 9034/12313 [6:46:07<2:21:41, 2.59s/it] {'loss': 0.4636, 
'grad_norm': 4.4974178890676075, 'learning_rate': 8.737186542971115e-07, 'epoch': 0.73} 73%|███████▎ | 9034/12313 [6:46:07<2:21:41, 2.59s/it] 73%|███████▎ | 9035/12313 [6:46:09<2:20:20, 2.57s/it] {'loss': 0.3476, 'grad_norm': 4.122814660625721, 'learning_rate': 8.732192494754541e-07, 'epoch': 0.73} 73%|███████▎ | 9035/12313 [6:46:09<2:20:20, 2.57s/it] 73%|███████▎ | 9036/12313 [6:46:12<2:21:01, 2.58s/it] {'loss': 0.5909, 'grad_norm': 4.213370810352235, 'learning_rate': 8.727199572184161e-07, 'epoch': 0.73} 73%|███████▎ | 9036/12313 [6:46:12<2:21:01, 2.58s/it] 73%|███████▎ | 9037/12313 [6:46:14<2:19:02, 2.55s/it] {'loss': 0.4764, 'grad_norm': 10.14479503528053, 'learning_rate': 8.722207775605437e-07, 'epoch': 0.73} 73%|███████▎ | 9037/12313 [6:46:14<2:19:02, 2.55s/it] 73%|███████▎ | 9038/12313 [6:46:17<2:20:57, 2.58s/it] {'loss': 0.5015, 'grad_norm': 4.624804246473112, 'learning_rate': 8.717217105363798e-07, 'epoch': 0.73} 73%|███████▎ | 9038/12313 [6:46:17<2:20:57, 2.58s/it] 73%|███████▎ | 9039/12313 [6:46:20<2:22:30, 2.61s/it] {'loss': 0.4768, 'grad_norm': 5.183928674417439, 'learning_rate': 8.712227561804548e-07, 'epoch': 0.73} 73%|███████▎ | 9039/12313 [6:46:20<2:22:30, 2.61s/it] 73%|███████▎ | 9040/12313 [6:46:22<2:22:44, 2.62s/it] {'loss': 0.5025, 'grad_norm': 7.897979278065682, 'learning_rate': 8.707239145272958e-07, 'epoch': 0.73} 73%|███████▎ | 9040/12313 [6:46:22<2:22:44, 2.62s/it] 73%|███████▎ | 9041/12313 [6:46:25<2:21:34, 2.60s/it] {'loss': 0.4793, 'grad_norm': 16.857325758216245, 'learning_rate': 8.702251856114191e-07, 'epoch': 0.73} 73%|███████▎ | 9041/12313 [6:46:25<2:21:34, 2.60s/it] 73%|███████▎ | 9042/12313 [6:46:27<2:21:28, 2.60s/it] {'loss': 0.4738, 'grad_norm': 6.305732473052533, 'learning_rate': 8.697265694673334e-07, 'epoch': 0.73} 73%|███████▎ | 9042/12313 [6:46:27<2:21:28, 2.60s/it] 73%|███████▎ | 9043/12313 [6:46:30<2:23:17, 2.63s/it] {'loss': 0.334, 'grad_norm': 4.796761988404685, 'learning_rate': 8.692280661295419e-07, 'epoch': 0.73} 
73%|███████▎ | 9043/12313 [6:46:30<2:23:17, 2.63s/it] 73%|███████▎ | 9044/12313 [6:46:33<2:21:55, 2.61s/it] {'loss': 0.5107, 'grad_norm': 5.71378461104089, 'learning_rate': 8.687296756325364e-07, 'epoch': 0.73} 73%|███████▎ | 9044/12313 [6:46:33<2:21:55, 2.61s/it] 73%|███████▎ | 9045/12313 [6:46:35<2:22:46, 2.62s/it] {'loss': 0.6141, 'grad_norm': 10.227432767121579, 'learning_rate': 8.68231398010804e-07, 'epoch': 0.73} 73%|███████▎ | 9045/12313 [6:46:35<2:22:46, 2.62s/it] 73%|███████▎ | 9046/12313 [6:46:38<2:23:55, 2.64s/it] {'loss': 0.4475, 'grad_norm': 6.276424519094259, 'learning_rate': 8.677332332988236e-07, 'epoch': 0.73} 73%|███████▎ | 9046/12313 [6:46:38<2:23:55, 2.64s/it] 73%|███████▎ | 9047/12313 [6:46:41<2:23:40, 2.64s/it] {'loss': 0.4045, 'grad_norm': 5.843493313646698, 'learning_rate': 8.672351815310651e-07, 'epoch': 0.73} 73%|███████▎ | 9047/12313 [6:46:41<2:23:40, 2.64s/it] 73%|███████▎ | 9048/12313 [6:46:43<2:22:06, 2.61s/it] {'loss': 0.4015, 'grad_norm': 7.797951199210248, 'learning_rate': 8.667372427419895e-07, 'epoch': 0.73} 73%|███████▎ | 9048/12313 [6:46:43<2:22:06, 2.61s/it] 73%|███████▎ | 9049/12313 [6:46:46<2:20:53, 2.59s/it] {'loss': 0.4516, 'grad_norm': 7.633315491295807, 'learning_rate': 8.66239416966054e-07, 'epoch': 0.73} 73%|███████▎ | 9049/12313 [6:46:46<2:20:53, 2.59s/it] 73%|███████▎ | 9050/12313 [6:46:49<2:23:53, 2.65s/it] {'loss': 0.5822, 'grad_norm': 5.0774721644669025, 'learning_rate': 8.657417042377034e-07, 'epoch': 0.73} 73%|███████▎ | 9050/12313 [6:46:49<2:23:53, 2.65s/it] 74%|███████▎ | 9051/12313 [6:46:51<2:23:56, 2.65s/it] {'loss': 0.5267, 'grad_norm': 3.3621612548870967, 'learning_rate': 8.652441045913775e-07, 'epoch': 0.74} 74%|███████▎ | 9051/12313 [6:46:51<2:23:56, 2.65s/it] 74%|███████▎ | 9052/12313 [6:46:54<2:26:20, 2.69s/it] {'loss': 0.5536, 'grad_norm': 6.871554047086195, 'learning_rate': 8.647466180615085e-07, 'epoch': 0.74} 74%|███████▎ | 9052/12313 [6:46:54<2:26:20, 2.69s/it] 74%|███████▎ | 9053/12313 
[6:46:57<2:24:43, 2.66s/it] {'loss': 0.5135, 'grad_norm': 6.572776544165536, 'learning_rate': 8.642492446825193e-07, 'epoch': 0.74} 74%|███████▎ | 9053/12313 [6:46:57<2:24:43, 2.66s/it] 74%|███████▎ | 9054/12313 [6:46:59<2:24:57, 2.67s/it] {'loss': 0.4804, 'grad_norm': 5.770321723227472, 'learning_rate': 8.637519844888245e-07, 'epoch': 0.74} 74%|███████▎ | 9054/12313 [6:46:59<2:24:57, 2.67s/it] 74%|███████▎ | 9055/12313 [6:47:02<2:21:33, 2.61s/it] {'loss': 0.5766, 'grad_norm': 10.272898706445313, 'learning_rate': 8.632548375148333e-07, 'epoch': 0.74} 74%|███████▎ | 9055/12313 [6:47:02<2:21:33, 2.61s/it] 74%|███████▎ | 9056/12313 [6:47:04<2:23:48, 2.65s/it] {'loss': 0.6254, 'grad_norm': 7.89177634147062, 'learning_rate': 8.627578037949441e-07, 'epoch': 0.74} 74%|███████▎ | 9056/12313 [6:47:04<2:23:48, 2.65s/it] 74%|███████▎ | 9057/12313 [6:47:07<2:26:37, 2.70s/it] {'loss': 0.4276, 'grad_norm': 6.509795135122351, 'learning_rate': 8.62260883363551e-07, 'epoch': 0.74} 74%|███████▎ | 9057/12313 [6:47:07<2:26:37, 2.70s/it] 74%|███████▎ | 9058/12313 [6:47:10<2:28:22, 2.73s/it] {'loss': 0.4104, 'grad_norm': 5.327797962449124, 'learning_rate': 8.617640762550361e-07, 'epoch': 0.74} 74%|███████▎ | 9058/12313 [6:47:10<2:28:22, 2.73s/it] 74%|███████▎ | 9059/12313 [6:47:13<2:26:56, 2.71s/it] {'loss': 0.6383, 'grad_norm': 10.83596502585155, 'learning_rate': 8.612673825037776e-07, 'epoch': 0.74} 74%|███████▎ | 9059/12313 [6:47:13<2:26:56, 2.71s/it] 74%|███████▎ | 9060/12313 [6:47:15<2:22:19, 2.63s/it] {'loss': 0.5538, 'grad_norm': 3.2930515840707257, 'learning_rate': 8.607708021441436e-07, 'epoch': 0.74} 74%|███████▎ | 9060/12313 [6:47:15<2:22:19, 2.63s/it] 74%|███████▎ | 9061/12313 [6:47:18<2:18:17, 2.55s/it] {'loss': 0.5416, 'grad_norm': 6.123300825914036, 'learning_rate': 8.602743352104936e-07, 'epoch': 0.74} 74%|███████▎ | 9061/12313 [6:47:18<2:18:17, 2.55s/it] 74%|███████▎ | 9062/12313 [6:47:21<2:27:40, 2.73s/it] {'loss': 0.4344, 'grad_norm': 4.126241878471722, 
'learning_rate': 8.597779817371824e-07, 'epoch': 0.74} 74%|███████▎ | 9062/12313 [6:47:21<2:27:40, 2.73s/it] 74%|███████▎ | 9063/12313 [6:47:23<2:24:27, 2.67s/it] {'loss': 0.5276, 'grad_norm': 4.001896575407721, 'learning_rate': 8.592817417585534e-07, 'epoch': 0.74} 74%|███████▎ | 9063/12313 [6:47:23<2:24:27, 2.67s/it] 74%|███████▎ | 9064/12313 [6:47:26<2:23:32, 2.65s/it] {'loss': 0.5128, 'grad_norm': 4.410919527265607, 'learning_rate': 8.587856153089444e-07, 'epoch': 0.74} 74%|███████▎ | 9064/12313 [6:47:26<2:23:32, 2.65s/it] 74%|███████▎ | 9065/12313 [6:47:28<2:21:46, 2.62s/it] {'loss': 0.5539, 'grad_norm': 4.6300581291782885, 'learning_rate': 8.582896024226855e-07, 'epoch': 0.74} 74%|███████▎ | 9065/12313 [6:47:28<2:21:46, 2.62s/it] 74%|███████▎ | 9066/12313 [6:47:31<2:22:18, 2.63s/it] {'loss': 0.4621, 'grad_norm': 5.804508870075955, 'learning_rate': 8.577937031340975e-07, 'epoch': 0.74} 74%|███████▎ | 9066/12313 [6:47:31<2:22:18, 2.63s/it] 74%|███████▎ | 9067/12313 [6:47:34<2:25:55, 2.70s/it] {'loss': 0.4759, 'grad_norm': 3.9641442301684653, 'learning_rate': 8.572979174774934e-07, 'epoch': 0.74} 74%|███████▎ | 9067/12313 [6:47:34<2:25:55, 2.70s/it] 74%|███████▎ | 9068/12313 [6:47:37<2:25:24, 2.69s/it] {'loss': 0.5874, 'grad_norm': 3.935697034995573, 'learning_rate': 8.568022454871802e-07, 'epoch': 0.74} 74%|███████▎ | 9068/12313 [6:47:37<2:25:24, 2.69s/it] 74%|███████▎ | 9069/12313 [6:47:39<2:24:13, 2.67s/it] {'loss': 0.4736, 'grad_norm': 5.777703937802822, 'learning_rate': 8.563066871974543e-07, 'epoch': 0.74} 74%|███████▎ | 9069/12313 [6:47:39<2:24:13, 2.67s/it] 74%|███████▎ | 9070/12313 [6:47:42<2:30:33, 2.79s/it] {'loss': 0.4482, 'grad_norm': 3.7700387674835176, 'learning_rate': 8.558112426426062e-07, 'epoch': 0.74} 74%|███████▎ | 9070/12313 [6:47:42<2:30:33, 2.79s/it] 74%|███████▎ | 9071/12313 [6:47:45<2:29:09, 2.76s/it] {'loss': 0.3638, 'grad_norm': 4.042064333038825, 'learning_rate': 8.553159118569196e-07, 'epoch': 0.74} 74%|███████▎ | 9071/12313 
[6:47:45<2:29:09, 2.76s/it] 74%|███████▎ | 9072/12313 [6:47:48<2:26:44, 2.72s/it] {'loss': 0.359, 'grad_norm': 5.379208558888325, 'learning_rate': 8.548206948746673e-07, 'epoch': 0.74} 74%|███████▎ | 9072/12313 [6:47:48<2:26:44, 2.72s/it] 74%|███████▎ | 9073/12313 [6:47:50<2:24:35, 2.68s/it] {'loss': 0.5691, 'grad_norm': 5.80342467828285, 'learning_rate': 8.543255917301163e-07, 'epoch': 0.74} 74%|███████▎ | 9073/12313 [6:47:50<2:24:35, 2.68s/it] 74%|███████▎ | 9074/12313 [6:47:53<2:25:22, 2.69s/it] {'loss': 0.5253, 'grad_norm': 9.897018766300208, 'learning_rate': 8.538306024575235e-07, 'epoch': 0.74} 74%|███████▎ | 9074/12313 [6:47:53<2:25:22, 2.69s/it] 74%|███████▎ | 9075/12313 [6:47:56<2:29:45, 2.78s/it] {'loss': 0.4075, 'grad_norm': 5.1025049749314055, 'learning_rate': 8.533357270911419e-07, 'epoch': 0.74} 74%|███████▎ | 9075/12313 [6:47:56<2:29:45, 2.78s/it] 74%|███████▎ | 9076/12313 [6:47:58<2:26:12, 2.71s/it] {'loss': 0.4842, 'grad_norm': 8.535318151291682, 'learning_rate': 8.52840965665212e-07, 'epoch': 0.74} 74%|███████▎ | 9076/12313 [6:47:58<2:26:12, 2.71s/it] 74%|███████▎ | 9077/12313 [6:48:01<2:26:16, 2.71s/it] {'loss': 0.4553, 'grad_norm': 8.277801039325004, 'learning_rate': 8.523463182139699e-07, 'epoch': 0.74} 74%|███████▎ | 9077/12313 [6:48:01<2:26:16, 2.71s/it] 74%|███████▎ | 9078/12313 [6:48:04<2:25:17, 2.69s/it] {'loss': 0.627, 'grad_norm': 4.521612840769156, 'learning_rate': 8.518517847716435e-07, 'epoch': 0.74} 74%|███████▎ | 9078/12313 [6:48:04<2:25:17, 2.69s/it] 74%|███████▎ | 9079/12313 [6:48:06<2:23:25, 2.66s/it] {'loss': 0.5943, 'grad_norm': 3.8960485335075603, 'learning_rate': 8.513573653724508e-07, 'epoch': 0.74} 74%|███████▎ | 9079/12313 [6:48:06<2:23:25, 2.66s/it] 74%|███████▎ | 9080/12313 [6:48:09<2:20:21, 2.60s/it] {'loss': 0.6701, 'grad_norm': 4.606452564353779, 'learning_rate': 8.508630600506021e-07, 'epoch': 0.74} 74%|███████▎ | 9080/12313 [6:48:09<2:20:21, 2.60s/it] 74%|███████▍ | 9081/12313 [6:48:11<2:21:34, 2.63s/it] {'loss': 
0.5393, 'grad_norm': 7.052763751543803, 'learning_rate': 8.503688688403028e-07, 'epoch': 0.74} 74%|███████▍ | 9081/12313 [6:48:11<2:21:34, 2.63s/it] 74%|███████▍ | 9082/12313 [6:48:14<2:22:54, 2.65s/it] {'loss': 0.434, 'grad_norm': 4.538323480054813, 'learning_rate': 8.498747917757464e-07, 'epoch': 0.74} 74%|███████▍ | 9082/12313 [6:48:14<2:22:54, 2.65s/it] 74%|███████▍ | 9083/12313 [6:48:17<2:24:42, 2.69s/it] {'loss': 0.561, 'grad_norm': 3.679103857972724, 'learning_rate': 8.49380828891121e-07, 'epoch': 0.74} 74%|███████▍ | 9083/12313 [6:48:17<2:24:42, 2.69s/it] 74%|███████▍ | 9084/12313 [6:48:20<2:23:53, 2.67s/it] {'loss': 0.4462, 'grad_norm': 9.599527757034656, 'learning_rate': 8.488869802206073e-07, 'epoch': 0.74} 74%|███████▍ | 9084/12313 [6:48:20<2:23:53, 2.67s/it] 74%|███████▍ | 9085/12313 [6:48:23<2:27:21, 2.74s/it] {'loss': 0.5055, 'grad_norm': 5.51962161938916, 'learning_rate': 8.483932457983765e-07, 'epoch': 0.74} 74%|███████▍ | 9085/12313 [6:48:23<2:27:21, 2.74s/it] 74%|███████▍ | 9086/12313 [6:48:26<2:32:33, 2.84s/it] {'loss': 0.3904, 'grad_norm': 7.720325892853574, 'learning_rate': 8.478996256585909e-07, 'epoch': 0.74} 74%|███████▍ | 9086/12313 [6:48:26<2:32:33, 2.84s/it] 74%|███████▍ | 9087/12313 [6:48:28<2:30:35, 2.80s/it] {'loss': 0.3425, 'grad_norm': 4.8150352416660125, 'learning_rate': 8.474061198354086e-07, 'epoch': 0.74} 74%|███████▍ | 9087/12313 [6:48:28<2:30:35, 2.80s/it] 74%|███████▍ | 9088/12313 [6:48:31<2:26:53, 2.73s/it] {'loss': 0.4812, 'grad_norm': 4.123061421099957, 'learning_rate': 8.469127283629766e-07, 'epoch': 0.74} 74%|███████▍ | 9088/12313 [6:48:31<2:26:53, 2.73s/it] 74%|███████▍ | 9089/12313 [6:48:33<2:22:24, 2.65s/it] {'loss': 0.6729, 'grad_norm': 4.2839585199281975, 'learning_rate': 8.464194512754339e-07, 'epoch': 0.74} 74%|███████▍ | 9089/12313 [6:48:33<2:22:24, 2.65s/it] 74%|███████▍ | 9090/12313 [6:48:36<2:17:42, 2.56s/it] {'loss': 0.4227, 'grad_norm': 10.07664977406758, 'learning_rate': 8.459262886069139e-07, 'epoch': 
0.74} 74%|███████▍ | 9090/12313 [6:48:36<2:17:42, 2.56s/it] 74%|███████▍ | 9091/12313 [6:48:39<2:22:37, 2.66s/it] {'loss': 0.4117, 'grad_norm': 5.667705109796953, 'learning_rate': 8.454332403915416e-07, 'epoch': 0.74} 74%|███████▍ | 9091/12313 [6:48:39<2:22:37, 2.66s/it] 74%|███████▍ | 9092/12313 [6:48:41<2:21:19, 2.63s/it] {'loss': 0.4129, 'grad_norm': 5.360148423288729, 'learning_rate': 8.44940306663432e-07, 'epoch': 0.74} 74%|███████▍ | 9092/12313 [6:48:41<2:21:19, 2.63s/it] 74%|███████▍ | 9093/12313 [6:48:44<2:21:32, 2.64s/it] {'loss': 0.43, 'grad_norm': 4.840577845593679, 'learning_rate': 8.444474874566935e-07, 'epoch': 0.74} 74%|███████▍ | 9093/12313 [6:48:44<2:21:32, 2.64s/it] 74%|███████▍ | 9094/12313 [6:48:47<2:23:25, 2.67s/it] {'loss': 0.5745, 'grad_norm': 5.228586184130316, 'learning_rate': 8.439547828054276e-07, 'epoch': 0.74} 74%|███████▍ | 9094/12313 [6:48:47<2:23:25, 2.67s/it] 74%|███████▍ | 9095/12313 [6:48:49<2:20:50, 2.63s/it] {'loss': 0.6014, 'grad_norm': 5.804073223258344, 'learning_rate': 8.434621927437253e-07, 'epoch': 0.74} 74%|███████▍ | 9095/12313 [6:48:49<2:20:50, 2.63s/it] 74%|███████▍ | 9096/12313 [6:48:52<2:22:28, 2.66s/it] {'loss': 0.4756, 'grad_norm': 6.711353565911964, 'learning_rate': 8.429697173056726e-07, 'epoch': 0.74} 74%|███████▍ | 9096/12313 [6:48:52<2:22:28, 2.66s/it] 74%|███████▍ | 9097/12313 [6:48:55<2:24:37, 2.70s/it] {'loss': 0.5374, 'grad_norm': 5.514320352974961, 'learning_rate': 8.42477356525346e-07, 'epoch': 0.74} 74%|███████▍ | 9097/12313 [6:48:55<2:24:37, 2.70s/it] 74%|███████▍ | 9098/12313 [6:48:57<2:24:53, 2.70s/it] {'loss': 0.3875, 'grad_norm': 3.835666081477301, 'learning_rate': 8.419851104368143e-07, 'epoch': 0.74} 74%|███████▍ | 9098/12313 [6:48:57<2:24:53, 2.70s/it] 74%|███████▍ | 9099/12313 [6:49:00<2:21:15, 2.64s/it] {'loss': 0.4524, 'grad_norm': 12.210910906950325, 'learning_rate': 8.414929790741371e-07, 'epoch': 0.74} 74%|███████▍ | 9099/12313 [6:49:00<2:21:15, 2.64s/it] 74%|███████▍ | 9100/12313 
[6:49:02<2:20:19, 2.62s/it] {'loss': 0.5712, 'grad_norm': 5.875608947253129, 'learning_rate': 8.410009624713691e-07, 'epoch': 0.74} 74%|███████▍ | 9100/12313 [6:49:02<2:20:19, 2.62s/it] 74%|███████▍ | 9101/12313 [6:49:05<2:26:06, 2.73s/it] {'loss': 0.6175, 'grad_norm': 4.500274285625198, 'learning_rate': 8.405090606625547e-07, 'epoch': 0.74} 74%|███████▍ | 9101/12313 [6:49:05<2:26:06, 2.73s/it] 74%|███████▍ | 9102/12313 [6:49:08<2:23:36, 2.68s/it] {'loss': 0.5565, 'grad_norm': 7.944159885650859, 'learning_rate': 8.400172736817294e-07, 'epoch': 0.74} 74%|███████▍ | 9102/12313 [6:49:08<2:23:36, 2.68s/it] 74%|███████▍ | 9103/12313 [6:49:10<2:20:29, 2.63s/it] {'loss': 0.4571, 'grad_norm': 10.979049183670785, 'learning_rate': 8.395256015629233e-07, 'epoch': 0.74} 74%|███████▍ | 9103/12313 [6:49:10<2:20:29, 2.63s/it] 74%|███████▍ | 9104/12313 [6:49:13<2:24:17, 2.70s/it] {'loss': 0.4318, 'grad_norm': 6.5837339811285815, 'learning_rate': 8.390340443401588e-07, 'epoch': 0.74} 74%|███████▍ | 9104/12313 [6:49:13<2:24:17, 2.70s/it] 74%|███████▍ | 9105/12313 [6:49:16<2:21:07, 2.64s/it] {'loss': 0.4703, 'grad_norm': 6.045591348963159, 'learning_rate': 8.385426020474468e-07, 'epoch': 0.74} 74%|███████▍ | 9105/12313 [6:49:16<2:21:07, 2.64s/it] 74%|███████▍ | 9106/12313 [6:49:19<2:24:45, 2.71s/it] {'loss': 0.3586, 'grad_norm': 11.087294536746336, 'learning_rate': 8.380512747187944e-07, 'epoch': 0.74} 74%|███████▍ | 9106/12313 [6:49:19<2:24:45, 2.71s/it] 74%|███████▍ | 9107/12313 [6:49:21<2:24:22, 2.70s/it] {'loss': 0.3442, 'grad_norm': 5.179365997405158, 'learning_rate': 8.375600623881983e-07, 'epoch': 0.74} 74%|███████▍ | 9107/12313 [6:49:21<2:24:22, 2.70s/it] 74%|███████▍ | 9108/12313 [6:49:24<2:22:21, 2.66s/it] {'loss': 0.4803, 'grad_norm': 5.5937991392764355, 'learning_rate': 8.370689650896465e-07, 'epoch': 0.74} 74%|███████▍ | 9108/12313 [6:49:24<2:22:21, 2.66s/it] 74%|███████▍ | 9109/12313 [6:49:27<2:26:09, 2.74s/it] {'loss': 0.5371, 'grad_norm': 5.345681951321185, 
'learning_rate': 8.365779828571214e-07, 'epoch': 0.74} 74%|███████▍ | 9109/12313 [6:49:27<2:26:09, 2.74s/it] 74%|███████▍ | 9110/12313 [6:49:29<2:24:51, 2.71s/it] {'loss': 0.4857, 'grad_norm': 10.438164567037536, 'learning_rate': 8.360871157245973e-07, 'epoch': 0.74} 74%|███████▍ | 9110/12313 [6:49:29<2:24:51, 2.71s/it] 74%|███████▍ | 9111/12313 [6:49:32<2:27:26, 2.76s/it] {'loss': 0.4761, 'grad_norm': 12.570837392968885, 'learning_rate': 8.355963637260387e-07, 'epoch': 0.74} 74%|███████▍ | 9111/12313 [6:49:32<2:27:26, 2.76s/it] 74%|███████▍ | 9112/12313 [6:49:35<2:29:42, 2.81s/it] {'loss': 0.6403, 'grad_norm': 5.293881936064758, 'learning_rate': 8.351057268954019e-07, 'epoch': 0.74} 74%|███████▍ | 9112/12313 [6:49:35<2:29:42, 2.81s/it] 74%|███████▍ | 9113/12313 [6:49:38<2:31:35, 2.84s/it] {'loss': 0.4348, 'grad_norm': 8.074420587248548, 'learning_rate': 8.346152052666385e-07, 'epoch': 0.74} 74%|███████▍ | 9113/12313 [6:49:38<2:31:35, 2.84s/it] 74%|███████▍ | 9114/12313 [6:49:41<2:28:47, 2.79s/it] {'loss': 0.5021, 'grad_norm': 4.833671703517281, 'learning_rate': 8.341247988736889e-07, 'epoch': 0.74} 74%|███████▍ | 9114/12313 [6:49:41<2:28:47, 2.79s/it] 74%|███████▍ | 9115/12313 [6:49:44<2:30:43, 2.83s/it] {'loss': 0.4408, 'grad_norm': 5.173257511917773, 'learning_rate': 8.336345077504851e-07, 'epoch': 0.74} 74%|███████▍ | 9115/12313 [6:49:44<2:30:43, 2.83s/it] 74%|███████▍ | 9116/12313 [6:49:46<2:27:06, 2.76s/it] {'loss': 0.4978, 'grad_norm': 4.696108390642895, 'learning_rate': 8.331443319309557e-07, 'epoch': 0.74} 74%|███████▍ | 9116/12313 [6:49:46<2:27:06, 2.76s/it] 74%|███████▍ | 9117/12313 [6:49:49<2:23:37, 2.70s/it] {'loss': 0.4354, 'grad_norm': 6.333906802407053, 'learning_rate': 8.326542714490172e-07, 'epoch': 0.74} 74%|███████▍ | 9117/12313 [6:49:49<2:23:37, 2.70s/it] 74%|███████▍ | 9118/12313 [6:49:51<2:21:06, 2.65s/it] {'loss': 0.4368, 'grad_norm': 4.361831325871427, 'learning_rate': 8.321643263385776e-07, 'epoch': 0.74} 74%|███████▍ | 9118/12313 
[6:49:51<2:21:06, 2.65s/it] 74%|███████▍ | 9119/12313 [6:49:54<2:19:17, 2.62s/it] {'loss': 0.6296, 'grad_norm': 4.6714291210784085, 'learning_rate': 8.316744966335408e-07, 'epoch': 0.74} 74%|███████▍ | 9119/12313 [6:49:54<2:19:17, 2.62s/it] 74%|███████▍ | 9120/12313 [6:49:57<2:19:48, 2.63s/it] {'loss': 0.5809, 'grad_norm': 4.933127806189035, 'learning_rate': 8.31184782367799e-07, 'epoch': 0.74} 74%|███████▍ | 9120/12313 [6:49:57<2:19:48, 2.63s/it] 74%|███████▍ | 9121/12313 [6:49:59<2:18:31, 2.60s/it] {'loss': 0.6013, 'grad_norm': 4.025591077863814, 'learning_rate': 8.306951835752378e-07, 'epoch': 0.74} 74%|███████▍ | 9121/12313 [6:49:59<2:18:31, 2.60s/it] 74%|███████▍ | 9122/12313 [6:50:02<2:17:59, 2.59s/it] {'loss': 0.5021, 'grad_norm': 11.41144080285039, 'learning_rate': 8.302057002897349e-07, 'epoch': 0.74} 74%|███████▍ | 9122/12313 [6:50:02<2:17:59, 2.59s/it] 74%|███████▍ | 9123/12313 [6:50:04<2:19:06, 2.62s/it] {'loss': 0.5904, 'grad_norm': 4.627955700774956, 'learning_rate': 8.297163325451612e-07, 'epoch': 0.74} 74%|███████▍ | 9123/12313 [6:50:04<2:19:06, 2.62s/it] 74%|███████▍ | 9124/12313 [6:50:07<2:19:21, 2.62s/it] {'loss': 0.457, 'grad_norm': 7.878246043105683, 'learning_rate': 8.292270803753765e-07, 'epoch': 0.74} 74%|███████▍ | 9124/12313 [6:50:07<2:19:21, 2.62s/it] 74%|███████▍ | 9125/12313 [6:50:10<2:20:15, 2.64s/it] {'loss': 0.5311, 'grad_norm': 4.436045720835732, 'learning_rate': 8.287379438142365e-07, 'epoch': 0.74} 74%|███████▍ | 9125/12313 [6:50:10<2:20:15, 2.64s/it] 74%|███████▍ | 9126/12313 [6:50:12<2:21:27, 2.66s/it] {'loss': 0.4093, 'grad_norm': 5.747604125886795, 'learning_rate': 8.282489228955856e-07, 'epoch': 0.74} 74%|███████▍ | 9126/12313 [6:50:12<2:21:27, 2.66s/it] 74%|███████▍ | 9127/12313 [6:50:15<2:22:42, 2.69s/it] {'loss': 0.4506, 'grad_norm': 26.14780023838032, 'learning_rate': 8.277600176532608e-07, 'epoch': 0.74} 74%|███████▍ | 9127/12313 [6:50:15<2:22:42, 2.69s/it] 74%|███████▍ | 9128/12313 [6:50:18<2:18:30, 2.61s/it] {'loss': 
0.5179, 'grad_norm': 5.408380773620181, 'learning_rate': 8.272712281210926e-07, 'epoch': 0.74} 74%|███████▍ | 9128/12313 [6:50:18<2:18:30, 2.61s/it] 74%|███████▍ | 9129/12313 [6:50:20<2:20:57, 2.66s/it] {'loss': 0.5395, 'grad_norm': 3.881322907472951, 'learning_rate': 8.267825543329033e-07, 'epoch': 0.74} 74%|███████▍ | 9129/12313 [6:50:20<2:20:57, 2.66s/it] 74%|███████▍ | 9130/12313 [6:50:23<2:23:44, 2.71s/it] {'loss': 0.4454, 'grad_norm': 4.612767077216341, 'learning_rate': 8.262939963225058e-07, 'epoch': 0.74} 74%|███████▍ | 9130/12313 [6:50:23<2:23:44, 2.71s/it] 74%|███████▍ | 9131/12313 [6:50:26<2:25:42, 2.75s/it] {'loss': 0.4422, 'grad_norm': 20.881362818292562, 'learning_rate': 8.258055541237054e-07, 'epoch': 0.74} 74%|███████▍ | 9131/12313 [6:50:26<2:25:42, 2.75s/it] 74%|███████▍ | 9132/12313 [6:50:29<2:25:02, 2.74s/it] {'loss': 0.4213, 'grad_norm': 10.331638585893913, 'learning_rate': 8.253172277703006e-07, 'epoch': 0.74} 74%|███████▍ | 9132/12313 [6:50:29<2:25:02, 2.74s/it] 74%|███████▍ | 9133/12313 [6:50:31<2:22:29, 2.69s/it] {'loss': 0.4839, 'grad_norm': 7.471565014697083, 'learning_rate': 8.248290172960804e-07, 'epoch': 0.74} 74%|███████▍ | 9133/12313 [6:50:31<2:22:29, 2.69s/it] 74%|███████▍ | 9134/12313 [6:50:34<2:22:32, 2.69s/it] {'loss': 0.4021, 'grad_norm': 6.969777149983467, 'learning_rate': 8.24340922734826e-07, 'epoch': 0.74} 74%|███████▍ | 9134/12313 [6:50:34<2:22:32, 2.69s/it] 74%|███████▍ | 9135/12313 [6:50:37<2:19:53, 2.64s/it] {'loss': 0.3838, 'grad_norm': 6.91819887817937, 'learning_rate': 8.238529441203111e-07, 'epoch': 0.74} 74%|███████▍ | 9135/12313 [6:50:37<2:19:53, 2.64s/it] 74%|███████▍ | 9136/12313 [6:50:40<2:25:17, 2.74s/it] {'loss': 0.4054, 'grad_norm': 3.8948651109688703, 'learning_rate': 8.233650814863026e-07, 'epoch': 0.74} 74%|███████▍ | 9136/12313 [6:50:40<2:25:17, 2.74s/it] 74%|███████▍ | 9137/12313 [6:50:42<2:24:35, 2.73s/it] {'loss': 0.4144, 'grad_norm': 4.018293826197548, 'learning_rate': 8.228773348665561e-07, 'epoch': 
0.74} 74%|███████▍ | 9137/12313 [6:50:42<2:24:35, 2.73s/it] 74%|███████▍ | 9138/12313 [6:50:45<2:24:44, 2.74s/it] {'loss': 0.4907, 'grad_norm': 3.9662330274627355, 'learning_rate': 8.223897042948228e-07, 'epoch': 0.74} 74%|███████▍ | 9138/12313 [6:50:45<2:24:44, 2.74s/it] 74%|███████▍ | 9139/12313 [6:50:48<2:23:36, 2.71s/it] {'loss': 0.5779, 'grad_norm': 10.723781510737455, 'learning_rate': 8.219021898048435e-07, 'epoch': 0.74} 74%|███████▍ | 9139/12313 [6:50:48<2:23:36, 2.71s/it] 74%|███████▍ | 9140/12313 [6:50:50<2:23:56, 2.72s/it] {'loss': 0.4803, 'grad_norm': 4.79787389560584, 'learning_rate': 8.214147914303505e-07, 'epoch': 0.74} 74%|███████▍ | 9140/12313 [6:50:50<2:23:56, 2.72s/it] 74%|███████▍ | 9141/12313 [6:50:53<2:26:09, 2.76s/it] {'loss': 0.5073, 'grad_norm': 3.222506085754893, 'learning_rate': 8.209275092050701e-07, 'epoch': 0.74} 74%|███████▍ | 9141/12313 [6:50:53<2:26:09, 2.76s/it] 74%|███████▍ | 9142/12313 [6:50:56<2:25:00, 2.74s/it] {'loss': 0.4199, 'grad_norm': 7.127778601731678, 'learning_rate': 8.204403431627206e-07, 'epoch': 0.74} 74%|███████▍ | 9142/12313 [6:50:56<2:25:00, 2.74s/it] 74%|███████▍ | 9143/12313 [6:50:59<2:21:34, 2.68s/it] {'loss': 0.3786, 'grad_norm': 6.285963377503724, 'learning_rate': 8.199532933370094e-07, 'epoch': 0.74} 74%|███████▍ | 9143/12313 [6:50:59<2:21:34, 2.68s/it] 74%|███████▍ | 9144/12313 [6:51:01<2:18:01, 2.61s/it] {'loss': 0.5224, 'grad_norm': 8.21223183631982, 'learning_rate': 8.194663597616398e-07, 'epoch': 0.74} 74%|███████▍ | 9144/12313 [6:51:01<2:18:01, 2.61s/it] 74%|███████▍ | 9145/12313 [6:51:04<2:19:51, 2.65s/it] {'loss': 0.5302, 'grad_norm': 9.976782891147915, 'learning_rate': 8.18979542470304e-07, 'epoch': 0.74} 74%|███████▍ | 9145/12313 [6:51:04<2:19:51, 2.65s/it] 74%|███████▍ | 9146/12313 [6:51:06<2:19:14, 2.64s/it] {'loss': 0.4914, 'grad_norm': 6.028309491334264, 'learning_rate': 8.184928414966873e-07, 'epoch': 0.74} 74%|███████▍ | 9146/12313 [6:51:06<2:19:14, 2.64s/it] 74%|███████▍ | 9147/12313 
[6:51:09<2:22:09, 2.69s/it] {'loss': 0.5282, 'grad_norm': 13.676857941967034, 'learning_rate': 8.180062568744657e-07, 'epoch': 0.74} 74%|███████▍ | 9147/12313 [6:51:09<2:22:09, 2.69s/it] 74%|███████▍ | 9148/12313 [6:51:12<2:20:41, 2.67s/it] {'loss': 0.4902, 'grad_norm': 17.032146668354567, 'learning_rate': 8.175197886373093e-07, 'epoch': 0.74} 74%|███████▍ | 9148/12313 [6:51:12<2:20:41, 2.67s/it] 74%|███████▍ | 9149/12313 [6:51:15<2:22:46, 2.71s/it] {'loss': 0.3934, 'grad_norm': 5.4622799815547936, 'learning_rate': 8.170334368188798e-07, 'epoch': 0.74} 74%|███████▍ | 9149/12313 [6:51:15<2:22:46, 2.71s/it] 74%|███████▍ | 9150/12313 [6:51:17<2:21:34, 2.69s/it] {'loss': 0.5146, 'grad_norm': 8.517487460268729, 'learning_rate': 8.16547201452829e-07, 'epoch': 0.74} 74%|███████▍ | 9150/12313 [6:51:17<2:21:34, 2.69s/it] 74%|███████▍ | 9151/12313 [6:51:20<2:17:44, 2.61s/it] {'loss': 0.5049, 'grad_norm': 5.038501260406377, 'learning_rate': 8.160610825728029e-07, 'epoch': 0.74} 74%|███████▍ | 9151/12313 [6:51:20<2:17:44, 2.61s/it] 74%|███████▍ | 9152/12313 [6:51:23<2:30:26, 2.86s/it] {'loss': 0.5832, 'grad_norm': 4.455377373168075, 'learning_rate': 8.155750802124379e-07, 'epoch': 0.74} 74%|███████▍ | 9152/12313 [6:51:23<2:30:26, 2.86s/it] 74%|███████▍ | 9153/12313 [6:51:25<2:23:05, 2.72s/it] {'loss': 0.4169, 'grad_norm': 3.9339441762573863, 'learning_rate': 8.150891944053615e-07, 'epoch': 0.74} 74%|███████▍ | 9153/12313 [6:51:25<2:23:05, 2.72s/it] 74%|███████▍ | 9154/12313 [6:51:28<2:26:02, 2.77s/it] {'loss': 0.5344, 'grad_norm': 3.517549583636786, 'learning_rate': 8.146034251851959e-07, 'epoch': 0.74} 74%|███████▍ | 9154/12313 [6:51:28<2:26:02, 2.77s/it] 74%|███████▍ | 9155/12313 [6:51:31<2:25:26, 2.76s/it] {'loss': 0.5208, 'grad_norm': 4.5372086466348085, 'learning_rate': 8.141177725855543e-07, 'epoch': 0.74} 74%|███████▍ | 9155/12313 [6:51:31<2:25:26, 2.76s/it] 74%|███████▍ | 9156/12313 [6:51:34<2:23:28, 2.73s/it] {'loss': 0.4537, 'grad_norm': 4.458967507222066, 
'learning_rate': 8.136322366400396e-07, 'epoch': 0.74} 74%|███████▍ | 9156/12313 [6:51:34<2:23:28, 2.73s/it] 74%|███████▍ | 9157/12313 [6:51:36<2:21:38, 2.69s/it] {'loss': 0.3824, 'grad_norm': 3.5505929046873104, 'learning_rate': 8.131468173822499e-07, 'epoch': 0.74} 74%|███████▍ | 9157/12313 [6:51:36<2:21:38, 2.69s/it] 74%|███████▍ | 9158/12313 [6:51:39<2:18:19, 2.63s/it] {'loss': 0.4316, 'grad_norm': 4.049572608845729, 'learning_rate': 8.126615148457728e-07, 'epoch': 0.74} 74%|███████▍ | 9158/12313 [6:51:39<2:18:19, 2.63s/it] 74%|███████▍ | 9159/12313 [6:51:42<2:20:17, 2.67s/it] {'loss': 0.3844, 'grad_norm': 6.329253089634774, 'learning_rate': 8.121763290641879e-07, 'epoch': 0.74} 74%|███████▍ | 9159/12313 [6:51:42<2:20:17, 2.67s/it] 74%|███████▍ | 9160/12313 [6:51:44<2:18:20, 2.63s/it] {'loss': 0.5157, 'grad_norm': 5.7289617458205235, 'learning_rate': 8.116912600710694e-07, 'epoch': 0.74} 74%|███████▍ | 9160/12313 [6:51:44<2:18:20, 2.63s/it] 74%|███████▍ | 9161/12313 [6:51:47<2:18:41, 2.64s/it] {'loss': 0.4108, 'grad_norm': 7.461558435663731, 'learning_rate': 8.112063078999794e-07, 'epoch': 0.74} 74%|███████▍ | 9161/12313 [6:51:47<2:18:41, 2.64s/it] 74%|███████▍ | 9162/12313 [6:51:49<2:18:56, 2.65s/it] {'loss': 0.4063, 'grad_norm': 6.260601884198078, 'learning_rate': 8.107214725844753e-07, 'epoch': 0.74} 74%|███████▍ | 9162/12313 [6:51:49<2:18:56, 2.65s/it] 74%|███████▍ | 9163/12313 [6:51:52<2:18:58, 2.65s/it] {'loss': 0.375, 'grad_norm': 7.096380076386157, 'learning_rate': 8.102367541581055e-07, 'epoch': 0.74} 74%|███████▍ | 9163/12313 [6:51:52<2:18:58, 2.65s/it] 74%|███████▍ | 9164/12313 [6:51:55<2:18:01, 2.63s/it] {'loss': 0.5004, 'grad_norm': 4.583892134072468, 'learning_rate': 8.097521526544094e-07, 'epoch': 0.74} 74%|███████▍ | 9164/12313 [6:51:55<2:18:01, 2.63s/it] 74%|███████▍ | 9165/12313 [6:51:57<2:14:31, 2.56s/it] {'loss': 0.4534, 'grad_norm': 4.285669547146368, 'learning_rate': 8.092676681069189e-07, 'epoch': 0.74} 74%|███████▍ | 9165/12313 
[6:51:57<2:14:31, 2.56s/it] 74%|███████▍ | 9166/12313 [6:51:59<2:11:30, 2.51s/it] {'loss': 0.6221, 'grad_norm': 5.84102693467306, 'learning_rate': 8.087833005491568e-07, 'epoch': 0.74} 74%|███████▍ | 9166/12313 [6:51:59<2:11:30, 2.51s/it] 74%|███████▍ | 9167/12313 [6:52:02<2:17:05, 2.61s/it] {'loss': 0.5261, 'grad_norm': 4.581219331241057, 'learning_rate': 8.082990500146398e-07, 'epoch': 0.74} 74%|███████▍ | 9167/12313 [6:52:02<2:17:05, 2.61s/it] 74%|███████▍ | 9168/12313 [6:52:05<2:18:23, 2.64s/it] {'loss': 0.4019, 'grad_norm': 6.253344718728393, 'learning_rate': 8.078149165368762e-07, 'epoch': 0.74} 74%|███████▍ | 9168/12313 [6:52:05<2:18:23, 2.64s/it] 74%|███████▍ | 9169/12313 [6:52:08<2:20:30, 2.68s/it] {'loss': 0.5145, 'grad_norm': 4.961021707376709, 'learning_rate': 8.073309001493637e-07, 'epoch': 0.74} 74%|███████▍ | 9169/12313 [6:52:08<2:20:30, 2.68s/it] 74%|███████▍ | 9170/12313 [6:52:10<2:16:32, 2.61s/it] {'loss': 0.4394, 'grad_norm': 60.59463933382603, 'learning_rate': 8.068470008855953e-07, 'epoch': 0.74} 74%|███████▍ | 9170/12313 [6:52:10<2:16:32, 2.61s/it] 74%|███████▍ | 9171/12313 [6:52:13<2:16:25, 2.61s/it] {'loss': 0.4144, 'grad_norm': 5.570185363264054, 'learning_rate': 8.063632187790538e-07, 'epoch': 0.74} 74%|███████▍ | 9171/12313 [6:52:13<2:16:25, 2.61s/it] 74%|███████▍ | 9172/12313 [6:52:16<2:18:03, 2.64s/it] {'loss': 0.4612, 'grad_norm': 6.346150984764303, 'learning_rate': 8.05879553863213e-07, 'epoch': 0.74} 74%|███████▍ | 9172/12313 [6:52:16<2:18:03, 2.64s/it] 74%|███████▍ | 9173/12313 [6:52:18<2:13:49, 2.56s/it] {'loss': 0.5154, 'grad_norm': 5.7858397897115745, 'learning_rate': 8.053960061715421e-07, 'epoch': 0.74} 74%|███████▍ | 9173/12313 [6:52:18<2:13:49, 2.56s/it] 75%|███████▍ | 9174/12313 [6:52:21<2:21:28, 2.70s/it] {'loss': 0.4543, 'grad_norm': 9.04581065503539, 'learning_rate': 8.049125757374978e-07, 'epoch': 0.75} 75%|███████▍ | 9174/12313 [6:52:21<2:21:28, 2.70s/it] 75%|███████▍ | 9175/12313 [6:52:23<2:17:26, 2.63s/it] {'loss': 
0.5, 'grad_norm': 4.5831119885995575, 'learning_rate': 8.044292625945327e-07, 'epoch': 0.75} 75%|███████▍ | 9175/12313 [6:52:23<2:17:26, 2.63s/it] 75%|███████▍ | 9176/12313 [6:52:26<2:16:19, 2.61s/it] {'loss': 0.6158, 'grad_norm': 4.73034396532834, 'learning_rate': 8.039460667760892e-07, 'epoch': 0.75} 75%|███████▍ | 9176/12313 [6:52:26<2:16:19, 2.61s/it] 75%|███████▍ | 9177/12313 [6:52:29<2:17:02, 2.62s/it] {'loss': 0.4434, 'grad_norm': 7.297415511486316, 'learning_rate': 8.034629883156019e-07, 'epoch': 0.75} 75%|███████▍ | 9177/12313 [6:52:29<2:17:02, 2.62s/it] 75%|███████▍ | 9178/12313 [6:52:31<2:19:42, 2.67s/it] {'loss': 0.3851, 'grad_norm': 7.087502286901482, 'learning_rate': 8.029800272464963e-07, 'epoch': 0.75} 75%|███████▍ | 9178/12313 [6:52:31<2:19:42, 2.67s/it] 75%|███████▍ | 9179/12313 [6:52:34<2:16:20, 2.61s/it] {'loss': 0.3312, 'grad_norm': 21.28250462127868, 'learning_rate': 8.024971836021922e-07, 'epoch': 0.75} 75%|███████▍ | 9179/12313 [6:52:34<2:16:20, 2.61s/it] 75%|███████▍ | 9180/12313 [6:52:36<2:15:51, 2.60s/it] {'loss': 0.5862, 'grad_norm': 4.028786793175537, 'learning_rate': 8.020144574160984e-07, 'epoch': 0.75} 75%|███████▍ | 9180/12313 [6:52:36<2:15:51, 2.60s/it] 75%|███████▍ | 9181/12313 [6:52:39<2:16:27, 2.61s/it] {'loss': 0.4179, 'grad_norm': 8.089625182959997, 'learning_rate': 8.015318487216184e-07, 'epoch': 0.75} 75%|███████▍ | 9181/12313 [6:52:39<2:16:27, 2.61s/it] 75%|███████▍ | 9182/12313 [6:52:42<2:14:38, 2.58s/it] {'loss': 0.5247, 'grad_norm': 10.703076445335201, 'learning_rate': 8.010493575521444e-07, 'epoch': 0.75} 75%|███████▍ | 9182/12313 [6:52:42<2:14:38, 2.58s/it] 75%|███████▍ | 9183/12313 [6:52:44<2:13:37, 2.56s/it] {'loss': 0.6692, 'grad_norm': 5.996668534107049, 'learning_rate': 8.005669839410643e-07, 'epoch': 0.75} 75%|███████▍ | 9183/12313 [6:52:44<2:13:37, 2.56s/it] 75%|███████▍ | 9184/12313 [6:52:47<2:14:11, 2.57s/it] {'loss': 0.7248, 'grad_norm': 4.264762756411719, 'learning_rate': 8.00084727921755e-07, 'epoch': 0.75} 
75%|███████▍ | 9184/12313 [6:52:47<2:14:11, 2.57s/it] 75%|███████▍ | 9185/12313 [6:52:49<2:16:02, 2.61s/it] {'loss': 0.4624, 'grad_norm': 6.337959117205082, 'learning_rate': 7.996025895275846e-07, 'epoch': 0.75} 75%|███████▍ | 9185/12313 [6:52:49<2:16:02, 2.61s/it] 75%|███████▍ | 9186/12313 [6:52:52<2:16:31, 2.62s/it] {'loss': 0.5874, 'grad_norm': 4.73694838657318, 'learning_rate': 7.991205687919163e-07, 'epoch': 0.75} 75%|███████▍ | 9186/12313 [6:52:52<2:16:31, 2.62s/it] 75%|███████▍ | 9187/12313 [6:52:55<2:18:26, 2.66s/it] {'loss': 0.4305, 'grad_norm': 4.444846677665557, 'learning_rate': 7.986386657481032e-07, 'epoch': 0.75} 75%|███████▍ | 9187/12313 [6:52:55<2:18:26, 2.66s/it] 75%|███████▍ | 9188/12313 [6:52:57<2:17:40, 2.64s/it] {'loss': 0.4158, 'grad_norm': 4.371046177977482, 'learning_rate': 7.981568804294895e-07, 'epoch': 0.75} 75%|███████▍ | 9188/12313 [6:52:57<2:17:40, 2.64s/it] 75%|███████▍ | 9189/12313 [6:53:00<2:14:42, 2.59s/it] {'loss': 0.5297, 'grad_norm': 5.23612698457551, 'learning_rate': 7.976752128694134e-07, 'epoch': 0.75} 75%|███████▍ | 9189/12313 [6:53:00<2:14:42, 2.59s/it] 75%|███████▍ | 9190/12313 [6:53:02<2:14:02, 2.58s/it] {'loss': 0.4, 'grad_norm': 5.6064882601787325, 'learning_rate': 7.971936631012033e-07, 'epoch': 0.75} 75%|███████▍ | 9190/12313 [6:53:02<2:14:02, 2.58s/it] 75%|███████▍ | 9191/12313 [6:53:05<2:14:38, 2.59s/it] {'loss': 0.465, 'grad_norm': 4.736891493880583, 'learning_rate': 7.96712231158179e-07, 'epoch': 0.75} 75%|███████▍ | 9191/12313 [6:53:05<2:14:38, 2.59s/it] 75%|███████▍ | 9192/12313 [6:53:08<2:17:28, 2.64s/it] {'loss': 0.538, 'grad_norm': 16.20683819170043, 'learning_rate': 7.962309170736546e-07, 'epoch': 0.75} 75%|███████▍ | 9192/12313 [6:53:08<2:17:28, 2.64s/it] 75%|███████▍ | 9193/12313 [6:53:11<2:23:51, 2.77s/it] {'loss': 0.5302, 'grad_norm': 4.072174973984312, 'learning_rate': 7.957497208809328e-07, 'epoch': 0.75} 75%|███████▍ | 9193/12313 [6:53:11<2:23:51, 2.77s/it] 75%|███████▍ | 9194/12313 [6:53:14<2:22:29, 
2.74s/it] {'loss': 0.5949, 'grad_norm': 3.329371490211698, 'learning_rate': 7.952686426133105e-07, 'epoch': 0.75} 75%|███████▍ | 9194/12313 [6:53:14<2:22:29, 2.74s/it] 75%|███████▍ | 9195/12313 [6:53:16<2:22:04, 2.73s/it] {'loss': 0.5202, 'grad_norm': 10.181237849562704, 'learning_rate': 7.947876823040771e-07, 'epoch': 0.75} 75%|███████▍ | 9195/12313 [6:53:16<2:22:04, 2.73s/it] 75%|███████▍ | 9196/12313 [6:53:19<2:21:08, 2.72s/it] {'loss': 0.3624, 'grad_norm': 4.700711476652391, 'learning_rate': 7.943068399865111e-07, 'epoch': 0.75} 75%|███████▍ | 9196/12313 [6:53:19<2:21:08, 2.72s/it] 75%|███████▍ | 9197/12313 [6:53:21<2:16:19, 2.63s/it] {'loss': 0.3774, 'grad_norm': 4.764991430944277, 'learning_rate': 7.93826115693884e-07, 'epoch': 0.75} 75%|███████▍ | 9197/12313 [6:53:21<2:16:19, 2.63s/it] 75%|███████▍ | 9198/12313 [6:53:24<2:14:58, 2.60s/it] {'loss': 0.3546, 'grad_norm': 7.85688314418302, 'learning_rate': 7.933455094594602e-07, 'epoch': 0.75} 75%|███████▍ | 9198/12313 [6:53:24<2:14:58, 2.60s/it] 75%|███████▍ | 9199/12313 [6:53:27<2:15:25, 2.61s/it] {'loss': 0.6123, 'grad_norm': 4.120541672766695, 'learning_rate': 7.928650213164945e-07, 'epoch': 0.75} 75%|███████▍ | 9199/12313 [6:53:27<2:15:25, 2.61s/it] 75%|███████▍ | 9200/12313 [6:53:29<2:16:22, 2.63s/it] {'loss': 0.5695, 'grad_norm': 7.327958848564803, 'learning_rate': 7.92384651298235e-07, 'epoch': 0.75} 75%|███████▍ | 9200/12313 [6:53:29<2:16:22, 2.63s/it] 75%|███████▍ | 9201/12313 [6:53:32<2:20:30, 2.71s/it] {'loss': 0.4224, 'grad_norm': 5.719769810682618, 'learning_rate': 7.919043994379194e-07, 'epoch': 0.75} 75%|███████▍ | 9201/12313 [6:53:32<2:20:30, 2.71s/it] 75%|███████▍ | 9202/12313 [6:53:35<2:18:33, 2.67s/it] {'loss': 0.5687, 'grad_norm': 4.094688049104328, 'learning_rate': 7.914242657687804e-07, 'epoch': 0.75} 75%|███████▍ | 9202/12313 [6:53:35<2:18:33, 2.67s/it] 75%|███████▍ | 9203/12313 [6:53:37<2:18:47, 2.68s/it] {'loss': 0.429, 'grad_norm': 8.405367942793522, 'learning_rate': 
7.909442503240395e-07, 'epoch': 0.75} 75%|███████▍ | 9203/12313 [6:53:37<2:18:47, 2.68s/it] 75%|███████▍ | 9204/12313 [6:53:40<2:15:46, 2.62s/it] {'loss': 0.4256, 'grad_norm': 6.01950206469752, 'learning_rate': 7.904643531369108e-07, 'epoch': 0.75} 75%|███████▍ | 9204/12313 [6:53:40<2:15:46, 2.62s/it] 75%|███████▍ | 9205/12313 [6:53:43<2:22:38, 2.75s/it] {'loss': 0.4707, 'grad_norm': 7.930996919339839, 'learning_rate': 7.899845742406017e-07, 'epoch': 0.75} 75%|███████▍ | 9205/12313 [6:53:43<2:22:38, 2.75s/it] 75%|███████▍ | 9206/12313 [6:53:46<2:21:28, 2.73s/it] {'loss': 0.411, 'grad_norm': 16.262367161009937, 'learning_rate': 7.895049136683095e-07, 'epoch': 0.75} 75%|███████▍ | 9206/12313 [6:53:46<2:21:28, 2.73s/it] 75%|███████▍ | 9207/12313 [6:53:48<2:16:11, 2.63s/it] {'loss': 0.4796, 'grad_norm': 3.3605717819153527, 'learning_rate': 7.890253714532245e-07, 'epoch': 0.75} 75%|███████▍ | 9207/12313 [6:53:48<2:16:11, 2.63s/it] 75%|███████▍ | 9208/12313 [6:53:51<2:15:33, 2.62s/it] {'loss': 0.5579, 'grad_norm': 4.04997390894807, 'learning_rate': 7.885459476285292e-07, 'epoch': 0.75} 75%|███████▍ | 9208/12313 [6:53:51<2:15:33, 2.62s/it] 75%|███████▍ | 9209/12313 [6:53:53<2:15:15, 2.61s/it] {'loss': 0.3457, 'grad_norm': 4.435766819763639, 'learning_rate': 7.880666422273969e-07, 'epoch': 0.75} 75%|███████▍ | 9209/12313 [6:53:53<2:15:15, 2.61s/it] 75%|███████▍ | 9210/12313 [6:53:56<2:14:26, 2.60s/it] {'loss': 0.5797, 'grad_norm': 5.1365435166693505, 'learning_rate': 7.875874552829918e-07, 'epoch': 0.75} 75%|███████▍ | 9210/12313 [6:53:56<2:14:26, 2.60s/it] 75%|███████▍ | 9211/12313 [6:53:58<2:14:35, 2.60s/it] {'loss': 0.5462, 'grad_norm': 5.591130735882937, 'learning_rate': 7.871083868284726e-07, 'epoch': 0.75} 75%|███████▍ | 9211/12313 [6:53:58<2:14:35, 2.60s/it] 75%|███████▍ | 9212/12313 [6:54:01<2:15:06, 2.61s/it] {'loss': 0.4643, 'grad_norm': 8.16910174430336, 'learning_rate': 7.866294368969871e-07, 'epoch': 0.75} 75%|███████▍ | 9212/12313 [6:54:01<2:15:06, 2.61s/it] 
75%|███████▍ | 9213/12313 [6:54:04<2:15:19, 2.62s/it] {'loss': 0.5222, 'grad_norm': 7.038360149078053, 'learning_rate': 7.861506055216764e-07, 'epoch': 0.75} 75%|███████▍ | 9213/12313 [6:54:04<2:15:19, 2.62s/it] 75%|███████▍ | 9214/12313 [6:54:07<2:20:29, 2.72s/it] {'loss': 0.5721, 'grad_norm': 5.659068513220842, 'learning_rate': 7.856718927356743e-07, 'epoch': 0.75} 75%|███████▍ | 9214/12313 [6:54:07<2:20:29, 2.72s/it] 75%|███████▍ | 9215/12313 [6:54:09<2:18:39, 2.69s/it] {'loss': 0.5484, 'grad_norm': 7.479766446282722, 'learning_rate': 7.851932985721042e-07, 'epoch': 0.75} 75%|███████▍ | 9215/12313 [6:54:09<2:18:39, 2.69s/it] 75%|███████▍ | 9216/12313 [6:54:12<2:18:57, 2.69s/it] {'loss': 0.3887, 'grad_norm': 7.0753754777583575, 'learning_rate': 7.847148230640825e-07, 'epoch': 0.75} 75%|███████▍ | 9216/12313 [6:54:12<2:18:57, 2.69s/it] 75%|███████▍ | 9217/12313 [6:54:15<2:18:40, 2.69s/it] {'loss': 0.4933, 'grad_norm': 6.933072652515654, 'learning_rate': 7.842364662447161e-07, 'epoch': 0.75} 75%|███████▍ | 9217/12313 [6:54:15<2:18:40, 2.69s/it] 75%|███████▍ | 9218/12313 [6:54:17<2:17:30, 2.67s/it] {'loss': 0.4314, 'grad_norm': 4.082184303346161, 'learning_rate': 7.837582281471065e-07, 'epoch': 0.75} 75%|███████▍ | 9218/12313 [6:54:17<2:17:30, 2.67s/it] 75%|███████▍ | 9219/12313 [6:54:20<2:18:19, 2.68s/it] {'loss': 0.3486, 'grad_norm': 5.120311127792642, 'learning_rate': 7.832801088043438e-07, 'epoch': 0.75} 75%|███████▍ | 9219/12313 [6:54:20<2:18:19, 2.68s/it] 75%|███████▍ | 9220/12313 [6:54:23<2:17:07, 2.66s/it] {'loss': 0.5207, 'grad_norm': 5.389253627233221, 'learning_rate': 7.828021082495118e-07, 'epoch': 0.75} 75%|███████▍ | 9220/12313 [6:54:23<2:17:07, 2.66s/it] 75%|███████▍ | 9221/12313 [6:54:25<2:17:46, 2.67s/it] {'loss': 0.3661, 'grad_norm': 5.865053171467202, 'learning_rate': 7.823242265156866e-07, 'epoch': 0.75} 75%|███████▍ | 9221/12313 [6:54:25<2:17:46, 2.67s/it] 75%|███████▍ | 9222/12313 [6:54:28<2:17:03, 2.66s/it] {'loss': 0.6257, 'grad_norm': 
4.518682135966551, 'learning_rate': 7.818464636359344e-07, 'epoch': 0.75} 75%|███████▍ | 9222/12313 [6:54:28<2:17:03, 2.66s/it] 75%|███████▍ | 9223/12313 [6:54:31<2:21:03, 2.74s/it] {'loss': 0.4796, 'grad_norm': 5.19395467966571, 'learning_rate': 7.813688196433125e-07, 'epoch': 0.75} 75%|███████▍ | 9223/12313 [6:54:31<2:21:03, 2.74s/it] 75%|███████▍ | 9224/12313 [6:54:34<2:24:22, 2.80s/it] {'loss': 0.4585, 'grad_norm': 4.850339059401898, 'learning_rate': 7.808912945708738e-07, 'epoch': 0.75} 75%|███████▍ | 9224/12313 [6:54:34<2:24:22, 2.80s/it] 75%|███████▍ | 9225/12313 [6:54:36<2:22:07, 2.76s/it] {'loss': 0.3884, 'grad_norm': 5.013059286403891, 'learning_rate': 7.804138884516583e-07, 'epoch': 0.75} 75%|███████▍ | 9225/12313 [6:54:36<2:22:07, 2.76s/it] 75%|███████▍ | 9226/12313 [6:54:39<2:24:41, 2.81s/it] {'loss': 0.4484, 'grad_norm': 4.841130061593439, 'learning_rate': 7.799366013187007e-07, 'epoch': 0.75} 75%|███████▍ | 9226/12313 [6:54:39<2:24:41, 2.81s/it] 75%|███████▍ | 9227/12313 [6:54:42<2:29:17, 2.90s/it] {'loss': 0.4274, 'grad_norm': 5.230772273048009, 'learning_rate': 7.794594332050282e-07, 'epoch': 0.75} 75%|███████▍ | 9227/12313 [6:54:42<2:29:17, 2.90s/it] 75%|███████▍ | 9228/12313 [6:54:45<2:26:21, 2.85s/it] {'loss': 0.4516, 'grad_norm': 7.075947384663193, 'learning_rate': 7.789823841436567e-07, 'epoch': 0.75} 75%|███████▍ | 9228/12313 [6:54:45<2:26:21, 2.85s/it] 75%|███████▍ | 9229/12313 [6:54:48<2:25:07, 2.82s/it] {'loss': 0.3975, 'grad_norm': 6.992483714956555, 'learning_rate': 7.785054541675954e-07, 'epoch': 0.75} 75%|███████▍ | 9229/12313 [6:54:48<2:25:07, 2.82s/it] 75%|███████▍ | 9230/12313 [6:54:51<2:25:35, 2.83s/it] {'loss': 0.7948, 'grad_norm': 4.561345669683036, 'learning_rate': 7.780286433098464e-07, 'epoch': 0.75} 75%|███████▍ | 9230/12313 [6:54:51<2:25:35, 2.83s/it] 75%|███████▍ | 9231/12313 [6:54:53<2:22:06, 2.77s/it] {'loss': 0.4752, 'grad_norm': 4.478198203586605, 'learning_rate': 7.775519516034019e-07, 'epoch': 0.75} 75%|███████▍ | 
9231/12313 [6:54:53<2:22:06, 2.77s/it] 75%|███████▍ | 9232/12313 [6:54:56<2:15:20, 2.64s/it] {'loss': 0.5631, 'grad_norm': 3.9360754902947894, 'learning_rate': 7.770753790812455e-07, 'epoch': 0.75} 75%|███████▍ | 9232/12313 [6:54:56<2:15:20, 2.64s/it] 75%|███████▍ | 9233/12313 [6:54:58<2:15:41, 2.64s/it] {'loss': 0.4384, 'grad_norm': 6.792888436524307, 'learning_rate': 7.765989257763545e-07, 'epoch': 0.75} 75%|███████▍ | 9233/12313 [6:54:58<2:15:41, 2.64s/it] 75%|███████▍ | 9234/12313 [6:55:01<2:13:58, 2.61s/it] {'loss': 0.4428, 'grad_norm': 5.737542594203367, 'learning_rate': 7.761225917216978e-07, 'epoch': 0.75} 75%|███████▍ | 9234/12313 [6:55:01<2:13:58, 2.61s/it] 75%|███████▌ | 9235/12313 [6:55:04<2:13:25, 2.60s/it] {'loss': 0.4604, 'grad_norm': 10.689284036969283, 'learning_rate': 7.75646376950234e-07, 'epoch': 0.75} 75%|███████▌ | 9235/12313 [6:55:04<2:13:25, 2.60s/it] 75%|███████▌ | 9236/12313 [6:55:06<2:15:37, 2.64s/it] {'loss': 0.5013, 'grad_norm': 22.015199504327047, 'learning_rate': 7.751702814949145e-07, 'epoch': 0.75} 75%|███████▌ | 9236/12313 [6:55:06<2:15:37, 2.64s/it] 75%|███████▌ | 9237/12313 [6:55:09<2:17:33, 2.68s/it] {'loss': 0.4218, 'grad_norm': 9.731543225761502, 'learning_rate': 7.746943053886835e-07, 'epoch': 0.75} 75%|███████▌ | 9237/12313 [6:55:09<2:17:33, 2.68s/it] 75%|███████▌ | 9238/12313 [6:55:12<2:15:16, 2.64s/it] {'loss': 0.5271, 'grad_norm': 6.82059578534053, 'learning_rate': 7.742184486644746e-07, 'epoch': 0.75} 75%|███████▌ | 9238/12313 [6:55:12<2:15:16, 2.64s/it] 75%|███████▌ | 9239/12313 [6:55:14<2:16:16, 2.66s/it] {'loss': 0.5683, 'grad_norm': 4.580942490166404, 'learning_rate': 7.737427113552157e-07, 'epoch': 0.75} 75%|███████▌ | 9239/12313 [6:55:14<2:16:16, 2.66s/it] 75%|███████▌ | 9240/12313 [6:55:17<2:19:31, 2.72s/it] {'loss': 0.4838, 'grad_norm': 3.654932263444829, 'learning_rate': 7.732670934938257e-07, 'epoch': 0.75} 75%|███████▌ | 9240/12313 [6:55:17<2:19:31, 2.72s/it] 75%|███████▌ | 9241/12313 [6:55:20<2:17:37, 
2.69s/it] {'loss': 0.478, 'grad_norm': 5.579207786127818, 'learning_rate': 7.727915951132145e-07, 'epoch': 0.75} 75%|███████▌ | 9241/12313 [6:55:20<2:17:37, 2.69s/it] 75%|███████▌ | 9242/12313 [6:55:23<2:18:20, 2.70s/it] {'loss': 0.5107, 'grad_norm': 4.861880759749431, 'learning_rate': 7.723162162462827e-07, 'epoch': 0.75} 75%|███████▌ | 9242/12313 [6:55:23<2:18:20, 2.70s/it] 75%|███████▌ | 9243/12313 [6:55:25<2:17:13, 2.68s/it] {'loss': 0.439, 'grad_norm': 3.336615456803476, 'learning_rate': 7.718409569259261e-07, 'epoch': 0.75} 75%|███████▌ | 9243/12313 [6:55:25<2:17:13, 2.68s/it] 75%|███████▌ | 9244/12313 [6:55:28<2:21:53, 2.77s/it] {'loss': 0.5717, 'grad_norm': 3.594363219165978, 'learning_rate': 7.713658171850289e-07, 'epoch': 0.75} 75%|███████▌ | 9244/12313 [6:55:28<2:21:53, 2.77s/it] 75%|███████▌ | 9245/12313 [6:55:31<2:19:41, 2.73s/it] {'loss': 0.4124, 'grad_norm': 6.352249538029256, 'learning_rate': 7.708907970564672e-07, 'epoch': 0.75} 75%|███████▌ | 9245/12313 [6:55:31<2:19:41, 2.73s/it] 75%|███████▌ | 9246/12313 [6:55:34<2:20:42, 2.75s/it] {'loss': 0.6313, 'grad_norm': 5.815211228408834, 'learning_rate': 7.704158965731126e-07, 'epoch': 0.75} 75%|███████▌ | 9246/12313 [6:55:34<2:20:42, 2.75s/it] 75%|███████▌ | 9247/12313 [6:55:36<2:18:42, 2.71s/it] {'loss': 0.4166, 'grad_norm': 4.251212567683483, 'learning_rate': 7.699411157678241e-07, 'epoch': 0.75} 75%|███████▌ | 9247/12313 [6:55:36<2:18:42, 2.71s/it] 75%|███████▌ | 9248/12313 [6:55:39<2:18:38, 2.71s/it] {'loss': 0.5619, 'grad_norm': 4.650261805037805, 'learning_rate': 7.694664546734534e-07, 'epoch': 0.75} 75%|███████▌ | 9248/12313 [6:55:39<2:18:38, 2.71s/it] 75%|███████▌ | 9249/12313 [6:55:42<2:21:32, 2.77s/it] {'loss': 0.3819, 'grad_norm': 7.169544455959594, 'learning_rate': 7.689919133228462e-07, 'epoch': 0.75} 75%|███████▌ | 9249/12313 [6:55:42<2:21:32, 2.77s/it] 75%|███████▌ | 9250/12313 [6:55:44<2:16:03, 2.67s/it] {'loss': 0.45, 'grad_norm': 5.930269929305272, 'learning_rate': 
7.685174917488375e-07, 'epoch': 0.75} 75%|███████▌ | 9250/12313 [6:55:44<2:16:03, 2.67s/it] 75%|███████▌ | 9251/12313 [6:55:48<2:32:32, 2.99s/it] {'loss': 0.4904, 'grad_norm': 4.069763452953068, 'learning_rate': 7.680431899842538e-07, 'epoch': 0.75} 75%|███████▌ | 9251/12313 [6:55:48<2:32:32, 2.99s/it] 75%|███████▌ | 9252/12313 [6:55:51<2:27:11, 2.89s/it] {'loss': 0.42, 'grad_norm': 6.375436578366889, 'learning_rate': 7.67569008061915e-07, 'epoch': 0.75} 75%|███████▌ | 9252/12313 [6:55:51<2:27:11, 2.89s/it] 75%|███████▌ | 9253/12313 [6:55:53<2:24:37, 2.84s/it] {'loss': 0.5423, 'grad_norm': 4.507854006209749, 'learning_rate': 7.670949460146329e-07, 'epoch': 0.75} 75%|███████▌ | 9253/12313 [6:55:53<2:24:37, 2.84s/it] 75%|███████▌ | 9254/12313 [6:55:56<2:21:50, 2.78s/it] {'loss': 0.4196, 'grad_norm': 4.6919308246764615, 'learning_rate': 7.666210038752092e-07, 'epoch': 0.75} 75%|███████▌ | 9254/12313 [6:55:56<2:21:50, 2.78s/it] 75%|███████▌ | 9255/12313 [6:55:59<2:19:41, 2.74s/it] {'loss': 0.5427, 'grad_norm': 7.409437183391873, 'learning_rate': 7.661471816764377e-07, 'epoch': 0.75} 75%|███████▌ | 9255/12313 [6:55:59<2:19:41, 2.74s/it] 75%|███████▌ | 9256/12313 [6:56:01<2:18:31, 2.72s/it] {'loss': 0.4405, 'grad_norm': 3.4419438252999446, 'learning_rate': 7.656734794511056e-07, 'epoch': 0.75} 75%|███████▌ | 9256/12313 [6:56:01<2:18:31, 2.72s/it] 75%|███████▌ | 9257/12313 [6:56:04<2:18:25, 2.72s/it] {'loss': 0.4739, 'grad_norm': 12.815965237950453, 'learning_rate': 7.65199897231989e-07, 'epoch': 0.75} 75%|███████▌ | 9257/12313 [6:56:04<2:18:25, 2.72s/it] 75%|███████▌ | 9258/12313 [6:56:07<2:18:54, 2.73s/it] {'loss': 0.4545, 'grad_norm': 6.654081291463507, 'learning_rate': 7.647264350518582e-07, 'epoch': 0.75} 75%|███████▌ | 9258/12313 [6:56:07<2:18:54, 2.73s/it] 75%|███████▌ | 9259/12313 [6:56:09<2:18:32, 2.72s/it] {'loss': 0.4012, 'grad_norm': 4.719867564770338, 'learning_rate': 7.642530929434752e-07, 'epoch': 0.75} 75%|███████▌ | 9259/12313 [6:56:09<2:18:32, 2.72s/it] 
75%|███████▌ | 9260/12313 [6:56:12<2:17:26, 2.70s/it] {'loss': 0.5197, 'grad_norm': 5.37752756201268, 'learning_rate': 7.637798709395919e-07, 'epoch': 0.75} 75%|███████▌ | 9260/12313 [6:56:12<2:17:26, 2.70s/it] 75%|███████▌ | 9261/12313 [6:56:15<2:17:33, 2.70s/it] {'loss': 0.367, 'grad_norm': 5.47385742990871, 'learning_rate': 7.633067690729517e-07, 'epoch': 0.75} 75%|███████▌ | 9261/12313 [6:56:15<2:17:33, 2.70s/it] 75%|███████▌ | 9262/12313 [6:56:17<2:16:05, 2.68s/it] {'loss': 0.4252, 'grad_norm': 4.4733368861841045, 'learning_rate': 7.628337873762928e-07, 'epoch': 0.75} 75%|███████▌ | 9262/12313 [6:56:17<2:16:05, 2.68s/it] 75%|███████▌ | 9263/12313 [6:56:20<2:18:17, 2.72s/it] {'loss': 0.4848, 'grad_norm': 6.510631608383568, 'learning_rate': 7.62360925882342e-07, 'epoch': 0.75} 75%|███████▌ | 9263/12313 [6:56:20<2:18:17, 2.72s/it] 75%|███████▌ | 9264/12313 [6:56:23<2:14:45, 2.65s/it] {'loss': 0.3856, 'grad_norm': 8.2985790641125, 'learning_rate': 7.618881846238177e-07, 'epoch': 0.75} 75%|███████▌ | 9264/12313 [6:56:23<2:14:45, 2.65s/it] 75%|███████▌ | 9265/12313 [6:56:25<2:13:36, 2.63s/it] {'loss': 0.284, 'grad_norm': 8.29108898122942, 'learning_rate': 7.614155636334325e-07, 'epoch': 0.75} 75%|███████▌ | 9265/12313 [6:56:25<2:13:36, 2.63s/it] 75%|███████▌ | 9266/12313 [6:56:28<2:18:48, 2.73s/it] {'loss': 0.4909, 'grad_norm': 14.447143227318355, 'learning_rate': 7.609430629438896e-07, 'epoch': 0.75} 75%|███████▌ | 9266/12313 [6:56:28<2:18:48, 2.73s/it] 75%|███████▌ | 9267/12313 [6:56:31<2:23:04, 2.82s/it] {'loss': 0.5262, 'grad_norm': 9.786187786821417, 'learning_rate': 7.604706825878822e-07, 'epoch': 0.75} 75%|███████▌ | 9267/12313 [6:56:31<2:23:04, 2.82s/it] 75%|███████▌ | 9268/12313 [6:56:34<2:21:21, 2.79s/it] {'loss': 0.4241, 'grad_norm': 6.314038643222093, 'learning_rate': 7.59998422598098e-07, 'epoch': 0.75} 75%|███████▌ | 9268/12313 [6:56:34<2:21:21, 2.79s/it] 75%|███████▌ | 9269/12313 [6:56:37<2:22:24, 2.81s/it] {'loss': 0.4264, 'grad_norm': 
3.701696525893301, 'learning_rate': 7.595262830072142e-07, 'epoch': 0.75} 75%|███████▌ | 9269/12313 [6:56:37<2:22:24, 2.81s/it] 75%|███████▌ | 9270/12313 [6:56:40<2:22:48, 2.82s/it] {'loss': 0.5874, 'grad_norm': 6.147998028522049, 'learning_rate': 7.590542638478992e-07, 'epoch': 0.75} 75%|███████▌ | 9270/12313 [6:56:40<2:22:48, 2.82s/it] 75%|███████▌ | 9271/12313 [6:56:42<2:19:06, 2.74s/it] {'loss': 0.5389, 'grad_norm': 5.173315947912089, 'learning_rate': 7.585823651528157e-07, 'epoch': 0.75} 75%|███████▌ | 9271/12313 [6:56:42<2:19:06, 2.74s/it] 75%|███████▌ | 9272/12313 [6:56:45<2:17:11, 2.71s/it] {'loss': 0.3099, 'grad_norm': 5.039146817437555, 'learning_rate': 7.581105869546168e-07, 'epoch': 0.75} 75%|███████▌ | 9272/12313 [6:56:45<2:17:11, 2.71s/it] 75%|███████▌ | 9273/12313 [6:56:48<2:15:38, 2.68s/it] {'loss': 0.5686, 'grad_norm': 6.7461838507335505, 'learning_rate': 7.576389292859465e-07, 'epoch': 0.75} 75%|███████▌ | 9273/12313 [6:56:48<2:15:38, 2.68s/it] 75%|███████▌ | 9274/12313 [6:56:50<2:14:14, 2.65s/it] {'loss': 0.5206, 'grad_norm': 4.776535888289513, 'learning_rate': 7.5716739217944e-07, 'epoch': 0.75} 75%|███████▌ | 9274/12313 [6:56:50<2:14:14, 2.65s/it] 75%|███████▌ | 9275/12313 [6:56:53<2:12:13, 2.61s/it] {'loss': 0.4071, 'grad_norm': 5.586476353122101, 'learning_rate': 7.566959756677272e-07, 'epoch': 0.75} 75%|███████▌ | 9275/12313 [6:56:53<2:12:13, 2.61s/it] 75%|███████▌ | 9276/12313 [6:56:56<2:16:29, 2.70s/it] {'loss': 0.4455, 'grad_norm': 6.574615954432533, 'learning_rate': 7.562246797834266e-07, 'epoch': 0.75} 75%|███████▌ | 9276/12313 [6:56:56<2:16:29, 2.70s/it] 75%|███████▌ | 9277/12313 [6:56:58<2:19:28, 2.76s/it] {'loss': 0.5776, 'grad_norm': 3.8731896502314855, 'learning_rate': 7.557535045591485e-07, 'epoch': 0.75} 75%|███████▌ | 9277/12313 [6:56:58<2:19:28, 2.76s/it] 75%|███████▌ | 9278/12313 [6:57:01<2:19:02, 2.75s/it] {'loss': 0.4607, 'grad_norm': 4.1035501788561906, 'learning_rate': 7.552824500274963e-07, 'epoch': 0.75} 75%|███████▌ | 
9278/12313 [6:57:01<2:19:02, 2.75s/it] 75%|███████▌ | 9279/12313 [6:57:04<2:16:16, 2.69s/it] {'loss': 0.5089, 'grad_norm': 3.37747957490476, 'learning_rate': 7.548115162210659e-07, 'epoch': 0.75} 75%|███████▌ | 9279/12313 [6:57:04<2:16:16, 2.69s/it] 75%|███████▌ | 9280/12313 [6:57:06<2:15:03, 2.67s/it] {'loss': 0.7355, 'grad_norm': 5.307633133304405, 'learning_rate': 7.543407031724415e-07, 'epoch': 0.75} 75%|███████▌ | 9280/12313 [6:57:06<2:15:03, 2.67s/it] 75%|███████▌ | 9281/12313 [6:57:09<2:13:21, 2.64s/it] {'loss': 0.5037, 'grad_norm': 5.0482613355790695, 'learning_rate': 7.538700109142022e-07, 'epoch': 0.75} 75%|███████▌ | 9281/12313 [6:57:09<2:13:21, 2.64s/it] 75%|███████▌ | 9282/12313 [6:57:12<2:14:38, 2.67s/it] {'loss': 0.4972, 'grad_norm': 5.458953154438223, 'learning_rate': 7.533994394789171e-07, 'epoch': 0.75} 75%|███████▌ | 9282/12313 [6:57:12<2:14:38, 2.67s/it] 75%|███████▌ | 9283/12313 [6:57:14<2:15:01, 2.67s/it] {'loss': 0.5581, 'grad_norm': 8.54249426936385, 'learning_rate': 7.529289888991462e-07, 'epoch': 0.75} 75%|███████▌ | 9283/12313 [6:57:14<2:15:01, 2.67s/it] 75%|███████▌ | 9284/12313 [6:57:18<2:28:40, 2.94s/it] {'loss': 0.4335, 'grad_norm': 7.083450524943488, 'learning_rate': 7.524586592074432e-07, 'epoch': 0.75} 75%|███████▌ | 9284/12313 [6:57:18<2:28:40, 2.94s/it] 75%|███████▌ | 9285/12313 [6:57:21<2:25:03, 2.87s/it] {'loss': 0.5737, 'grad_norm': 3.984907663593172, 'learning_rate': 7.519884504363525e-07, 'epoch': 0.75} 75%|███████▌ | 9285/12313 [6:57:21<2:25:03, 2.87s/it] 75%|███████▌ | 9286/12313 [6:57:23<2:22:33, 2.83s/it] {'loss': 0.4297, 'grad_norm': 11.241450055916163, 'learning_rate': 7.515183626184095e-07, 'epoch': 0.75} 75%|███████▌ | 9286/12313 [6:57:23<2:22:33, 2.83s/it] 75%|███████▌ | 9287/12313 [6:57:26<2:21:27, 2.80s/it] {'loss': 0.5938, 'grad_norm': 5.7339323931383035, 'learning_rate': 7.510483957861428e-07, 'epoch': 0.75} 75%|███████▌ | 9287/12313 [6:57:26<2:21:27, 2.80s/it] 75%|███████▌ | 9288/12313 [6:57:29<2:17:14, 
2.72s/it] {'loss': 0.3925, 'grad_norm': 4.59002005480556, 'learning_rate': 7.505785499720708e-07, 'epoch': 0.75} 75%|███████▌ | 9288/12313 [6:57:29<2:17:14, 2.72s/it] 75%|███████▌ | 9289/12313 [6:57:31<2:17:47, 2.73s/it] {'loss': 0.5777, 'grad_norm': 4.295589622295897, 'learning_rate': 7.501088252087046e-07, 'epoch': 0.75} 75%|███████▌ | 9289/12313 [6:57:31<2:17:47, 2.73s/it] 75%|███████▌ | 9290/12313 [6:57:34<2:16:29, 2.71s/it] {'loss': 0.5826, 'grad_norm': 6.677871787585794, 'learning_rate': 7.496392215285456e-07, 'epoch': 0.75} 75%|███████▌ | 9290/12313 [6:57:34<2:16:29, 2.71s/it] 75%|███████▌ | 9291/12313 [6:57:37<2:18:33, 2.75s/it] {'loss': 0.6462, 'grad_norm': 6.4999364694050366, 'learning_rate': 7.49169738964089e-07, 'epoch': 0.75} 75%|███████▌ | 9291/12313 [6:57:37<2:18:33, 2.75s/it] 75%|███████▌ | 9292/12313 [6:57:40<2:17:33, 2.73s/it] {'loss': 0.4204, 'grad_norm': 4.2437533825373706, 'learning_rate': 7.487003775478208e-07, 'epoch': 0.75} 75%|███████▌ | 9292/12313 [6:57:40<2:17:33, 2.73s/it] 75%|███████▌ | 9293/12313 [6:57:42<2:16:57, 2.72s/it] {'loss': 0.4936, 'grad_norm': 6.9886624820729475, 'learning_rate': 7.482311373122173e-07, 'epoch': 0.75} 75%|███████▌ | 9293/12313 [6:57:42<2:16:57, 2.72s/it] 75%|███████▌ | 9294/12313 [6:57:45<2:14:25, 2.67s/it] {'loss': 0.5231, 'grad_norm': 9.509767514085462, 'learning_rate': 7.477620182897485e-07, 'epoch': 0.75} 75%|███████▌ | 9294/12313 [6:57:45<2:14:25, 2.67s/it] 75%|███████▌ | 9295/12313 [6:57:48<2:18:49, 2.76s/it] {'loss': 0.614, 'grad_norm': 6.97430223457955, 'learning_rate': 7.472930205128748e-07, 'epoch': 0.75} 75%|███████▌ | 9295/12313 [6:57:48<2:18:49, 2.76s/it] 75%|███████▌ | 9296/12313 [6:57:51<2:18:32, 2.76s/it] {'loss': 0.4198, 'grad_norm': 6.802220975306286, 'learning_rate': 7.46824144014047e-07, 'epoch': 0.75} 75%|███████▌ | 9296/12313 [6:57:51<2:18:32, 2.76s/it] 76%|███████▌ | 9297/12313 [6:57:53<2:16:27, 2.71s/it] {'loss': 0.5168, 'grad_norm': 5.419177843717326, 'learning_rate': 
7.4635538882571e-07, 'epoch': 0.76} 76%|███████▌ | 9297/12313 [6:57:53<2:16:27, 2.71s/it] 76%|███████▌ | 9298/12313 [6:57:56<2:14:17, 2.67s/it] {'loss': 0.4739, 'grad_norm': 4.5756834213715365, 'learning_rate': 7.458867549802998e-07, 'epoch': 0.76} 76%|███████▌ | 9298/12313 [6:57:56<2:14:17, 2.67s/it] 76%|███████▌ | 9299/12313 [6:57:58<2:15:00, 2.69s/it] {'loss': 0.3756, 'grad_norm': 10.932740692189409, 'learning_rate': 7.454182425102418e-07, 'epoch': 0.76} 76%|███████▌ | 9299/12313 [6:57:58<2:15:00, 2.69s/it] 76%|███████▌ | 9300/12313 [6:58:01<2:14:10, 2.67s/it] {'loss': 0.4295, 'grad_norm': 7.458052877323758, 'learning_rate': 7.449498514479564e-07, 'epoch': 0.76} 76%|███████▌ | 9300/12313 [6:58:01<2:14:10, 2.67s/it] 76%|███████▌ | 9301/12313 [6:58:04<2:12:51, 2.65s/it] {'loss': 0.5706, 'grad_norm': 7.572226076657359, 'learning_rate': 7.444815818258527e-07, 'epoch': 0.76} 76%|███████▌ | 9301/12313 [6:58:04<2:12:51, 2.65s/it] 76%|███████▌ | 9302/12313 [6:58:06<2:13:20, 2.66s/it] {'loss': 0.7025, 'grad_norm': 7.895090561106518, 'learning_rate': 7.440134336763316e-07, 'epoch': 0.76} 76%|███████▌ | 9302/12313 [6:58:06<2:13:20, 2.66s/it] 76%|███████▌ | 9303/12313 [6:58:09<2:13:42, 2.67s/it] {'loss': 0.4952, 'grad_norm': 7.629608085828636, 'learning_rate': 7.435454070317885e-07, 'epoch': 0.76} 76%|███████▌ | 9303/12313 [6:58:09<2:13:42, 2.67s/it] 76%|███████▌ | 9304/12313 [6:58:12<2:12:00, 2.63s/it] {'loss': 0.3872, 'grad_norm': 5.105832653252446, 'learning_rate': 7.430775019246064e-07, 'epoch': 0.76} 76%|███████▌ | 9304/12313 [6:58:12<2:12:00, 2.63s/it] 76%|███████▌ | 9305/12313 [6:58:14<2:11:13, 2.62s/it] {'loss': 0.3978, 'grad_norm': 5.491484533703616, 'learning_rate': 7.426097183871636e-07, 'epoch': 0.76} 76%|███████▌ | 9305/12313 [6:58:14<2:11:13, 2.62s/it] 76%|███████▌ | 9306/12313 [6:58:17<2:15:30, 2.70s/it] {'loss': 0.6585, 'grad_norm': 6.946802048002631, 'learning_rate': 7.421420564518267e-07, 'epoch': 0.76} 76%|███████▌ | 9306/12313 [6:58:17<2:15:30, 2.70s/it] 
76%|███████▌ | 9307/12313 [6:58:20<2:13:41, 2.67s/it] {'loss': 0.6047, 'grad_norm': 3.9872313781539206, 'learning_rate': 7.41674516150957e-07, 'epoch': 0.76} 76%|███████▌ | 9307/12313 [6:58:20<2:13:41, 2.67s/it] 76%|███████▌ | 9308/12313 [6:58:22<2:12:17, 2.64s/it] {'loss': 0.583, 'grad_norm': 7.213117113503657, 'learning_rate': 7.412070975169047e-07, 'epoch': 0.76} 76%|███████▌ | 9308/12313 [6:58:22<2:12:17, 2.64s/it] 76%|███████▌ | 9309/12313 [6:58:25<2:19:26, 2.79s/it] {'loss': 0.4034, 'grad_norm': 5.24742138382295, 'learning_rate': 7.407398005820123e-07, 'epoch': 0.76} 76%|███████▌ | 9309/12313 [6:58:25<2:19:26, 2.79s/it] 76%|███████▌ | 9310/12313 [6:58:28<2:18:51, 2.77s/it] {'loss': 0.4058, 'grad_norm': 4.211946251001222, 'learning_rate': 7.402726253786152e-07, 'epoch': 0.76} 76%|███████▌ | 9310/12313 [6:58:28<2:18:51, 2.77s/it] 76%|███████▌ | 9311/12313 [6:58:31<2:24:12, 2.88s/it] {'loss': 0.5176, 'grad_norm': 4.415557038084326, 'learning_rate': 7.398055719390399e-07, 'epoch': 0.76} 76%|███████▌ | 9311/12313 [6:58:31<2:24:12, 2.88s/it] 76%|███████▌ | 9312/12313 [6:58:34<2:18:50, 2.78s/it] {'loss': 0.533, 'grad_norm': 4.229473804773957, 'learning_rate': 7.39338640295602e-07, 'epoch': 0.76} 76%|███████▌ | 9312/12313 [6:58:34<2:18:50, 2.78s/it] 76%|███████▌ | 9313/12313 [6:58:36<2:14:26, 2.69s/it] {'loss': 0.4943, 'grad_norm': 6.390722780847675, 'learning_rate': 7.388718304806133e-07, 'epoch': 0.76} 76%|███████▌ | 9313/12313 [6:58:36<2:14:26, 2.69s/it] 76%|███████▌ | 9314/12313 [6:58:39<2:14:48, 2.70s/it] {'loss': 0.6664, 'grad_norm': 3.9879443116206224, 'learning_rate': 7.384051425263733e-07, 'epoch': 0.76} 76%|███████▌ | 9314/12313 [6:58:39<2:14:48, 2.70s/it] 76%|███████▌ | 9315/12313 [6:58:42<2:13:09, 2.66s/it] {'loss': 0.3858, 'grad_norm': 4.396256081895415, 'learning_rate': 7.379385764651737e-07, 'epoch': 0.76} 76%|███████▌ | 9315/12313 [6:58:42<2:13:09, 2.66s/it] 76%|███████▌ | 9316/12313 [6:58:44<2:09:47, 2.60s/it] {'loss': 0.6082, 'grad_norm': 
9.650686792105313, 'learning_rate': 7.374721323292985e-07, 'epoch': 0.76} 76%|███████▌ | 9316/12313 [6:58:44<2:09:47, 2.60s/it] 76%|███████▌ | 9317/12313 [6:58:47<2:08:11, 2.57s/it] {'loss': 0.4969, 'grad_norm': 4.52159603222765, 'learning_rate': 7.370058101510249e-07, 'epoch': 0.76} 76%|███████▌ | 9317/12313 [6:58:47<2:08:11, 2.57s/it] 76%|███████▌ | 9318/12313 [6:58:49<2:06:08, 2.53s/it] {'loss': 0.6284, 'grad_norm': 4.592670810869141, 'learning_rate': 7.365396099626176e-07, 'epoch': 0.76} 76%|███████▌ | 9318/12313 [6:58:49<2:06:08, 2.53s/it] 76%|███████▌ | 9319/12313 [6:58:52<2:08:12, 2.57s/it] {'loss': 0.4282, 'grad_norm': 14.987779633029737, 'learning_rate': 7.360735317963374e-07, 'epoch': 0.76} 76%|███████▌ | 9319/12313 [6:58:52<2:08:12, 2.57s/it] 76%|███████▌ | 9320/12313 [6:58:54<2:06:32, 2.54s/it] {'loss': 0.5101, 'grad_norm': 5.108452215084867, 'learning_rate': 7.356075756844333e-07, 'epoch': 0.76} 76%|███████▌ | 9320/12313 [6:58:54<2:06:32, 2.54s/it] 76%|███████▌ | 9321/12313 [6:58:57<2:06:15, 2.53s/it] {'loss': 0.4218, 'grad_norm': 6.336779399554521, 'learning_rate': 7.351417416591461e-07, 'epoch': 0.76} 76%|███████▌ | 9321/12313 [6:58:57<2:06:15, 2.53s/it] 76%|███████▌ | 9322/12313 [6:59:00<2:12:15, 2.65s/it] {'loss': 0.4596, 'grad_norm': 3.2420955010959633, 'learning_rate': 7.346760297527109e-07, 'epoch': 0.76} 76%|███████▌ | 9322/12313 [6:59:00<2:12:15, 2.65s/it] 76%|███████▌ | 9323/12313 [6:59:02<2:10:57, 2.63s/it] {'loss': 0.6222, 'grad_norm': 14.31442060248585, 'learning_rate': 7.342104399973507e-07, 'epoch': 0.76} 76%|███████▌ | 9323/12313 [6:59:02<2:10:57, 2.63s/it] 76%|███████▌ | 9324/12313 [6:59:05<2:11:38, 2.64s/it] {'loss': 0.3906, 'grad_norm': 4.841998000446461, 'learning_rate': 7.337449724252837e-07, 'epoch': 0.76} 76%|███████▌ | 9324/12313 [6:59:05<2:11:38, 2.64s/it] 76%|███████▌ | 9325/12313 [6:59:07<2:11:06, 2.63s/it] {'loss': 0.5206, 'grad_norm': 6.189602667818374, 'learning_rate': 7.332796270687159e-07, 'epoch': 0.76} 76%|███████▌ | 
9325/12313 [6:59:07<2:11:06, 2.63s/it] 76%|███████▌ | 9326/12313 [6:59:11<2:19:59, 2.81s/it] {'loss': 0.4608, 'grad_norm': 3.4330814806419747, 'learning_rate': 7.328144039598487e-07, 'epoch': 0.76} 76%|███████▌ | 9326/12313 [6:59:11<2:19:59, 2.81s/it] 76%|███████▌ | 9327/12313 [6:59:13<2:15:38, 2.73s/it] {'loss': 0.3708, 'grad_norm': 8.246364139369895, 'learning_rate': 7.323493031308718e-07, 'epoch': 0.76} 76%|███████▌ | 9327/12313 [6:59:13<2:15:38, 2.73s/it] 76%|███████▌ | 9328/12313 [6:59:16<2:11:27, 2.64s/it] {'loss': 0.4251, 'grad_norm': 11.977800366186257, 'learning_rate': 7.318843246139673e-07, 'epoch': 0.76} 76%|███████▌ | 9328/12313 [6:59:16<2:11:27, 2.64s/it] 76%|███████▌ | 9329/12313 [6:59:19<2:19:24, 2.80s/it] {'loss': 0.5398, 'grad_norm': 4.480080892757191, 'learning_rate': 7.314194684413098e-07, 'epoch': 0.76} 76%|███████▌ | 9329/12313 [6:59:19<2:19:24, 2.80s/it] 76%|███████▌ | 9330/12313 [6:59:21<2:16:37, 2.75s/it] {'loss': 0.486, 'grad_norm': 5.761110269898293, 'learning_rate': 7.309547346450658e-07, 'epoch': 0.76} 76%|███████▌ | 9330/12313 [6:59:21<2:16:37, 2.75s/it] 76%|███████▌ | 9331/12313 [6:59:24<2:16:12, 2.74s/it] {'loss': 0.4607, 'grad_norm': 4.670162793343675, 'learning_rate': 7.304901232573908e-07, 'epoch': 0.76} 76%|███████▌ | 9331/12313 [6:59:24<2:16:12, 2.74s/it] 76%|███████▌ | 9332/12313 [6:59:27<2:11:10, 2.64s/it] {'loss': 0.4614, 'grad_norm': 7.987770600523061, 'learning_rate': 7.300256343104351e-07, 'epoch': 0.76} 76%|███████▌ | 9332/12313 [6:59:27<2:11:10, 2.64s/it] 76%|███████▌ | 9333/12313 [6:59:29<2:09:27, 2.61s/it] {'loss': 0.3639, 'grad_norm': 5.479915469263117, 'learning_rate': 7.295612678363382e-07, 'epoch': 0.76} 76%|███████▌ | 9333/12313 [6:59:29<2:09:27, 2.61s/it] 76%|███████▌ | 9334/12313 [6:59:32<2:09:53, 2.62s/it] {'loss': 0.3886, 'grad_norm': 25.580608798332342, 'learning_rate': 7.290970238672307e-07, 'epoch': 0.76} 76%|███████▌ | 9334/12313 [6:59:32<2:09:53, 2.62s/it] 76%|███████▌ | 9335/12313 [6:59:34<2:12:29, 
2.67s/it] {'loss': 0.5083, 'grad_norm': 3.80358687651955, 'learning_rate': 7.286329024352376e-07, 'epoch': 0.76} 76%|███████▌ | 9335/12313 [6:59:34<2:12:29, 2.67s/it] 76%|███████▌ | 9336/12313 [6:59:37<2:09:54, 2.62s/it] {'loss': 0.4203, 'grad_norm': 3.6188064823034747, 'learning_rate': 7.281689035724718e-07, 'epoch': 0.76} 76%|███████▌ | 9336/12313 [6:59:37<2:09:54, 2.62s/it] 76%|███████▌ | 9337/12313 [6:59:40<2:10:01, 2.62s/it] {'loss': 0.428, 'grad_norm': 6.662801100557623, 'learning_rate': 7.277050273110408e-07, 'epoch': 0.76} 76%|███████▌ | 9337/12313 [6:59:40<2:10:01, 2.62s/it] 76%|███████▌ | 9338/12313 [6:59:42<2:07:23, 2.57s/it] {'loss': 0.5499, 'grad_norm': 4.198195373303361, 'learning_rate': 7.272412736830431e-07, 'epoch': 0.76} 76%|███████▌ | 9338/12313 [6:59:42<2:07:23, 2.57s/it] 76%|███████▌ | 9339/12313 [6:59:45<2:07:17, 2.57s/it] {'loss': 0.5308, 'grad_norm': 7.26119245955303, 'learning_rate': 7.26777642720567e-07, 'epoch': 0.76} 76%|███████▌ | 9339/12313 [6:59:45<2:07:17, 2.57s/it] 76%|███████▌ | 9340/12313 [6:59:47<2:07:53, 2.58s/it] {'loss': 0.4514, 'grad_norm': 6.6134602398961295, 'learning_rate': 7.263141344556924e-07, 'epoch': 0.76} 76%|███████▌ | 9340/12313 [6:59:47<2:07:53, 2.58s/it] 76%|███████▌ | 9341/12313 [6:59:50<2:07:31, 2.57s/it] {'loss': 0.3762, 'grad_norm': 3.9208951218117427, 'learning_rate': 7.258507489204935e-07, 'epoch': 0.76} 76%|███████▌ | 9341/12313 [6:59:50<2:07:31, 2.57s/it] 76%|███████▌ | 9342/12313 [6:59:53<2:11:03, 2.65s/it] {'loss': 0.5326, 'grad_norm': 7.76615893471226, 'learning_rate': 7.253874861470325e-07, 'epoch': 0.76} 76%|███████▌ | 9342/12313 [6:59:53<2:11:03, 2.65s/it] 76%|███████▌ | 9343/12313 [6:59:55<2:09:24, 2.61s/it] {'loss': 0.4045, 'grad_norm': 7.24244627028924, 'learning_rate': 7.24924346167366e-07, 'epoch': 0.76} 76%|███████▌ | 9343/12313 [6:59:55<2:09:24, 2.61s/it] 76%|███████▌ | 9344/12313 [6:59:58<2:08:51, 2.60s/it] {'loss': 0.4882, 'grad_norm': 3.373479639826732, 'learning_rate': 
7.244613290135396e-07, 'epoch': 0.76} 76%|███████▌ | 9344/12313 [6:59:58<2:08:51, 2.60s/it] 76%|███████▌ | 9345/12313 [7:00:00<2:06:51, 2.56s/it] {'loss': 0.3442, 'grad_norm': 6.628941975131967, 'learning_rate': 7.239984347175932e-07, 'epoch': 0.76} 76%|███████▌ | 9345/12313 [7:00:00<2:06:51, 2.56s/it] 76%|███████▌ | 9346/12313 [7:00:03<2:04:17, 2.51s/it] {'loss': 0.5905, 'grad_norm': 3.784372823794077, 'learning_rate': 7.235356633115559e-07, 'epoch': 0.76} 76%|███████▌ | 9346/12313 [7:00:03<2:04:17, 2.51s/it] 76%|███████▌ | 9347/12313 [7:00:05<2:09:30, 2.62s/it] {'loss': 0.3702, 'grad_norm': 4.27156618640782, 'learning_rate': 7.230730148274478e-07, 'epoch': 0.76} 76%|███████▌ | 9347/12313 [7:00:05<2:09:30, 2.62s/it] 76%|███████▌ | 9348/12313 [7:00:08<2:06:53, 2.57s/it] {'loss': 0.3701, 'grad_norm': 3.9436102310547327, 'learning_rate': 7.226104892972838e-07, 'epoch': 0.76} 76%|███████▌ | 9348/12313 [7:00:08<2:06:53, 2.57s/it] 76%|███████▌ | 9349/12313 [7:00:11<2:07:07, 2.57s/it] {'loss': 0.4805, 'grad_norm': 3.9798514443021196, 'learning_rate': 7.221480867530664e-07, 'epoch': 0.76} 76%|███████▌ | 9349/12313 [7:00:11<2:07:07, 2.57s/it] 76%|███████▌ | 9350/12313 [7:00:13<2:06:26, 2.56s/it] {'loss': 0.5473, 'grad_norm': 3.87348251530302, 'learning_rate': 7.216858072267924e-07, 'epoch': 0.76} 76%|███████▌ | 9350/12313 [7:00:13<2:06:26, 2.56s/it] 76%|███████▌ | 9351/12313 [7:00:16<2:06:54, 2.57s/it] {'loss': 0.5054, 'grad_norm': 5.915769976523819, 'learning_rate': 7.212236507504494e-07, 'epoch': 0.76} 76%|███████▌ | 9351/12313 [7:00:16<2:06:54, 2.57s/it] 76%|███████▌ | 9352/12313 [7:00:18<2:07:26, 2.58s/it] {'loss': 0.4806, 'grad_norm': 3.8732533424148605, 'learning_rate': 7.207616173560158e-07, 'epoch': 0.76} 76%|███████▌ | 9352/12313 [7:00:18<2:07:26, 2.58s/it] 76%|███████▌ | 9353/12313 [7:00:21<2:09:43, 2.63s/it] {'loss': 0.46, 'grad_norm': 6.11681123799206, 'learning_rate': 7.202997070754613e-07, 'epoch': 0.76} 76%|███████▌ | 9353/12313 [7:00:21<2:09:43, 2.63s/it] 
76%|███████▌ | 9354/12313 [7:00:24<2:12:13, 2.68s/it] {'loss': 0.5624, 'grad_norm': 8.365104395894681, 'learning_rate': 7.198379199407488e-07, 'epoch': 0.76} 76%|███████▌ | 9354/12313 [7:00:24<2:12:13, 2.68s/it] 76%|███████▌ | 9355/12313 [7:00:26<2:09:17, 2.62s/it] {'loss': 0.5142, 'grad_norm': 6.4866744161399685, 'learning_rate': 7.193762559838299e-07, 'epoch': 0.76} 76%|███████▌ | 9355/12313 [7:00:26<2:09:17, 2.62s/it] 76%|███████▌ | 9356/12313 [7:00:29<2:07:12, 2.58s/it] {'loss': 0.3683, 'grad_norm': 6.4484042472139675, 'learning_rate': 7.189147152366504e-07, 'epoch': 0.76} 76%|███████▌ | 9356/12313 [7:00:29<2:07:12, 2.58s/it] 76%|███████▌ | 9357/12313 [7:00:31<2:06:39, 2.57s/it] {'loss': 0.4517, 'grad_norm': 4.5783151146826935, 'learning_rate': 7.184532977311471e-07, 'epoch': 0.76} 76%|███████▌ | 9357/12313 [7:00:31<2:06:39, 2.57s/it] 76%|███████▌ | 9358/12313 [7:00:34<2:05:57, 2.56s/it] {'loss': 0.4783, 'grad_norm': 4.142951500808283, 'learning_rate': 7.179920034992469e-07, 'epoch': 0.76} 76%|███████▌ | 9358/12313 [7:00:34<2:05:57, 2.56s/it] 76%|███████▌ | 9359/12313 [7:00:37<2:07:48, 2.60s/it] {'loss': 0.6403, 'grad_norm': 3.327558136006068, 'learning_rate': 7.175308325728689e-07, 'epoch': 0.76} 76%|███████▌ | 9359/12313 [7:00:37<2:07:48, 2.60s/it] 76%|███████▌ | 9360/12313 [7:00:39<2:05:16, 2.55s/it] {'loss': 0.4863, 'grad_norm': 7.732587779509069, 'learning_rate': 7.170697849839229e-07, 'epoch': 0.76} 76%|███████▌ | 9360/12313 [7:00:39<2:05:16, 2.55s/it] 76%|███████▌ | 9361/12313 [7:00:42<2:07:50, 2.60s/it] {'loss': 0.4994, 'grad_norm': 3.4261727551662866, 'learning_rate': 7.166088607643123e-07, 'epoch': 0.76} 76%|███████▌ | 9361/12313 [7:00:42<2:07:50, 2.60s/it] 76%|███████▌ | 9362/12313 [7:00:45<2:13:53, 2.72s/it] {'loss': 0.3715, 'grad_norm': 7.420461816144346, 'learning_rate': 7.161480599459297e-07, 'epoch': 0.76} 76%|███████▌ | 9362/12313 [7:00:45<2:13:53, 2.72s/it] 76%|███████▌ | 9363/12313 [7:00:47<2:15:02, 2.75s/it] {'loss': 0.5635, 'grad_norm': 
4.177142639059444, 'learning_rate': 7.156873825606603e-07, 'epoch': 0.76} 76%|███████▌ | 9363/12313 [7:00:47<2:15:02, 2.75s/it] 76%|███████▌ | 9364/12313 [7:00:50<2:13:39, 2.72s/it] {'loss': 0.4047, 'grad_norm': 6.465161714985024, 'learning_rate': 7.152268286403813e-07, 'epoch': 0.76} 76%|███████▌ | 9364/12313 [7:00:50<2:13:39, 2.72s/it] 76%|███████▌ | 9365/12313 [7:00:53<2:14:56, 2.75s/it] {'loss': 0.4028, 'grad_norm': 6.173359562536338, 'learning_rate': 7.147663982169601e-07, 'epoch': 0.76} 76%|███████▌ | 9365/12313 [7:00:53<2:14:56, 2.75s/it] 76%|███████▌ | 9366/12313 [7:00:56<2:16:47, 2.79s/it] {'loss': 0.6058, 'grad_norm': 3.9455176413170583, 'learning_rate': 7.143060913222552e-07, 'epoch': 0.76} 76%|███████▌ | 9366/12313 [7:00:56<2:16:47, 2.79s/it] 76%|███████▌ | 9367/12313 [7:00:59<2:16:48, 2.79s/it] {'loss': 0.706, 'grad_norm': 10.642099423017335, 'learning_rate': 7.138459079881188e-07, 'epoch': 0.76} 76%|███████▌ | 9367/12313 [7:00:59<2:16:48, 2.79s/it] 76%|███████▌ | 9368/12313 [7:01:02<2:19:46, 2.85s/it] {'loss': 0.6098, 'grad_norm': 4.19417453033392, 'learning_rate': 7.133858482463918e-07, 'epoch': 0.76} 76%|███████▌ | 9368/12313 [7:01:02<2:19:46, 2.85s/it] 76%|███████▌ | 9369/12313 [7:01:04<2:17:32, 2.80s/it] {'loss': 0.4453, 'grad_norm': 4.253649965044964, 'learning_rate': 7.129259121289086e-07, 'epoch': 0.76} 76%|███████▌ | 9369/12313 [7:01:04<2:17:32, 2.80s/it] 76%|███████▌ | 9370/12313 [7:01:07<2:12:39, 2.70s/it] {'loss': 0.5733, 'grad_norm': 3.668621601717135, 'learning_rate': 7.124660996674951e-07, 'epoch': 0.76} 76%|███████▌ | 9370/12313 [7:01:07<2:12:39, 2.70s/it] 76%|███████▌ | 9371/12313 [7:01:09<2:10:11, 2.66s/it] {'loss': 0.4285, 'grad_norm': 6.048356569124348, 'learning_rate': 7.12006410893967e-07, 'epoch': 0.76} 76%|███████▌ | 9371/12313 [7:01:09<2:10:11, 2.66s/it] 76%|███████▌ | 9372/12313 [7:01:12<2:16:43, 2.79s/it] {'loss': 0.6383, 'grad_norm': 4.317558436319294, 'learning_rate': 7.115468458401317e-07, 'epoch': 0.76} 76%|███████▌ | 
9372/12313 [7:01:12<2:16:43, 2.79s/it] 76%|███████▌ | 9373/12313 [7:01:15<2:13:34, 2.73s/it] {'loss': 0.551, 'grad_norm': 7.296251737717159, 'learning_rate': 7.110874045377902e-07, 'epoch': 0.76} 76%|███████▌ | 9373/12313 [7:01:15<2:13:34, 2.73s/it] 76%|███████▌ | 9374/12313 [7:01:18<2:17:00, 2.80s/it] {'loss': 0.4665, 'grad_norm': 5.66808374273542, 'learning_rate': 7.106280870187326e-07, 'epoch': 0.76} 76%|███████▌ | 9374/12313 [7:01:18<2:17:00, 2.80s/it] 76%|███████▌ | 9375/12313 [7:01:21<2:20:59, 2.88s/it] {'loss': 0.5898, 'grad_norm': 3.897913983532991, 'learning_rate': 7.101688933147397e-07, 'epoch': 0.76} 76%|███████▌ | 9375/12313 [7:01:21<2:20:59, 2.88s/it] 76%|███████▌ | 9376/12313 [7:01:24<2:18:28, 2.83s/it] {'loss': 0.379, 'grad_norm': 5.913213010761257, 'learning_rate': 7.097098234575883e-07, 'epoch': 0.76} 76%|███████▌ | 9376/12313 [7:01:24<2:18:28, 2.83s/it] 76%|███████▌ | 9377/12313 [7:01:26<2:14:39, 2.75s/it] {'loss': 0.3769, 'grad_norm': 8.586921010693551, 'learning_rate': 7.092508774790424e-07, 'epoch': 0.76} 76%|███████▌ | 9377/12313 [7:01:26<2:14:39, 2.75s/it] 76%|███████▌ | 9378/12313 [7:01:29<2:14:51, 2.76s/it] {'loss': 0.5177, 'grad_norm': 6.837790290025689, 'learning_rate': 7.087920554108582e-07, 'epoch': 0.76} 76%|███████▌ | 9378/12313 [7:01:29<2:14:51, 2.76s/it] 76%|███████▌ | 9379/12313 [7:01:32<2:13:04, 2.72s/it] {'loss': 0.4528, 'grad_norm': 4.809392070744878, 'learning_rate': 7.083333572847831e-07, 'epoch': 0.76} 76%|███████▌ | 9379/12313 [7:01:32<2:13:04, 2.72s/it] 76%|███████▌ | 9380/12313 [7:01:34<2:11:47, 2.70s/it] {'loss': 0.5457, 'grad_norm': 3.467857419045606, 'learning_rate': 7.078747831325583e-07, 'epoch': 0.76} 76%|███████▌ | 9380/12313 [7:01:34<2:11:47, 2.70s/it] 76%|███████▌ | 9381/12313 [7:01:37<2:11:15, 2.69s/it] {'loss': 0.3799, 'grad_norm': 4.495338193171126, 'learning_rate': 7.074163329859129e-07, 'epoch': 0.76} 76%|███████▌ | 9381/12313 [7:01:37<2:11:15, 2.69s/it] 76%|███████▌ | 9382/12313 [7:01:41<2:23:43, 2.94s/it] 
{'loss': 0.3478, 'grad_norm': 4.00025930868917, 'learning_rate': 7.069580068765702e-07, 'epoch': 0.76} 76%|███████▌ | 9382/12313 [7:01:41<2:23:43, 2.94s/it] 76%|███████▌ | 9383/12313 [7:01:43<2:22:33, 2.92s/it] {'loss': 0.4943, 'grad_norm': 10.827446952884547, 'learning_rate': 7.064998048362448e-07, 'epoch': 0.76} 76%|███████▌ | 9383/12313 [7:01:43<2:22:33, 2.92s/it] 76%|███████▌ | 9384/12313 [7:01:46<2:20:51, 2.89s/it] {'loss': 0.4528, 'grad_norm': 5.064306842835891, 'learning_rate': 7.060417268966408e-07, 'epoch': 0.76} 76%|███████▌ | 9384/12313 [7:01:46<2:20:51, 2.89s/it] 76%|███████▌ | 9385/12313 [7:01:49<2:17:31, 2.82s/it] {'loss': 0.4532, 'grad_norm': 4.468211696665271, 'learning_rate': 7.055837730894541e-07, 'epoch': 0.76} 76%|███████▌ | 9385/12313 [7:01:49<2:17:31, 2.82s/it] 76%|███████▌ | 9386/12313 [7:01:51<2:12:33, 2.72s/it] {'loss': 0.347, 'grad_norm': 6.800829039851487, 'learning_rate': 7.051259434463745e-07, 'epoch': 0.76} 76%|███████▌ | 9386/12313 [7:01:51<2:12:33, 2.72s/it] 76%|███████▌ | 9387/12313 [7:01:54<2:09:34, 2.66s/it] {'loss': 0.4928, 'grad_norm': 6.208378537952931, 'learning_rate': 7.046682379990794e-07, 'epoch': 0.76} 76%|███████▌ | 9387/12313 [7:01:54<2:09:34, 2.66s/it] 76%|███████▌ | 9388/12313 [7:01:57<2:10:30, 2.68s/it] {'loss': 0.5339, 'grad_norm': 4.2620464174120425, 'learning_rate': 7.042106567792406e-07, 'epoch': 0.76} 76%|███████▌ | 9388/12313 [7:01:57<2:10:30, 2.68s/it] 76%|███████▋ | 9389/12313 [7:01:59<2:09:27, 2.66s/it] {'loss': 0.4415, 'grad_norm': 5.564383633638113, 'learning_rate': 7.03753199818521e-07, 'epoch': 0.76} 76%|███████▋ | 9389/12313 [7:01:59<2:09:27, 2.66s/it] 76%|███████▋ | 9390/12313 [7:02:02<2:09:13, 2.65s/it] {'loss': 0.4055, 'grad_norm': 6.375756161119263, 'learning_rate': 7.032958671485734e-07, 'epoch': 0.76} 76%|███████▋ | 9390/12313 [7:02:02<2:09:13, 2.65s/it] 76%|███████▋ | 9391/12313 [7:02:04<2:05:11, 2.57s/it] {'loss': 0.5106, 'grad_norm': 6.484850680260223, 'learning_rate': 7.028386588010421e-07, 
'epoch': 0.76} 76%|███████▋ | 9391/12313 [7:02:04<2:05:11, 2.57s/it] 76%|███████▋ | 9392/12313 [7:02:07<2:09:20, 2.66s/it] {'loss': 0.324, 'grad_norm': 6.127381595989031, 'learning_rate': 7.023815748075651e-07, 'epoch': 0.76} 76%|███████▋ | 9392/12313 [7:02:07<2:09:20, 2.66s/it] 76%|███████▋ | 9393/12313 [7:02:10<2:09:38, 2.66s/it] {'loss': 0.4221, 'grad_norm': 4.069418782141938, 'learning_rate': 7.019246151997694e-07, 'epoch': 0.76} 76%|███████▋ | 9393/12313 [7:02:10<2:09:38, 2.66s/it] 76%|███████▋ | 9394/12313 [7:02:12<2:10:13, 2.68s/it] {'loss': 0.5576, 'grad_norm': 3.966681143690813, 'learning_rate': 7.014677800092734e-07, 'epoch': 0.76} 76%|███████▋ | 9394/12313 [7:02:12<2:10:13, 2.68s/it] 76%|███████▋ | 9395/12313 [7:02:15<2:10:26, 2.68s/it] {'loss': 0.4702, 'grad_norm': 6.497751818986687, 'learning_rate': 7.010110692676886e-07, 'epoch': 0.76} 76%|███████▋ | 9395/12313 [7:02:15<2:10:26, 2.68s/it] 76%|███████▋ | 9396/12313 [7:02:18<2:13:06, 2.74s/it] {'loss': 0.3902, 'grad_norm': 6.139900453454705, 'learning_rate': 7.005544830066172e-07, 'epoch': 0.76} 76%|███████▋ | 9396/12313 [7:02:18<2:13:06, 2.74s/it] 76%|███████▋ | 9397/12313 [7:02:21<2:09:09, 2.66s/it] {'loss': 0.4043, 'grad_norm': 5.005470766044911, 'learning_rate': 7.000980212576522e-07, 'epoch': 0.76} 76%|███████▋ | 9397/12313 [7:02:21<2:09:09, 2.66s/it] 76%|███████▋ | 9398/12313 [7:02:23<2:08:43, 2.65s/it] {'loss': 0.5992, 'grad_norm': 13.852454716609795, 'learning_rate': 6.996416840523776e-07, 'epoch': 0.76} 76%|███████▋ | 9398/12313 [7:02:23<2:08:43, 2.65s/it] 76%|███████▋ | 9399/12313 [7:02:26<2:12:58, 2.74s/it] {'loss': 0.4657, 'grad_norm': 4.487243506708304, 'learning_rate': 6.991854714223711e-07, 'epoch': 0.76} 76%|███████▋ | 9399/12313 [7:02:26<2:12:58, 2.74s/it] 76%|███████▋ | 9400/12313 [7:02:29<2:10:51, 2.70s/it] {'loss': 0.6265, 'grad_norm': 5.202920899583729, 'learning_rate': 6.987293833991984e-07, 'epoch': 0.76} 76%|███████▋ | 9400/12313 [7:02:29<2:10:51, 2.70s/it] 76%|███████▋ | 
9401/12313 [7:02:31<2:11:47, 2.72s/it] {'loss': 0.4925, 'grad_norm': 17.066516911339072, 'learning_rate': 6.982734200144192e-07, 'epoch': 0.76} 76%|███████▋ | 9401/12313 [7:02:31<2:11:47, 2.72s/it] 76%|███████▋ | 9402/12313 [7:02:36<2:33:14, 3.16s/it] {'loss': 0.4923, 'grad_norm': 3.6601744793382305, 'learning_rate': 6.978175812995847e-07, 'epoch': 0.76} 76%|███████▋ | 9402/12313 [7:02:36<2:33:14, 3.16s/it] 76%|███████▋ | 9403/12313 [7:02:38<2:23:56, 2.97s/it] {'loss': 0.3402, 'grad_norm': 6.162261867859467, 'learning_rate': 6.973618672862357e-07, 'epoch': 0.76} 76%|███████▋ | 9403/12313 [7:02:38<2:23:56, 2.97s/it] 76%|███████▋ | 9404/12313 [7:02:41<2:21:05, 2.91s/it] {'loss': 0.4892, 'grad_norm': 3.383805533605749, 'learning_rate': 6.969062780059041e-07, 'epoch': 0.76} 76%|███████▋ | 9404/12313 [7:02:41<2:21:05, 2.91s/it] 76%|███████▋ | 9405/12313 [7:02:44<2:20:32, 2.90s/it] {'loss': 0.4553, 'grad_norm': 5.034659251576166, 'learning_rate': 6.964508134901162e-07, 'epoch': 0.76} 76%|███████▋ | 9405/12313 [7:02:44<2:20:32, 2.90s/it] 76%|███████▋ | 9406/12313 [7:02:46<2:16:06, 2.81s/it] {'loss': 0.3549, 'grad_norm': 4.219393214106526, 'learning_rate': 6.959954737703872e-07, 'epoch': 0.76} 76%|███████▋ | 9406/12313 [7:02:46<2:16:06, 2.81s/it] 76%|███████▋ | 9407/12313 [7:02:49<2:14:13, 2.77s/it] {'loss': 0.4162, 'grad_norm': 7.125686847574229, 'learning_rate': 6.955402588782229e-07, 'epoch': 0.76} 76%|███████▋ | 9407/12313 [7:02:49<2:14:13, 2.77s/it] 76%|███████▋ | 9408/12313 [7:02:52<2:17:18, 2.84s/it] {'loss': 0.4606, 'grad_norm': 4.00897406634947, 'learning_rate': 6.950851688451224e-07, 'epoch': 0.76} 76%|███████▋ | 9408/12313 [7:02:52<2:17:18, 2.84s/it] 76%|███████▋ | 9409/12313 [7:02:55<2:15:19, 2.80s/it] {'loss': 0.6832, 'grad_norm': 5.368532207744499, 'learning_rate': 6.94630203702577e-07, 'epoch': 0.76} 76%|███████▋ | 9409/12313 [7:02:55<2:15:19, 2.80s/it] 76%|███████▋ | 9410/12313 [7:02:57<2:09:13, 2.67s/it] {'loss': 0.4458, 'grad_norm': 4.61391210253197, 
'learning_rate': 6.941753634820658e-07, 'epoch': 0.76} 76%|███████▋ | 9410/12313 [7:02:57<2:09:13, 2.67s/it] 76%|███████▋ | 9411/12313 [7:03:00<2:14:36, 2.78s/it] {'loss': 0.3812, 'grad_norm': 10.571864997786388, 'learning_rate': 6.93720648215063e-07, 'epoch': 0.76} 76%|███████▋ | 9411/12313 [7:03:00<2:14:36, 2.78s/it] 76%|███████▋ | 9412/12313 [7:03:03<2:17:09, 2.84s/it] {'loss': 0.3526, 'grad_norm': 6.071699040314471, 'learning_rate': 6.932660579330317e-07, 'epoch': 0.76} 76%|███████▋ | 9412/12313 [7:03:03<2:17:09, 2.84s/it] 76%|███████▋ | 9413/12313 [7:03:06<2:13:03, 2.75s/it] {'loss': 0.5468, 'grad_norm': 5.121163071457247, 'learning_rate': 6.928115926674265e-07, 'epoch': 0.76} 76%|███████▋ | 9413/12313 [7:03:06<2:13:03, 2.75s/it] 76%|███████▋ | 9414/12313 [7:03:09<2:13:38, 2.77s/it] {'loss': 0.4688, 'grad_norm': 4.5789958968091025, 'learning_rate': 6.923572524496946e-07, 'epoch': 0.76} 76%|███████▋ | 9414/12313 [7:03:09<2:13:38, 2.77s/it] 76%|███████▋ | 9415/12313 [7:03:11<2:13:49, 2.77s/it] {'loss': 0.538, 'grad_norm': 4.61496141221473, 'learning_rate': 6.919030373112748e-07, 'epoch': 0.76} 76%|███████▋ | 9415/12313 [7:03:11<2:13:49, 2.77s/it] 76%|███████▋ | 9416/12313 [7:03:14<2:11:04, 2.71s/it] {'loss': 0.3921, 'grad_norm': 5.530010793903644, 'learning_rate': 6.914489472835959e-07, 'epoch': 0.76} 76%|███████▋ | 9416/12313 [7:03:14<2:11:04, 2.71s/it] 76%|███████▋ | 9417/12313 [7:03:16<2:07:27, 2.64s/it] {'loss': 0.4411, 'grad_norm': 4.053580386916616, 'learning_rate': 6.909949823980772e-07, 'epoch': 0.76} 76%|███████▋ | 9417/12313 [7:03:16<2:07:27, 2.64s/it] 76%|███████▋ | 9418/12313 [7:03:19<2:07:50, 2.65s/it] {'loss': 0.4674, 'grad_norm': 5.361420424547812, 'learning_rate': 6.905411426861322e-07, 'epoch': 0.76} 76%|███████▋ | 9418/12313 [7:03:19<2:07:50, 2.65s/it] 76%|███████▋ | 9419/12313 [7:03:21<2:04:32, 2.58s/it] {'loss': 0.4666, 'grad_norm': 6.357458261429573, 'learning_rate': 6.900874281791639e-07, 'epoch': 0.76} 76%|███████▋ | 9419/12313 
[7:03:21<2:04:32, 2.58s/it] 77%|███████▋ | 9420/12313 [7:03:24<2:09:18, 2.68s/it] {'loss': 0.4103, 'grad_norm': 6.195545553218025, 'learning_rate': 6.89633838908566e-07, 'epoch': 0.77} 77%|███████▋ | 9420/12313 [7:03:24<2:09:18, 2.68s/it] 77%|███████▋ | 9421/12313 [7:03:27<2:11:16, 2.72s/it] {'loss': 0.415, 'grad_norm': 5.87439782429603, 'learning_rate': 6.891803749057255e-07, 'epoch': 0.77} 77%|███████▋ | 9421/12313 [7:03:27<2:11:16, 2.72s/it] 77%|███████▋ | 9422/12313 [7:03:30<2:12:10, 2.74s/it] {'loss': 0.4371, 'grad_norm': 3.5630192542232497, 'learning_rate': 6.887270362020199e-07, 'epoch': 0.77} 77%|███████▋ | 9422/12313 [7:03:30<2:12:10, 2.74s/it] 77%|███████▋ | 9423/12313 [7:03:33<2:13:48, 2.78s/it] {'loss': 0.3986, 'grad_norm': 6.836068046454834, 'learning_rate': 6.882738228288166e-07, 'epoch': 0.77} 77%|███████▋ | 9423/12313 [7:03:33<2:13:48, 2.78s/it] 77%|███████▋ | 9424/12313 [7:03:35<2:11:11, 2.72s/it] {'loss': 0.4896, 'grad_norm': 6.446997290383441, 'learning_rate': 6.87820734817477e-07, 'epoch': 0.77} 77%|███████▋ | 9424/12313 [7:03:35<2:11:11, 2.72s/it] 77%|███████▋ | 9425/12313 [7:03:38<2:11:47, 2.74s/it] {'loss': 0.514, 'grad_norm': 4.358114968607592, 'learning_rate': 6.873677721993518e-07, 'epoch': 0.77} 77%|███████▋ | 9425/12313 [7:03:38<2:11:47, 2.74s/it] 77%|███████▋ | 9426/12313 [7:03:41<2:11:40, 2.74s/it] {'loss': 0.5265, 'grad_norm': 8.732219383382438, 'learning_rate': 6.86914935005783e-07, 'epoch': 0.77} 77%|███████▋ | 9426/12313 [7:03:41<2:11:40, 2.74s/it] 77%|███████▋ | 9427/12313 [7:03:44<2:10:36, 2.72s/it] {'loss': 0.4076, 'grad_norm': 4.310805179732659, 'learning_rate': 6.864622232681048e-07, 'epoch': 0.77} 77%|███████▋ | 9427/12313 [7:03:44<2:10:36, 2.72s/it] 77%|███████▋ | 9428/12313 [7:03:46<2:09:12, 2.69s/it] {'loss': 0.5051, 'grad_norm': 4.045216285940855, 'learning_rate': 6.860096370176436e-07, 'epoch': 0.77} 77%|███████▋ | 9428/12313 [7:03:46<2:09:12, 2.69s/it] 77%|███████▋ | 9429/12313 [7:03:49<2:07:33, 2.65s/it] {'loss': 
0.5137, 'grad_norm': 7.3004627003203435, 'learning_rate': 6.855571762857144e-07, 'epoch': 0.77} 77%|███████▋ | 9429/12313 [7:03:49<2:07:33, 2.65s/it] 77%|███████▋ | 9430/12313 [7:03:52<2:08:50, 2.68s/it] {'loss': 0.4596, 'grad_norm': 4.355642197018315, 'learning_rate': 6.851048411036265e-07, 'epoch': 0.77} 77%|███████▋ | 9430/12313 [7:03:52<2:08:50, 2.68s/it] 77%|███████▋ | 9431/12313 [7:03:54<2:09:43, 2.70s/it] {'loss': 0.5929, 'grad_norm': 4.107096289312408, 'learning_rate': 6.846526315026783e-07, 'epoch': 0.77} 77%|███████▋ | 9431/12313 [7:03:54<2:09:43, 2.70s/it] 77%|███████▋ | 9432/12313 [7:03:57<2:12:36, 2.76s/it] {'loss': 0.5062, 'grad_norm': 4.55847735238035, 'learning_rate': 6.842005475141606e-07, 'epoch': 0.77} 77%|███████▋ | 9432/12313 [7:03:57<2:12:36, 2.76s/it] 77%|███████▋ | 9433/12313 [7:04:00<2:10:07, 2.71s/it] {'loss': 0.5043, 'grad_norm': 4.084279279942593, 'learning_rate': 6.837485891693541e-07, 'epoch': 0.77} 77%|███████▋ | 9433/12313 [7:04:00<2:10:07, 2.71s/it] 77%|███████▋ | 9434/12313 [7:04:02<2:08:18, 2.67s/it] {'loss': 0.4094, 'grad_norm': 13.22441226336881, 'learning_rate': 6.83296756499533e-07, 'epoch': 0.77} 77%|███████▋ | 9434/12313 [7:04:02<2:08:18, 2.67s/it] 77%|███████▋ | 9435/12313 [7:04:05<2:08:30, 2.68s/it] {'loss': 0.4747, 'grad_norm': 4.334505119567936, 'learning_rate': 6.828450495359623e-07, 'epoch': 0.77} 77%|███████▋ | 9435/12313 [7:04:05<2:08:30, 2.68s/it] 77%|███████▋ | 9436/12313 [7:04:08<2:07:41, 2.66s/it] {'loss': 0.6257, 'grad_norm': 5.930691768024282, 'learning_rate': 6.823934683098963e-07, 'epoch': 0.77} 77%|███████▋ | 9436/12313 [7:04:08<2:07:41, 2.66s/it] 77%|███████▋ | 9437/12313 [7:04:10<2:08:49, 2.69s/it] {'loss': 0.3181, 'grad_norm': 3.3704120793280197, 'learning_rate': 6.819420128525834e-07, 'epoch': 0.77} 77%|███████▋ | 9437/12313 [7:04:10<2:08:49, 2.69s/it] 77%|███████▋ | 9438/12313 [7:04:13<2:09:05, 2.69s/it] {'loss': 0.4773, 'grad_norm': 3.842942498154423, 'learning_rate': 6.814906831952611e-07, 'epoch': 
0.77} 77%|███████▋ | 9438/12313 [7:04:13<2:09:05, 2.69s/it] 77%|███████▋ | 9439/12313 [7:04:16<2:07:13, 2.66s/it] {'loss': 0.3918, 'grad_norm': 8.895568429556429, 'learning_rate': 6.810394793691585e-07, 'epoch': 0.77} 77%|███████▋ | 9439/12313 [7:04:16<2:07:13, 2.66s/it] 77%|███████▋ | 9440/12313 [7:04:19<2:12:03, 2.76s/it] {'loss': 0.4543, 'grad_norm': 6.8916662279772565, 'learning_rate': 6.805884014054975e-07, 'epoch': 0.77} 77%|███████▋ | 9440/12313 [7:04:19<2:12:03, 2.76s/it] 77%|███████▋ | 9441/12313 [7:04:21<2:09:37, 2.71s/it] {'loss': 0.555, 'grad_norm': 4.932759072758281, 'learning_rate': 6.801374493354907e-07, 'epoch': 0.77} 77%|███████▋ | 9441/12313 [7:04:21<2:09:37, 2.71s/it] 77%|███████▋ | 9442/12313 [7:04:24<2:13:52, 2.80s/it] {'loss': 0.5693, 'grad_norm': 3.165718589290986, 'learning_rate': 6.796866231903402e-07, 'epoch': 0.77} 77%|███████▋ | 9442/12313 [7:04:24<2:13:52, 2.80s/it] 77%|███████▋ | 9443/12313 [7:04:27<2:12:02, 2.76s/it] {'loss': 0.3955, 'grad_norm': 5.661881562952782, 'learning_rate': 6.792359230012418e-07, 'epoch': 0.77} 77%|███████▋ | 9443/12313 [7:04:27<2:12:02, 2.76s/it] 77%|███████▋ | 9444/12313 [7:04:30<2:12:30, 2.77s/it] {'loss': 0.2898, 'grad_norm': 5.857023328763498, 'learning_rate': 6.787853487993817e-07, 'epoch': 0.77} 77%|███████▋ | 9444/12313 [7:04:30<2:12:30, 2.77s/it] 77%|███████▋ | 9445/12313 [7:04:32<2:11:27, 2.75s/it] {'loss': 0.326, 'grad_norm': 6.46410699371152, 'learning_rate': 6.783349006159359e-07, 'epoch': 0.77} 77%|███████▋ | 9445/12313 [7:04:32<2:11:27, 2.75s/it] 77%|███████▋ | 9446/12313 [7:04:35<2:10:13, 2.73s/it] {'loss': 0.4844, 'grad_norm': 6.9335177221214614, 'learning_rate': 6.778845784820739e-07, 'epoch': 0.77} 77%|███████▋ | 9446/12313 [7:04:35<2:10:13, 2.73s/it] 77%|███████▋ | 9447/12313 [7:04:38<2:11:16, 2.75s/it] {'loss': 0.6114, 'grad_norm': 5.439529483084923, 'learning_rate': 6.774343824289567e-07, 'epoch': 0.77} 77%|███████▋ | 9447/12313 [7:04:38<2:11:16, 2.75s/it] 77%|███████▋ | 9448/12313 
[7:04:41<2:08:35, 2.69s/it] {'loss': 0.5204, 'grad_norm': 5.645470096440396, 'learning_rate': 6.769843124877343e-07, 'epoch': 0.77} 77%|███████▋ | 9448/12313 [7:04:41<2:08:35, 2.69s/it] 77%|███████▋ | 9449/12313 [7:04:43<2:08:17, 2.69s/it] {'loss': 0.4941, 'grad_norm': 24.655262909769625, 'learning_rate': 6.765343686895484e-07, 'epoch': 0.77} 77%|███████▋ | 9449/12313 [7:04:43<2:08:17, 2.69s/it] 77%|███████▋ | 9450/12313 [7:04:46<2:07:22, 2.67s/it] {'loss': 0.3861, 'grad_norm': 6.241765051887523, 'learning_rate': 6.760845510655345e-07, 'epoch': 0.77} 77%|███████▋ | 9450/12313 [7:04:46<2:07:22, 2.67s/it] 77%|███████▋ | 9451/12313 [7:04:48<2:05:59, 2.64s/it] {'loss': 0.5332, 'grad_norm': 8.035908170373448, 'learning_rate': 6.756348596468168e-07, 'epoch': 0.77} 77%|███████▋ | 9451/12313 [7:04:48<2:05:59, 2.64s/it] 77%|███████▋ | 9452/12313 [7:04:51<2:05:37, 2.63s/it] {'loss': 0.4359, 'grad_norm': 4.6392581562003405, 'learning_rate': 6.751852944645107e-07, 'epoch': 0.77} 77%|███████▋ | 9452/12313 [7:04:51<2:05:37, 2.63s/it] 77%|███████▋ | 9453/12313 [7:04:54<2:06:33, 2.65s/it] {'loss': 0.4006, 'grad_norm': 6.355510758018541, 'learning_rate': 6.747358555497244e-07, 'epoch': 0.77} 77%|███████▋ | 9453/12313 [7:04:54<2:06:33, 2.65s/it] 77%|███████▋ | 9454/12313 [7:04:56<2:05:22, 2.63s/it] {'loss': 0.6034, 'grad_norm': 5.191185254070647, 'learning_rate': 6.742865429335576e-07, 'epoch': 0.77} 77%|███████▋ | 9454/12313 [7:04:56<2:05:22, 2.63s/it] 77%|███████▋ | 9455/12313 [7:04:59<2:07:37, 2.68s/it] {'loss': 0.3565, 'grad_norm': 4.336515041136279, 'learning_rate': 6.738373566470991e-07, 'epoch': 0.77} 77%|███████▋ | 9455/12313 [7:04:59<2:07:37, 2.68s/it] 77%|███████▋ | 9456/12313 [7:05:02<2:07:23, 2.68s/it] {'loss': 0.4001, 'grad_norm': 7.028294785077958, 'learning_rate': 6.733882967214312e-07, 'epoch': 0.77} 77%|███████▋ | 9456/12313 [7:05:02<2:07:23, 2.68s/it] 77%|███████▋ | 9457/12313 [7:05:05<2:09:49, 2.73s/it] {'loss': 0.4984, 'grad_norm': 3.345393934309023, 
'learning_rate': 6.729393631876257e-07, 'epoch': 0.77} 77%|███████▋ | 9457/12313 [7:05:05<2:09:49, 2.73s/it] 77%|███████▋ | 9458/12313 [7:05:07<2:09:26, 2.72s/it] {'loss': 0.3027, 'grad_norm': 4.618093066401069, 'learning_rate': 6.724905560767464e-07, 'epoch': 0.77} 77%|███████▋ | 9458/12313 [7:05:07<2:09:26, 2.72s/it] 77%|███████▋ | 9459/12313 [7:05:10<2:07:46, 2.69s/it] {'loss': 0.3492, 'grad_norm': 5.511844154772235, 'learning_rate': 6.720418754198485e-07, 'epoch': 0.77} 77%|███████▋ | 9459/12313 [7:05:10<2:07:46, 2.69s/it] 77%|███████▋ | 9460/12313 [7:05:12<2:03:52, 2.61s/it] {'loss': 0.3126, 'grad_norm': 4.512587186638037, 'learning_rate': 6.715933212479791e-07, 'epoch': 0.77} 77%|███████▋ | 9460/12313 [7:05:12<2:03:52, 2.61s/it] 77%|███████▋ | 9461/12313 [7:05:15<2:04:37, 2.62s/it] {'loss': 0.4618, 'grad_norm': 3.8768216178706996, 'learning_rate': 6.711448935921744e-07, 'epoch': 0.77} 77%|███████▋ | 9461/12313 [7:05:15<2:04:37, 2.62s/it] 77%|███████▋ | 9462/12313 [7:05:18<2:04:54, 2.63s/it] {'loss': 0.3315, 'grad_norm': 4.849694261108834, 'learning_rate': 6.706965924834649e-07, 'epoch': 0.77} 77%|███████▋ | 9462/12313 [7:05:18<2:04:54, 2.63s/it] 77%|███████▋ | 9463/12313 [7:05:20<2:05:46, 2.65s/it] {'loss': 0.5456, 'grad_norm': 4.801051258082802, 'learning_rate': 6.702484179528699e-07, 'epoch': 0.77} 77%|███████▋ | 9463/12313 [7:05:20<2:05:46, 2.65s/it] 77%|███████▋ | 9464/12313 [7:05:23<2:04:27, 2.62s/it] {'loss': 0.427, 'grad_norm': 9.352209783795086, 'learning_rate': 6.698003700313993e-07, 'epoch': 0.77} 77%|███████▋ | 9464/12313 [7:05:23<2:04:27, 2.62s/it] 77%|███████▋ | 9465/12313 [7:05:25<2:01:16, 2.55s/it] {'loss': 0.7298, 'grad_norm': 7.643828934266029, 'learning_rate': 6.69352448750058e-07, 'epoch': 0.77} 77%|███████▋ | 9465/12313 [7:05:25<2:01:16, 2.55s/it] 77%|███████▋ | 9466/12313 [7:05:28<2:03:41, 2.61s/it] {'loss': 0.5492, 'grad_norm': 5.286073745430146, 'learning_rate': 6.689046541398378e-07, 'epoch': 0.77} 77%|███████▋ | 9466/12313 
[7:05:28<2:03:41, 2.61s/it] 77%|███████▋ | 9467/12313 [7:05:31<2:04:17, 2.62s/it] {'loss': 0.4082, 'grad_norm': 7.104403560909407, 'learning_rate': 6.684569862317255e-07, 'epoch': 0.77} 77%|███████▋ | 9467/12313 [7:05:31<2:04:17, 2.62s/it] 77%|███████▋ | 9468/12313 [7:05:33<2:05:27, 2.65s/it] {'loss': 0.4536, 'grad_norm': 21.80499840324673, 'learning_rate': 6.680094450566957e-07, 'epoch': 0.77} 77%|███████▋ | 9468/12313 [7:05:33<2:05:27, 2.65s/it] 77%|███████▋ | 9469/12313 [7:05:36<2:04:32, 2.63s/it] {'loss': 0.4041, 'grad_norm': 9.81351894485181, 'learning_rate': 6.675620306457172e-07, 'epoch': 0.77} 77%|███████▋ | 9469/12313 [7:05:36<2:04:32, 2.63s/it] 77%|███████▋ | 9470/12313 [7:05:39<2:05:12, 2.64s/it] {'loss': 0.4332, 'grad_norm': 5.220882981566686, 'learning_rate': 6.671147430297481e-07, 'epoch': 0.77} 77%|███████▋ | 9470/12313 [7:05:39<2:05:12, 2.64s/it] 77%|███████▋ | 9471/12313 [7:05:41<2:01:21, 2.56s/it] {'loss': 0.4061, 'grad_norm': 5.429857205093562, 'learning_rate': 6.666675822397378e-07, 'epoch': 0.77} 77%|███████▋ | 9471/12313 [7:05:41<2:01:21, 2.56s/it] 77%|███████▋ | 9472/12313 [7:05:44<2:03:21, 2.61s/it] {'loss': 0.326, 'grad_norm': 8.052028153479819, 'learning_rate': 6.662205483066281e-07, 'epoch': 0.77} 77%|███████▋ | 9472/12313 [7:05:44<2:03:21, 2.61s/it] 77%|███████▋ | 9473/12313 [7:05:46<2:02:23, 2.59s/it] {'loss': 0.469, 'grad_norm': 4.655903640053554, 'learning_rate': 6.65773641261352e-07, 'epoch': 0.77} 77%|███████▋ | 9473/12313 [7:05:46<2:02:23, 2.59s/it] 77%|███████▋ | 9474/12313 [7:05:49<2:02:52, 2.60s/it] {'loss': 0.3736, 'grad_norm': 6.279574614073089, 'learning_rate': 6.653268611348315e-07, 'epoch': 0.77} 77%|███████▋ | 9474/12313 [7:05:49<2:02:52, 2.60s/it] 77%|███████▋ | 9475/12313 [7:05:51<2:03:02, 2.60s/it] {'loss': 0.4281, 'grad_norm': 4.349805162366064, 'learning_rate': 6.64880207957983e-07, 'epoch': 0.77} 77%|███████▋ | 9475/12313 [7:05:51<2:03:02, 2.60s/it] 77%|███████▋ | 9476/12313 [7:05:54<2:04:16, 2.63s/it] {'loss': 
0.4795, 'grad_norm': 4.485659687329684, 'learning_rate': 6.644336817617122e-07, 'epoch': 0.77} 77%|███████▋ | 9476/12313 [7:05:54<2:04:16, 2.63s/it] 77%|███████▋ | 9477/12313 [7:05:57<2:12:15, 2.80s/it] {'loss': 0.4077, 'grad_norm': 5.1817758651078885, 'learning_rate': 6.63987282576915e-07, 'epoch': 0.77} 77%|███████▋ | 9477/12313 [7:05:57<2:12:15, 2.80s/it] 77%|███████▋ | 9478/12313 [7:06:00<2:10:57, 2.77s/it] {'loss': 0.5877, 'grad_norm': 3.6533998487921298, 'learning_rate': 6.635410104344819e-07, 'epoch': 0.77} 77%|███████▋ | 9478/12313 [7:06:00<2:10:57, 2.77s/it] 77%|███████▋ | 9479/12313 [7:06:03<2:10:30, 2.76s/it] {'loss': 0.4759, 'grad_norm': 5.823071091152919, 'learning_rate': 6.630948653652905e-07, 'epoch': 0.77} 77%|███████▋ | 9479/12313 [7:06:03<2:10:30, 2.76s/it] 77%|███████▋ | 9480/12313 [7:06:06<2:13:49, 2.83s/it] {'loss': 0.4428, 'grad_norm': 6.866234496364005, 'learning_rate': 6.62648847400213e-07, 'epoch': 0.77} 77%|███████▋ | 9480/12313 [7:06:06<2:13:49, 2.83s/it] 77%|███████▋ | 9481/12313 [7:06:08<2:10:22, 2.76s/it] {'loss': 0.4417, 'grad_norm': 5.054351987364898, 'learning_rate': 6.622029565701118e-07, 'epoch': 0.77} 77%|███████▋ | 9481/12313 [7:06:08<2:10:22, 2.76s/it] 77%|███████▋ | 9482/12313 [7:06:11<2:07:18, 2.70s/it] {'loss': 0.4682, 'grad_norm': 4.79929266732889, 'learning_rate': 6.617571929058397e-07, 'epoch': 0.77} 77%|███████▋ | 9482/12313 [7:06:11<2:07:18, 2.70s/it] 77%|███████▋ | 9483/12313 [7:06:14<2:09:43, 2.75s/it] {'loss': 0.402, 'grad_norm': 4.783027204950755, 'learning_rate': 6.613115564382403e-07, 'epoch': 0.77} 77%|███████▋ | 9483/12313 [7:06:14<2:09:43, 2.75s/it] 77%|███████▋ | 9484/12313 [7:06:16<2:07:57, 2.71s/it] {'loss': 0.4791, 'grad_norm': 11.930675421175312, 'learning_rate': 6.608660471981509e-07, 'epoch': 0.77} 77%|███████▋ | 9484/12313 [7:06:16<2:07:57, 2.71s/it] 77%|███████▋ | 9485/12313 [7:06:19<2:06:36, 2.69s/it] {'loss': 0.4235, 'grad_norm': 6.054291067646692, 'learning_rate': 6.604206652163967e-07, 'epoch': 
0.77} 77%|███████▋ | 9485/12313 [7:06:19<2:06:36, 2.69s/it] 77%|███████▋ | 9486/12313 [7:06:22<2:07:14, 2.70s/it] {'loss': 0.4832, 'grad_norm': 7.228203275275898, 'learning_rate': 6.599754105237974e-07, 'epoch': 0.77} 77%|███████▋ | 9486/12313 [7:06:22<2:07:14, 2.70s/it] 77%|███████▋ | 9487/12313 [7:06:25<2:11:00, 2.78s/it] {'loss': 0.537, 'grad_norm': 3.7391869318420285, 'learning_rate': 6.595302831511607e-07, 'epoch': 0.77} 77%|███████▋ | 9487/12313 [7:06:25<2:11:00, 2.78s/it] 77%|███████▋ | 9488/12313 [7:06:28<2:17:30, 2.92s/it] {'loss': 0.6264, 'grad_norm': 5.637019086673706, 'learning_rate': 6.590852831292885e-07, 'epoch': 0.77} 77%|███████▋ | 9488/12313 [7:06:28<2:17:30, 2.92s/it] 77%|███████▋ | 9489/12313 [7:06:31<2:16:11, 2.89s/it] {'loss': 0.477, 'grad_norm': 7.91711166378353, 'learning_rate': 6.586404104889721e-07, 'epoch': 0.77} 77%|███████▋ | 9489/12313 [7:06:31<2:16:11, 2.89s/it] 77%|███████▋ | 9490/12313 [7:06:34<2:13:54, 2.85s/it] {'loss': 0.433, 'grad_norm': 5.980422311755343, 'learning_rate': 6.58195665260993e-07, 'epoch': 0.77} 77%|███████▋ | 9490/12313 [7:06:34<2:13:54, 2.85s/it] 77%|███████▋ | 9491/12313 [7:06:36<2:10:37, 2.78s/it] {'loss': 0.4323, 'grad_norm': 10.886812979952378, 'learning_rate': 6.577510474761272e-07, 'epoch': 0.77} 77%|███████▋ | 9491/12313 [7:06:36<2:10:37, 2.78s/it] 77%|███████▋ | 9492/12313 [7:06:39<2:09:18, 2.75s/it] {'loss': 0.3652, 'grad_norm': 5.242170156688564, 'learning_rate': 6.573065571651383e-07, 'epoch': 0.77} 77%|███████▋ | 9492/12313 [7:06:39<2:09:18, 2.75s/it] 77%|███████▋ | 9493/12313 [7:06:41<2:04:25, 2.65s/it] {'loss': 0.4055, 'grad_norm': 5.032479849055025, 'learning_rate': 6.56862194358783e-07, 'epoch': 0.77} 77%|███████▋ | 9493/12313 [7:06:41<2:04:25, 2.65s/it] 77%|███████▋ | 9494/12313 [7:06:44<2:06:15, 2.69s/it] {'loss': 0.5108, 'grad_norm': 9.525750208818708, 'learning_rate': 6.5641795908781e-07, 'epoch': 0.77} 77%|███████▋ | 9494/12313 [7:06:44<2:06:15, 2.69s/it] 77%|███████▋ | 9495/12313 
[7:06:47<2:04:28, 2.65s/it] {'loss': 0.679, 'grad_norm': 4.492371142698525, 'learning_rate': 6.559738513829572e-07, 'epoch': 0.77} 77%|███████▋ | 9495/12313 [7:06:47<2:04:28, 2.65s/it] 77%|███████▋ | 9496/12313 [7:06:49<2:05:14, 2.67s/it] {'loss': 0.5479, 'grad_norm': 5.27190852592138, 'learning_rate': 6.555298712749538e-07, 'epoch': 0.77} 77%|███████▋ | 9496/12313 [7:06:49<2:05:14, 2.67s/it] 77%|███████▋ | 9497/12313 [7:06:52<2:09:18, 2.76s/it] {'loss': 0.4585, 'grad_norm': 5.955473036514189, 'learning_rate': 6.550860187945227e-07, 'epoch': 0.77} 77%|███████▋ | 9497/12313 [7:06:52<2:09:18, 2.76s/it] 77%|███████▋ | 9498/12313 [7:06:55<2:07:55, 2.73s/it] {'loss': 0.4537, 'grad_norm': 5.382341493930171, 'learning_rate': 6.546422939723738e-07, 'epoch': 0.77} 77%|███████▋ | 9498/12313 [7:06:55<2:07:55, 2.73s/it] 77%|███████▋ | 9499/12313 [7:06:57<2:03:50, 2.64s/it] {'loss': 0.4458, 'grad_norm': 5.842094824500748, 'learning_rate': 6.541986968392119e-07, 'epoch': 0.77} 77%|███████▋ | 9499/12313 [7:06:57<2:03:50, 2.64s/it] 77%|███████▋ | 9500/12313 [7:07:00<2:03:27, 2.63s/it] {'loss': 0.4932, 'grad_norm': 4.8906020204774165, 'learning_rate': 6.537552274257322e-07, 'epoch': 0.77} 77%|███████▋ | 9500/12313 [7:07:00<2:03:27, 2.63s/it] 77%|███████▋ | 9501/12313 [7:07:03<2:04:08, 2.65s/it] {'loss': 0.5223, 'grad_norm': 8.934213855225288, 'learning_rate': 6.533118857626194e-07, 'epoch': 0.77} 77%|███████▋ | 9501/12313 [7:07:03<2:04:08, 2.65s/it] 77%|███████▋ | 9502/12313 [7:07:05<2:03:49, 2.64s/it] {'loss': 0.5565, 'grad_norm': 4.746748520348421, 'learning_rate': 6.52868671880551e-07, 'epoch': 0.77} 77%|███████▋ | 9502/12313 [7:07:05<2:03:49, 2.64s/it] 77%|███████▋ | 9503/12313 [7:07:08<2:02:24, 2.61s/it] {'loss': 0.5495, 'grad_norm': 8.532705058823186, 'learning_rate': 6.524255858101938e-07, 'epoch': 0.77} 77%|███████▋ | 9503/12313 [7:07:08<2:02:24, 2.61s/it] 77%|███████▋ | 9504/12313 [7:07:11<2:08:52, 2.75s/it] {'loss': 0.5132, 'grad_norm': 10.865485946925974, 
'learning_rate': 6.519826275822086e-07, 'epoch': 0.77} 77%|███████▋ | 9504/12313 [7:07:11<2:08:52, 2.75s/it] 77%|███████▋ | 9505/12313 [7:07:14<2:07:45, 2.73s/it] {'loss': 0.4055, 'grad_norm': 3.470727903343789, 'learning_rate': 6.515397972272444e-07, 'epoch': 0.77} 77%|███████▋ | 9505/12313 [7:07:14<2:07:45, 2.73s/it] 77%|███████▋ | 9506/12313 [7:07:16<2:06:52, 2.71s/it] {'loss': 0.4745, 'grad_norm': 4.190361234303148, 'learning_rate': 6.510970947759434e-07, 'epoch': 0.77} 77%|███████▋ | 9506/12313 [7:07:16<2:06:52, 2.71s/it] 77%|███████▋ | 9507/12313 [7:07:19<2:06:36, 2.71s/it] {'loss': 0.4198, 'grad_norm': 4.1269789461455515, 'learning_rate': 6.50654520258939e-07, 'epoch': 0.77} 77%|███████▋ | 9507/12313 [7:07:19<2:06:36, 2.71s/it] 77%|███████▋ | 9508/12313 [7:07:22<2:11:44, 2.82s/it] {'loss': 0.5517, 'grad_norm': 5.796517123286705, 'learning_rate': 6.502120737068543e-07, 'epoch': 0.77} 77%|███████▋ | 9508/12313 [7:07:22<2:11:44, 2.82s/it] 77%|███████▋ | 9509/12313 [7:07:25<2:08:12, 2.74s/it] {'loss': 0.4771, 'grad_norm': 3.9798948730265002, 'learning_rate': 6.497697551503032e-07, 'epoch': 0.77} 77%|███████▋ | 9509/12313 [7:07:25<2:08:12, 2.74s/it] 77%|███████▋ | 9510/12313 [7:07:28<2:09:29, 2.77s/it] {'loss': 0.507, 'grad_norm': 3.3670043032275445, 'learning_rate': 6.493275646198941e-07, 'epoch': 0.77} 77%|███████▋ | 9510/12313 [7:07:28<2:09:29, 2.77s/it] 77%|███████▋ | 9511/12313 [7:07:30<2:09:18, 2.77s/it] {'loss': 0.651, 'grad_norm': 4.789052775696503, 'learning_rate': 6.488855021462218e-07, 'epoch': 0.77} 77%|███████▋ | 9511/12313 [7:07:30<2:09:18, 2.77s/it] 77%|███████▋ | 9512/12313 [7:07:33<2:07:55, 2.74s/it] {'loss': 0.3833, 'grad_norm': 4.574449211149501, 'learning_rate': 6.484435677598761e-07, 'epoch': 0.77} 77%|███████▋ | 9512/12313 [7:07:33<2:07:55, 2.74s/it] 77%|███████▋ | 9513/12313 [7:07:36<2:07:13, 2.73s/it] {'loss': 0.52, 'grad_norm': 5.744708779873541, 'learning_rate': 6.480017614914369e-07, 'epoch': 0.77} 77%|███████▋ | 9513/12313 
[7:07:36<2:07:13, 2.73s/it] 77%|███████▋ | 9514/12313 [7:07:38<2:06:57, 2.72s/it] {'loss': 0.5064, 'grad_norm': 6.185709557761946, 'learning_rate': 6.475600833714743e-07, 'epoch': 0.77} 77%|███████▋ | 9514/12313 [7:07:38<2:06:57, 2.72s/it] 77%|███████▋ | 9515/12313 [7:07:41<2:06:45, 2.72s/it] {'loss': 0.5049, 'grad_norm': 6.093796766128401, 'learning_rate': 6.471185334305491e-07, 'epoch': 0.77} 77%|███████▋ | 9515/12313 [7:07:41<2:06:45, 2.72s/it] 77%|███████▋ | 9516/12313 [7:07:44<2:13:33, 2.86s/it] {'loss': 0.4234, 'grad_norm': 5.325643999422, 'learning_rate': 6.466771116992162e-07, 'epoch': 0.77} 77%|███████▋ | 9516/12313 [7:07:44<2:13:33, 2.86s/it] 77%|███████▋ | 9517/12313 [7:07:47<2:10:45, 2.81s/it] {'loss': 0.504, 'grad_norm': 5.854284079602375, 'learning_rate': 6.462358182080175e-07, 'epoch': 0.77} 77%|███████▋ | 9517/12313 [7:07:47<2:10:45, 2.81s/it] 77%|███████▋ | 9518/12313 [7:07:49<2:07:05, 2.73s/it] {'loss': 0.6357, 'grad_norm': 6.5341398235283865, 'learning_rate': 6.457946529874895e-07, 'epoch': 0.77} 77%|███████▋ | 9518/12313 [7:07:49<2:07:05, 2.73s/it] 77%|███████▋ | 9519/12313 [7:07:52<2:08:24, 2.76s/it] {'loss': 0.484, 'grad_norm': 3.7271101675803027, 'learning_rate': 6.453536160681592e-07, 'epoch': 0.77} 77%|███████▋ | 9519/12313 [7:07:52<2:08:24, 2.76s/it] 77%|███████▋ | 9520/12313 [7:07:55<2:09:14, 2.78s/it] {'loss': 0.464, 'grad_norm': 3.610155546754207, 'learning_rate': 6.449127074805428e-07, 'epoch': 0.77} 77%|███████▋ | 9520/12313 [7:07:55<2:09:14, 2.78s/it] 77%|███████▋ | 9521/12313 [7:07:58<2:07:36, 2.74s/it] {'loss': 0.4131, 'grad_norm': 4.384516099481292, 'learning_rate': 6.444719272551491e-07, 'epoch': 0.77} 77%|███████▋ | 9521/12313 [7:07:58<2:07:36, 2.74s/it] 77%|███████▋ | 9522/12313 [7:08:01<2:07:10, 2.73s/it] {'loss': 0.489, 'grad_norm': 7.395867107582111, 'learning_rate': 6.440312754224773e-07, 'epoch': 0.77} 77%|███████▋ | 9522/12313 [7:08:01<2:07:10, 2.73s/it] 77%|███████▋ | 9523/12313 [7:08:03<2:04:42, 2.68s/it] {'loss': 
0.3974, 'grad_norm': 3.52191839672315, 'learning_rate': 6.435907520130191e-07, 'epoch': 0.77} 77%|███████▋ | 9523/12313 [7:08:03<2:04:42, 2.68s/it] 77%|███████▋ | 9524/12313 [7:08:06<2:06:24, 2.72s/it] {'loss': 0.3681, 'grad_norm': 5.735438136851437, 'learning_rate': 6.431503570572554e-07, 'epoch': 0.77} 77%|███████▋ | 9524/12313 [7:08:06<2:06:24, 2.72s/it] 77%|███████▋ | 9525/12313 [7:08:09<2:08:05, 2.76s/it] {'loss': 0.46, 'grad_norm': 4.138098595663441, 'learning_rate': 6.427100905856598e-07, 'epoch': 0.77} 77%|███████▋ | 9525/12313 [7:08:09<2:08:05, 2.76s/it] 77%|███████▋ | 9526/12313 [7:08:11<2:04:03, 2.67s/it] {'loss': 0.5792, 'grad_norm': 4.627439119735813, 'learning_rate': 6.422699526286969e-07, 'epoch': 0.77} 77%|███████▋ | 9526/12313 [7:08:11<2:04:03, 2.67s/it] 77%|███████▋ | 9527/12313 [7:08:14<2:04:22, 2.68s/it] {'loss': 0.6043, 'grad_norm': 4.326175948208715, 'learning_rate': 6.418299432168215e-07, 'epoch': 0.77} 77%|███████▋ | 9527/12313 [7:08:14<2:04:22, 2.68s/it] 77%|███████▋ | 9528/12313 [7:08:17<2:04:53, 2.69s/it] {'loss': 0.3815, 'grad_norm': 4.7243209100130334, 'learning_rate': 6.413900623804792e-07, 'epoch': 0.77} 77%|███████▋ | 9528/12313 [7:08:17<2:04:53, 2.69s/it] 77%|███████▋ | 9529/12313 [7:08:19<2:06:24, 2.72s/it] {'loss': 0.5016, 'grad_norm': 6.9628262078532535, 'learning_rate': 6.409503101501086e-07, 'epoch': 0.77} 77%|███████▋ | 9529/12313 [7:08:19<2:06:24, 2.72s/it] 77%|███████▋ | 9530/12313 [7:08:22<2:08:48, 2.78s/it] {'loss': 0.6054, 'grad_norm': 3.781446710484677, 'learning_rate': 6.405106865561367e-07, 'epoch': 0.77} 77%|███████▋ | 9530/12313 [7:08:22<2:08:48, 2.78s/it] 77%|███████▋ | 9531/12313 [7:08:25<2:11:19, 2.83s/it] {'loss': 0.443, 'grad_norm': 8.30986430715804, 'learning_rate': 6.400711916289846e-07, 'epoch': 0.77} 77%|███████▋ | 9531/12313 [7:08:25<2:11:19, 2.83s/it] 77%|███████▋ | 9532/12313 [7:08:28<2:08:53, 2.78s/it] {'loss': 0.4772, 'grad_norm': 4.838062384790877, 'learning_rate': 6.396318253990628e-07, 'epoch': 0.77} 
77%|███████▋ | 9532/12313 [7:08:28<2:08:53, 2.78s/it] 77%|███████▋ | 9533/12313 [7:08:31<2:10:28, 2.82s/it] {'loss': 0.4941, 'grad_norm': 6.597678306183141, 'learning_rate': 6.391925878967728e-07, 'epoch': 0.77} 77%|███████▋ | 9533/12313 [7:08:31<2:10:28, 2.82s/it] 77%|███████▋ | 9534/12313 [7:08:34<2:09:02, 2.79s/it] {'loss': 0.5037, 'grad_norm': 4.6205393788485845, 'learning_rate': 6.387534791525072e-07, 'epoch': 0.77} 77%|███████▋ | 9534/12313 [7:08:34<2:09:02, 2.79s/it] 77%|███████▋ | 9535/12313 [7:08:36<2:08:08, 2.77s/it] {'loss': 0.7124, 'grad_norm': 4.269745258902035, 'learning_rate': 6.383144991966508e-07, 'epoch': 0.77} 77%|███████▋ | 9535/12313 [7:08:36<2:08:08, 2.77s/it] 77%|███████▋ | 9536/12313 [7:08:39<2:06:27, 2.73s/it] {'loss': 0.4697, 'grad_norm': 7.2941195850877145, 'learning_rate': 6.378756480595782e-07, 'epoch': 0.77} 77%|███████▋ | 9536/12313 [7:08:39<2:06:27, 2.73s/it] 77%|███████▋ | 9537/12313 [7:08:42<2:06:07, 2.73s/it] {'loss': 0.4272, 'grad_norm': 8.30813779565925, 'learning_rate': 6.374369257716548e-07, 'epoch': 0.77} 77%|███████▋ | 9537/12313 [7:08:42<2:06:07, 2.73s/it] 77%|███████▋ | 9538/12313 [7:08:44<2:05:20, 2.71s/it] {'loss': 0.3696, 'grad_norm': 5.268713325301926, 'learning_rate': 6.369983323632389e-07, 'epoch': 0.77} 77%|███████▋ | 9538/12313 [7:08:44<2:05:20, 2.71s/it] 77%|███████▋ | 9539/12313 [7:08:47<2:05:05, 2.71s/it] {'loss': 0.499, 'grad_norm': 6.353077785001505, 'learning_rate': 6.365598678646793e-07, 'epoch': 0.77} 77%|███████▋ | 9539/12313 [7:08:47<2:05:05, 2.71s/it] 77%|███████▋ | 9540/12313 [7:08:50<2:03:16, 2.67s/it] {'loss': 0.517, 'grad_norm': 7.787690788411613, 'learning_rate': 6.361215323063144e-07, 'epoch': 0.77} 77%|███████▋ | 9540/12313 [7:08:50<2:03:16, 2.67s/it] 77%|███████▋ | 9541/12313 [7:08:52<2:00:43, 2.61s/it] {'loss': 0.3541, 'grad_norm': 6.712726861513404, 'learning_rate': 6.356833257184747e-07, 'epoch': 0.77} 77%|███████▋ | 9541/12313 [7:08:52<2:00:43, 2.61s/it] 77%|███████▋ | 9542/12313 
[7:08:55<2:02:32, 2.65s/it] {'loss': 0.5433, 'grad_norm': 7.036101207768536, 'learning_rate': 6.352452481314825e-07, 'epoch': 0.77} 77%|███████▋ | 9542/12313 [7:08:55<2:02:32, 2.65s/it] 78%|███████▊ | 9543/12313 [7:08:58<2:04:57, 2.71s/it] {'loss': 0.4672, 'grad_norm': 4.440795097972874, 'learning_rate': 6.348072995756497e-07, 'epoch': 0.78} 78%|███████▊ | 9543/12313 [7:08:58<2:04:57, 2.71s/it] 78%|███████▊ | 9544/12313 [7:09:00<2:03:34, 2.68s/it] {'loss': 0.479, 'grad_norm': 6.623356794323855, 'learning_rate': 6.3436948008128e-07, 'epoch': 0.78} 78%|███████▊ | 9544/12313 [7:09:00<2:03:34, 2.68s/it] 78%|███████▊ | 9545/12313 [7:09:03<2:03:32, 2.68s/it] {'loss': 0.5124, 'grad_norm': 3.870234328714397, 'learning_rate': 6.339317896786693e-07, 'epoch': 0.78} 78%|███████▊ | 9545/12313 [7:09:03<2:03:32, 2.68s/it] 78%|███████▊ | 9546/12313 [7:09:06<2:02:08, 2.65s/it] {'loss': 0.5108, 'grad_norm': 7.935205106400605, 'learning_rate': 6.33494228398103e-07, 'epoch': 0.78} 78%|███████▊ | 9546/12313 [7:09:06<2:02:08, 2.65s/it] 78%|███████▊ | 9547/12313 [7:09:09<2:08:06, 2.78s/it] {'loss': 0.5893, 'grad_norm': 4.477009003593662, 'learning_rate': 6.33056796269857e-07, 'epoch': 0.78} 78%|███████▊ | 9547/12313 [7:09:09<2:08:06, 2.78s/it] 78%|███████▊ | 9548/12313 [7:09:12<2:10:54, 2.84s/it] {'loss': 0.5316, 'grad_norm': 6.221717457872238, 'learning_rate': 6.326194933242006e-07, 'epoch': 0.78} 78%|███████▊ | 9548/12313 [7:09:12<2:10:54, 2.84s/it] 78%|███████▊ | 9549/12313 [7:09:14<2:05:11, 2.72s/it] {'loss': 0.5283, 'grad_norm': 5.507373178005721, 'learning_rate': 6.321823195913924e-07, 'epoch': 0.78} 78%|███████▊ | 9549/12313 [7:09:14<2:05:11, 2.72s/it] 78%|███████▊ | 9550/12313 [7:09:17<2:06:11, 2.74s/it] {'loss': 0.5322, 'grad_norm': 5.749251136653241, 'learning_rate': 6.317452751016815e-07, 'epoch': 0.78} 78%|███████▊ | 9550/12313 [7:09:17<2:06:11, 2.74s/it] 78%|███████▊ | 9551/12313 [7:09:20<2:08:01, 2.78s/it] {'loss': 0.5895, 'grad_norm': 4.194072441757716, 'learning_rate': 
6.313083598853101e-07, 'epoch': 0.78} 78%|███████▊ | 9551/12313 [7:09:20<2:08:01, 2.78s/it] 78%|███████▊ | 9552/12313 [7:09:23<2:12:22, 2.88s/it] {'loss': 0.3903, 'grad_norm': 5.534135119543315, 'learning_rate': 6.308715739725108e-07, 'epoch': 0.78} 78%|███████▊ | 9552/12313 [7:09:23<2:12:22, 2.88s/it] 78%|███████▊ | 9553/12313 [7:09:25<2:08:33, 2.79s/it] {'loss': 0.4641, 'grad_norm': 5.890217163199354, 'learning_rate': 6.30434917393506e-07, 'epoch': 0.78} 78%|███████▊ | 9553/12313 [7:09:25<2:08:33, 2.79s/it] 78%|███████▊ | 9554/12313 [7:09:28<2:10:04, 2.83s/it] {'loss': 0.5238, 'grad_norm': 4.610182903829491, 'learning_rate': 6.299983901785109e-07, 'epoch': 0.78} 78%|███████▊ | 9554/12313 [7:09:28<2:10:04, 2.83s/it] 78%|███████▊ | 9555/12313 [7:09:31<2:12:40, 2.89s/it] {'loss': 0.4373, 'grad_norm': 5.475473491313324, 'learning_rate': 6.295619923577303e-07, 'epoch': 0.78} 78%|███████▊ | 9555/12313 [7:09:31<2:12:40, 2.89s/it] 78%|███████▊ | 9556/12313 [7:09:34<2:06:57, 2.76s/it] {'loss': 0.5475, 'grad_norm': 3.7085653093083364, 'learning_rate': 6.291257239613599e-07, 'epoch': 0.78} 78%|███████▊ | 9556/12313 [7:09:34<2:06:57, 2.76s/it] 78%|███████▊ | 9557/12313 [7:09:36<2:05:05, 2.72s/it] {'loss': 0.521, 'grad_norm': 5.431007075610515, 'learning_rate': 6.286895850195882e-07, 'epoch': 0.78} 78%|███████▊ | 9557/12313 [7:09:36<2:05:05, 2.72s/it] 78%|███████▊ | 9558/12313 [7:09:39<2:03:09, 2.68s/it] {'loss': 0.5119, 'grad_norm': 7.326528629894263, 'learning_rate': 6.28253575562594e-07, 'epoch': 0.78} 78%|███████▊ | 9558/12313 [7:09:39<2:03:09, 2.68s/it] 78%|███████▊ | 9559/12313 [7:09:42<2:03:24, 2.69s/it] {'loss': 0.468, 'grad_norm': 6.69328051954586, 'learning_rate': 6.278176956205462e-07, 'epoch': 0.78} 78%|███████▊ | 9559/12313 [7:09:42<2:03:24, 2.69s/it] 78%|███████▊ | 9560/12313 [7:09:45<2:05:04, 2.73s/it] {'loss': 0.4013, 'grad_norm': 7.6181081748087776, 'learning_rate': 6.273819452236049e-07, 'epoch': 0.78} 78%|███████▊ | 9560/12313 [7:09:45<2:05:04, 2.73s/it] 
78%|███████▊ | 9561/12313 [7:09:48<2:10:15, 2.84s/it] {'loss': 0.4919, 'grad_norm': 5.517368857197591, 'learning_rate': 6.269463244019231e-07, 'epoch': 0.78} 78%|███████▊ | 9561/12313 [7:09:48<2:10:15, 2.84s/it] 78%|███████▊ | 9562/12313 [7:09:50<2:03:59, 2.70s/it] {'loss': 0.4883, 'grad_norm': 5.2478286059509145, 'learning_rate': 6.265108331856423e-07, 'epoch': 0.78} 78%|███████▊ | 9562/12313 [7:09:50<2:03:59, 2.70s/it] 78%|███████▊ | 9563/12313 [7:09:53<2:06:43, 2.76s/it] {'loss': 0.4527, 'grad_norm': 8.241994509786712, 'learning_rate': 6.260754716048961e-07, 'epoch': 0.78} 78%|███████▊ | 9563/12313 [7:09:53<2:06:43, 2.76s/it] 78%|███████▊ | 9564/12313 [7:09:56<2:05:43, 2.74s/it] {'loss': 0.4505, 'grad_norm': 5.796864584164602, 'learning_rate': 6.256402396898095e-07, 'epoch': 0.78} 78%|███████▊ | 9564/12313 [7:09:56<2:05:43, 2.74s/it] 78%|███████▊ | 9565/12313 [7:09:58<2:02:16, 2.67s/it] {'loss': 0.5593, 'grad_norm': 5.288449361519552, 'learning_rate': 6.252051374704992e-07, 'epoch': 0.78} 78%|███████▊ | 9565/12313 [7:09:58<2:02:16, 2.67s/it] 78%|███████▊ | 9566/12313 [7:10:01<2:04:39, 2.72s/it] {'loss': 0.4833, 'grad_norm': 4.215256696565076, 'learning_rate': 6.247701649770707e-07, 'epoch': 0.78} 78%|███████▊ | 9566/12313 [7:10:01<2:04:39, 2.72s/it] 78%|███████▊ | 9567/12313 [7:10:04<2:02:32, 2.68s/it] {'loss': 0.4169, 'grad_norm': 4.625763848409002, 'learning_rate': 6.243353222396229e-07, 'epoch': 0.78} 78%|███████▊ | 9567/12313 [7:10:04<2:02:32, 2.68s/it] 78%|███████▊ | 9568/12313 [7:10:06<2:02:49, 2.68s/it] {'loss': 0.5572, 'grad_norm': 7.386761076675003, 'learning_rate': 6.239006092882438e-07, 'epoch': 0.78} 78%|███████▊ | 9568/12313 [7:10:06<2:02:49, 2.68s/it] 78%|███████▊ | 9569/12313 [7:10:09<1:59:35, 2.62s/it] {'loss': 0.448, 'grad_norm': 3.960429753595598, 'learning_rate': 6.234660261530126e-07, 'epoch': 0.78} 78%|███████▊ | 9569/12313 [7:10:09<1:59:35, 2.62s/it] 78%|███████▊ | 9570/12313 [7:10:11<1:58:27, 2.59s/it] {'loss': 0.4399, 'grad_norm': 
6.218816667980998, 'learning_rate': 6.23031572864001e-07, 'epoch': 0.78} 78%|███████▊ | 9570/12313 [7:10:11<1:58:27, 2.59s/it] 78%|███████▊ | 9571/12313 [7:10:14<1:56:59, 2.56s/it] {'loss': 0.4474, 'grad_norm': 11.37451355769962, 'learning_rate': 6.225972494512719e-07, 'epoch': 0.78} 78%|███████▊ | 9571/12313 [7:10:14<1:56:59, 2.56s/it] 78%|███████▊ | 9572/12313 [7:10:16<1:57:55, 2.58s/it] {'loss': 0.4599, 'grad_norm': 4.63095111539401, 'learning_rate': 6.22163055944876e-07, 'epoch': 0.78} 78%|███████▊ | 9572/12313 [7:10:16<1:57:55, 2.58s/it] 78%|███████▊ | 9573/12313 [7:10:19<1:57:59, 2.58s/it] {'loss': 0.3144, 'grad_norm': 4.637853908423072, 'learning_rate': 6.217289923748592e-07, 'epoch': 0.78} 78%|███████▊ | 9573/12313 [7:10:19<1:57:59, 2.58s/it] 78%|███████▊ | 9574/12313 [7:10:21<1:57:29, 2.57s/it] {'loss': 0.6146, 'grad_norm': 7.081685491650372, 'learning_rate': 6.212950587712557e-07, 'epoch': 0.78} 78%|███████▊ | 9574/12313 [7:10:21<1:57:29, 2.57s/it] 78%|███████▊ | 9575/12313 [7:10:24<1:58:52, 2.60s/it] {'loss': 0.5314, 'grad_norm': 4.122348761014315, 'learning_rate': 6.20861255164091e-07, 'epoch': 0.78} 78%|███████▊ | 9575/12313 [7:10:24<1:58:52, 2.60s/it] 78%|███████▊ | 9576/12313 [7:10:27<2:00:05, 2.63s/it] {'loss': 0.4767, 'grad_norm': 5.74857146041133, 'learning_rate': 6.204275815833807e-07, 'epoch': 0.78} 78%|███████▊ | 9576/12313 [7:10:27<2:00:05, 2.63s/it] 78%|███████▊ | 9577/12313 [7:10:29<1:58:19, 2.59s/it] {'loss': 0.4707, 'grad_norm': 5.874273219932277, 'learning_rate': 6.19994038059136e-07, 'epoch': 0.78} 78%|███████▊ | 9577/12313 [7:10:29<1:58:19, 2.59s/it] 78%|███████▊ | 9578/12313 [7:10:32<2:02:08, 2.68s/it] {'loss': 0.5144, 'grad_norm': 6.134515006712459, 'learning_rate': 6.19560624621354e-07, 'epoch': 0.78} 78%|███████▊ | 9578/12313 [7:10:32<2:02:08, 2.68s/it] 78%|███████▊ | 9579/12313 [7:10:35<2:03:12, 2.70s/it] {'loss': 0.3622, 'grad_norm': 5.019484047549089, 'learning_rate': 6.191273413000237e-07, 'epoch': 0.78} 78%|███████▊ | 
9579/12313 [7:10:35<2:03:12, 2.70s/it] 78%|███████▊ | 9580/12313 [7:10:37<1:59:26, 2.62s/it] {'loss': 0.3609, 'grad_norm': 4.813176127586792, 'learning_rate': 6.186941881251279e-07, 'epoch': 0.78} 78%|███████▊ | 9580/12313 [7:10:37<1:59:26, 2.62s/it] 78%|███████▊ | 9581/12313 [7:10:40<1:59:22, 2.62s/it] {'loss': 0.4625, 'grad_norm': 6.957964558660779, 'learning_rate': 6.182611651266376e-07, 'epoch': 0.78} 78%|███████▊ | 9581/12313 [7:10:40<1:59:22, 2.62s/it] 78%|███████▊ | 9582/12313 [7:10:43<2:03:30, 2.71s/it] {'loss': 0.4196, 'grad_norm': 4.786938388125974, 'learning_rate': 6.17828272334515e-07, 'epoch': 0.78} 78%|███████▊ | 9582/12313 [7:10:43<2:03:30, 2.71s/it] 78%|███████▊ | 9583/12313 [7:10:45<1:58:55, 2.61s/it] {'loss': 0.3105, 'grad_norm': 14.455541969389369, 'learning_rate': 6.173955097787149e-07, 'epoch': 0.78} 78%|███████▊ | 9583/12313 [7:10:45<1:58:55, 2.61s/it] 78%|███████▊ | 9584/12313 [7:10:48<1:59:38, 2.63s/it] {'loss': 0.6127, 'grad_norm': 6.015474094967977, 'learning_rate': 6.169628774891826e-07, 'epoch': 0.78} 78%|███████▊ | 9584/12313 [7:10:48<1:59:38, 2.63s/it] 78%|███████▊ | 9585/12313 [7:10:50<1:56:30, 2.56s/it] {'loss': 0.5329, 'grad_norm': 7.69183168267593, 'learning_rate': 6.165303754958524e-07, 'epoch': 0.78} 78%|███████▊ | 9585/12313 [7:10:50<1:56:30, 2.56s/it] 78%|███████▊ | 9586/12313 [7:10:53<1:56:04, 2.55s/it] {'loss': 0.3892, 'grad_norm': 7.494666404519941, 'learning_rate': 6.160980038286529e-07, 'epoch': 0.78} 78%|███████▊ | 9586/12313 [7:10:53<1:56:04, 2.55s/it] 78%|███████▊ | 9587/12313 [7:10:56<1:58:23, 2.61s/it] {'loss': 0.4718, 'grad_norm': 3.6352807891152508, 'learning_rate': 6.156657625175011e-07, 'epoch': 0.78} 78%|███████▊ | 9587/12313 [7:10:56<1:58:23, 2.61s/it] 78%|███████▊ | 9588/12313 [7:10:58<2:00:04, 2.64s/it] {'loss': 0.49, 'grad_norm': 3.6374040934237066, 'learning_rate': 6.152336515923052e-07, 'epoch': 0.78} 78%|███████▊ | 9588/12313 [7:10:58<2:00:04, 2.64s/it] 78%|███████▊ | 9589/12313 [7:11:01<2:02:16, 2.69s/it] 
{'loss': 0.5964, 'grad_norm': 4.790151735557977, 'learning_rate': 6.148016710829654e-07, 'epoch': 0.78} 78%|███████▊ | 9589/12313 [7:11:01<2:02:16, 2.69s/it] 78%|███████▊ | 9590/12313 [7:11:04<2:06:12, 2.78s/it] {'loss': 0.7207, 'grad_norm': 6.6959826045900614, 'learning_rate': 6.143698210193738e-07, 'epoch': 0.78} 78%|███████▊ | 9590/12313 [7:11:04<2:06:12, 2.78s/it] 78%|███████▊ | 9591/12313 [7:11:07<2:01:56, 2.69s/it] {'loss': 0.4336, 'grad_norm': 5.021956045573845, 'learning_rate': 6.139381014314108e-07, 'epoch': 0.78} 78%|███████▊ | 9591/12313 [7:11:07<2:01:56, 2.69s/it] 78%|███████▊ | 9592/12313 [7:11:09<2:00:49, 2.66s/it] {'loss': 0.4282, 'grad_norm': 8.574991586907696, 'learning_rate': 6.135065123489486e-07, 'epoch': 0.78} 78%|███████▊ | 9592/12313 [7:11:09<2:00:49, 2.66s/it] 78%|███████▊ | 9593/12313 [7:11:12<2:00:29, 2.66s/it] {'loss': 0.5189, 'grad_norm': 4.869738833833407, 'learning_rate': 6.130750538018524e-07, 'epoch': 0.78} 78%|███████▊ | 9593/12313 [7:11:12<2:00:29, 2.66s/it] 78%|███████▊ | 9594/12313 [7:11:14<1:58:13, 2.61s/it] {'loss': 0.4978, 'grad_norm': 5.876692632158238, 'learning_rate': 6.12643725819976e-07, 'epoch': 0.78} 78%|███████▊ | 9594/12313 [7:11:14<1:58:13, 2.61s/it] 78%|███████▊ | 9595/12313 [7:11:17<1:57:38, 2.60s/it] {'loss': 0.5031, 'grad_norm': 7.302404610138221, 'learning_rate': 6.122125284331646e-07, 'epoch': 0.78} 78%|███████▊ | 9595/12313 [7:11:17<1:57:38, 2.60s/it] 78%|███████▊ | 9596/12313 [7:11:20<2:02:00, 2.69s/it] {'loss': 0.4399, 'grad_norm': 6.239496444525358, 'learning_rate': 6.117814616712548e-07, 'epoch': 0.78} 78%|███████▊ | 9596/12313 [7:11:20<2:02:00, 2.69s/it] 78%|███████▊ | 9597/12313 [7:11:23<2:02:29, 2.71s/it] {'loss': 0.4661, 'grad_norm': 4.341820954785043, 'learning_rate': 6.113505255640756e-07, 'epoch': 0.78} 78%|███████▊ | 9597/12313 [7:11:23<2:02:29, 2.71s/it] 78%|███████▊ | 9598/12313 [7:11:25<2:00:52, 2.67s/it] {'loss': 0.4285, 'grad_norm': 4.82168987167205, 'learning_rate': 6.109197201414438e-07, 
'epoch': 0.78} 78%|███████▊ | 9598/12313 [7:11:25<2:00:52, 2.67s/it] 78%|███████▊ | 9599/12313 [7:11:28<2:01:17, 2.68s/it] {'loss': 0.6044, 'grad_norm': 5.620360996597726, 'learning_rate': 6.104890454331702e-07, 'epoch': 0.78} 78%|███████▊ | 9599/12313 [7:11:28<2:01:17, 2.68s/it] 78%|███████▊ | 9600/12313 [7:11:31<2:01:54, 2.70s/it] {'loss': 0.6426, 'grad_norm': 4.809738135444571, 'learning_rate': 6.100585014690547e-07, 'epoch': 0.78} 78%|███████▊ | 9600/12313 [7:11:31<2:01:54, 2.70s/it] 78%|███████▊ | 9601/12313 [7:11:34<2:06:22, 2.80s/it] {'loss': 0.5404, 'grad_norm': 3.621342250605208, 'learning_rate': 6.096280882788874e-07, 'epoch': 0.78} 78%|███████▊ | 9601/12313 [7:11:34<2:06:22, 2.80s/it] 78%|███████▊ | 9602/12313 [7:11:37<2:06:22, 2.80s/it] {'loss': 0.3656, 'grad_norm': 8.468527887913863, 'learning_rate': 6.091978058924522e-07, 'epoch': 0.78} 78%|███████▊ | 9602/12313 [7:11:37<2:06:22, 2.80s/it] 78%|███████▊ | 9603/12313 [7:11:39<2:04:52, 2.76s/it] {'loss': 0.6374, 'grad_norm': 3.464489359474417, 'learning_rate': 6.087676543395224e-07, 'epoch': 0.78} 78%|███████▊ | 9603/12313 [7:11:39<2:04:52, 2.76s/it] 78%|███████▊ | 9604/12313 [7:11:42<2:03:11, 2.73s/it] {'loss': 0.5772, 'grad_norm': 5.245897271466931, 'learning_rate': 6.083376336498608e-07, 'epoch': 0.78} 78%|███████▊ | 9604/12313 [7:11:42<2:03:11, 2.73s/it] 78%|███████▊ | 9605/12313 [7:11:45<2:03:14, 2.73s/it] {'loss': 0.5533, 'grad_norm': 6.62332037882836, 'learning_rate': 6.079077438532246e-07, 'epoch': 0.78} 78%|███████▊ | 9605/12313 [7:11:45<2:03:14, 2.73s/it] 78%|███████▊ | 9606/12313 [7:11:47<2:02:48, 2.72s/it] {'loss': 0.5408, 'grad_norm': 5.690296444200083, 'learning_rate': 6.074779849793585e-07, 'epoch': 0.78} 78%|███████▊ | 9606/12313 [7:11:47<2:02:48, 2.72s/it] 78%|███████▊ | 9607/12313 [7:11:50<2:02:30, 2.72s/it] {'loss': 0.558, 'grad_norm': 4.804271179305116, 'learning_rate': 6.07048357057999e-07, 'epoch': 0.78} 78%|███████▊ | 9607/12313 [7:11:50<2:02:30, 2.72s/it] 78%|███████▊ | 9608/12313 
[7:11:53<1:59:53, 2.66s/it] {'loss': 0.5146, 'grad_norm': 6.074710914942616, 'learning_rate': 6.066188601188757e-07, 'epoch': 0.78} 78%|███████▊ | 9608/12313 [7:11:53<1:59:53, 2.66s/it] 78%|███████▊ | 9609/12313 [7:11:55<2:00:39, 2.68s/it] {'loss': 0.5551, 'grad_norm': 6.506228847573674, 'learning_rate': 6.061894941917062e-07, 'epoch': 0.78} 78%|███████▊ | 9609/12313 [7:11:55<2:00:39, 2.68s/it] 78%|███████▊ | 9610/12313 [7:11:58<2:00:31, 2.68s/it] {'loss': 0.4042, 'grad_norm': 5.035663800312012, 'learning_rate': 6.057602593062015e-07, 'epoch': 0.78} 78%|███████▊ | 9610/12313 [7:11:58<2:00:31, 2.68s/it] 78%|███████▊ | 9611/12313 [7:12:01<2:01:27, 2.70s/it] {'loss': 0.4688, 'grad_norm': 3.7689993752107234, 'learning_rate': 6.053311554920607e-07, 'epoch': 0.78} 78%|███████▊ | 9611/12313 [7:12:01<2:01:27, 2.70s/it] 78%|███████▊ | 9612/12313 [7:12:03<1:58:27, 2.63s/it] {'loss': 0.3031, 'grad_norm': 6.095711004984003, 'learning_rate': 6.049021827789774e-07, 'epoch': 0.78} 78%|███████▊ | 9612/12313 [7:12:03<1:58:27, 2.63s/it] 78%|███████▊ | 9613/12313 [7:12:06<2:01:26, 2.70s/it] {'loss': 0.4798, 'grad_norm': 5.01460034158621, 'learning_rate': 6.044733411966336e-07, 'epoch': 0.78} 78%|███████▊ | 9613/12313 [7:12:06<2:01:26, 2.70s/it] 78%|███████▊ | 9614/12313 [7:12:09<2:00:27, 2.68s/it] {'loss': 0.5674, 'grad_norm': 8.19175212779431, 'learning_rate': 6.040446307747019e-07, 'epoch': 0.78} 78%|███████▊ | 9614/12313 [7:12:09<2:00:27, 2.68s/it] 78%|███████▊ | 9615/12313 [7:12:11<1:58:38, 2.64s/it] {'loss': 0.4449, 'grad_norm': 4.961605893602875, 'learning_rate': 6.036160515428475e-07, 'epoch': 0.78} 78%|███████▊ | 9615/12313 [7:12:11<1:58:38, 2.64s/it] 78%|███████▊ | 9616/12313 [7:12:14<1:59:27, 2.66s/it] {'loss': 0.4569, 'grad_norm': 6.915675610043426, 'learning_rate': 6.031876035307263e-07, 'epoch': 0.78} 78%|███████▊ | 9616/12313 [7:12:14<1:59:27, 2.66s/it] 78%|███████▊ | 9617/12313 [7:12:17<1:59:48, 2.67s/it] {'loss': 0.3962, 'grad_norm': 5.568377529756666, 
'learning_rate': 6.027592867679838e-07, 'epoch': 0.78} 78%|███████▊ | 9617/12313 [7:12:17<1:59:48, 2.67s/it] 78%|███████▊ | 9618/12313 [7:12:19<2:02:52, 2.74s/it] {'loss': 0.5745, 'grad_norm': 6.129156630011155, 'learning_rate': 6.023311012842581e-07, 'epoch': 0.78} 78%|███████▊ | 9618/12313 [7:12:19<2:02:52, 2.74s/it] 78%|███████▊ | 9619/12313 [7:12:22<2:04:28, 2.77s/it] {'loss': 0.3951, 'grad_norm': 5.108160511033742, 'learning_rate': 6.019030471091772e-07, 'epoch': 0.78} 78%|███████▊ | 9619/12313 [7:12:22<2:04:28, 2.77s/it] 78%|███████▊ | 9620/12313 [7:12:25<1:59:52, 2.67s/it] {'loss': 0.5265, 'grad_norm': 5.561892142462487, 'learning_rate': 6.014751242723591e-07, 'epoch': 0.78} 78%|███████▊ | 9620/12313 [7:12:25<1:59:52, 2.67s/it] 78%|███████▊ | 9621/12313 [7:12:27<1:58:24, 2.64s/it] {'loss': 0.5185, 'grad_norm': 3.7511127265919244, 'learning_rate': 6.010473328034153e-07, 'epoch': 0.78} 78%|███████▊ | 9621/12313 [7:12:27<1:58:24, 2.64s/it] 78%|███████▊ | 9622/12313 [7:12:30<1:59:24, 2.66s/it] {'loss': 0.4383, 'grad_norm': 5.167066450794399, 'learning_rate': 6.006196727319452e-07, 'epoch': 0.78} 78%|███████▊ | 9622/12313 [7:12:30<1:59:24, 2.66s/it] 78%|███████▊ | 9623/12313 [7:12:32<1:56:48, 2.61s/it] {'loss': 0.3846, 'grad_norm': 7.96179114023694, 'learning_rate': 6.001921440875414e-07, 'epoch': 0.78} 78%|███████▊ | 9623/12313 [7:12:32<1:56:48, 2.61s/it] 78%|███████▊ | 9624/12313 [7:12:35<1:56:34, 2.60s/it] {'loss': 0.6281, 'grad_norm': 5.371452420693181, 'learning_rate': 5.997647468997875e-07, 'epoch': 0.78} 78%|███████▊ | 9624/12313 [7:12:35<1:56:34, 2.60s/it] 78%|███████▊ | 9625/12313 [7:12:38<1:56:46, 2.61s/it] {'loss': 0.5173, 'grad_norm': 5.8732462009916, 'learning_rate': 5.99337481198256e-07, 'epoch': 0.78} 78%|███████▊ | 9625/12313 [7:12:38<1:56:46, 2.61s/it] 78%|███████▊ | 9626/12313 [7:12:40<1:59:15, 2.66s/it] {'loss': 0.5523, 'grad_norm': 5.56845044238569, 'learning_rate': 5.989103470125113e-07, 'epoch': 0.78} 78%|███████▊ | 9626/12313 
[7:12:40<1:59:15, 2.66s/it] 78%|███████▊ | 9627/12313 [7:12:43<2:02:31, 2.74s/it] {'loss': 0.3735, 'grad_norm': 6.424489397897517, 'learning_rate': 5.984833443721097e-07, 'epoch': 0.78} 78%|███████▊ | 9627/12313 [7:12:43<2:02:31, 2.74s/it] 78%|███████▊ | 9628/12313 [7:12:46<2:02:32, 2.74s/it] {'loss': 0.4501, 'grad_norm': 4.1718433726191275, 'learning_rate': 5.980564733065963e-07, 'epoch': 0.78} 78%|███████▊ | 9628/12313 [7:12:46<2:02:32, 2.74s/it] 78%|███████▊ | 9629/12313 [7:12:49<2:02:23, 2.74s/it] {'loss': 0.5626, 'grad_norm': 5.012432986771707, 'learning_rate': 5.976297338455101e-07, 'epoch': 0.78} 78%|███████▊ | 9629/12313 [7:12:49<2:02:23, 2.74s/it] 78%|███████▊ | 9630/12313 [7:12:52<2:01:08, 2.71s/it] {'loss': 0.5116, 'grad_norm': 13.705889019793014, 'learning_rate': 5.972031260183772e-07, 'epoch': 0.78} 78%|███████▊ | 9630/12313 [7:12:52<2:01:08, 2.71s/it] 78%|███████▊ | 9631/12313 [7:12:54<2:01:33, 2.72s/it] {'loss': 0.4009, 'grad_norm': 5.262747778591087, 'learning_rate': 5.967766498547181e-07, 'epoch': 0.78} 78%|███████▊ | 9631/12313 [7:12:54<2:01:33, 2.72s/it] 78%|███████▊ | 9632/12313 [7:12:57<2:04:25, 2.78s/it] {'loss': 0.4744, 'grad_norm': 4.146136341129552, 'learning_rate': 5.963503053840425e-07, 'epoch': 0.78} 78%|███████▊ | 9632/12313 [7:12:57<2:04:25, 2.78s/it] 78%|███████▊ | 9633/12313 [7:13:00<2:02:35, 2.74s/it] {'loss': 0.4348, 'grad_norm': 6.971385385680695, 'learning_rate': 5.959240926358501e-07, 'epoch': 0.78} 78%|███████▊ | 9633/12313 [7:13:00<2:02:35, 2.74s/it] 78%|███████▊ | 9634/12313 [7:13:03<2:02:25, 2.74s/it] {'loss': 0.6681, 'grad_norm': 4.565877029510119, 'learning_rate': 5.954980116396336e-07, 'epoch': 0.78} 78%|███████▊ | 9634/12313 [7:13:03<2:02:25, 2.74s/it] 78%|███████▊ | 9635/12313 [7:13:05<1:58:23, 2.65s/it] {'loss': 0.484, 'grad_norm': 4.089174310097666, 'learning_rate': 5.950720624248749e-07, 'epoch': 0.78} 78%|███████▊ | 9635/12313 [7:13:05<1:58:23, 2.65s/it] 78%|███████▊ | 9636/12313 [7:13:08<1:56:15, 2.61s/it] {'loss': 
0.4509, 'grad_norm': 5.870091842126987, 'learning_rate': 5.946462450210477e-07, 'epoch': 0.78} 78%|███████▊ | 9636/12313 [7:13:08<1:56:15, 2.61s/it] 78%|███████▊ | 9637/12313 [7:13:10<1:56:27, 2.61s/it] {'loss': 0.5734, 'grad_norm': 17.233940029392116, 'learning_rate': 5.942205594576173e-07, 'epoch': 0.78} 78%|███████▊ | 9637/12313 [7:13:10<1:56:27, 2.61s/it] 78%|███████▊ | 9638/12313 [7:13:13<1:56:49, 2.62s/it] {'loss': 0.4828, 'grad_norm': 12.113335333152698, 'learning_rate': 5.937950057640376e-07, 'epoch': 0.78} 78%|███████▊ | 9638/12313 [7:13:13<1:56:49, 2.62s/it] 78%|███████▊ | 9639/12313 [7:13:15<1:57:51, 2.64s/it] {'loss': 0.4928, 'grad_norm': 7.210552071495601, 'learning_rate': 5.933695839697548e-07, 'epoch': 0.78} 78%|███████▊ | 9639/12313 [7:13:15<1:57:51, 2.64s/it] 78%|███████▊ | 9640/12313 [7:13:18<1:56:22, 2.61s/it] {'loss': 0.578, 'grad_norm': 4.338185170282578, 'learning_rate': 5.929442941042066e-07, 'epoch': 0.78} 78%|███████▊ | 9640/12313 [7:13:18<1:56:22, 2.61s/it] 78%|███████▊ | 9641/12313 [7:13:21<2:01:05, 2.72s/it] {'loss': 0.4616, 'grad_norm': 6.223280812892159, 'learning_rate': 5.925191361968194e-07, 'epoch': 0.78} 78%|███████▊ | 9641/12313 [7:13:21<2:01:05, 2.72s/it] 78%|███████▊ | 9642/12313 [7:13:24<1:59:40, 2.69s/it] {'loss': 0.539, 'grad_norm': 3.9906953099976397, 'learning_rate': 5.920941102770128e-07, 'epoch': 0.78} 78%|███████▊ | 9642/12313 [7:13:24<1:59:40, 2.69s/it] 78%|███████▊ | 9643/12313 [7:13:26<1:57:21, 2.64s/it] {'loss': 0.5437, 'grad_norm': 4.895008165041246, 'learning_rate': 5.916692163741972e-07, 'epoch': 0.78} 78%|███████▊ | 9643/12313 [7:13:26<1:57:21, 2.64s/it] 78%|███████▊ | 9644/12313 [7:13:29<1:55:53, 2.61s/it] {'loss': 0.4387, 'grad_norm': 7.001022857461772, 'learning_rate': 5.91244454517772e-07, 'epoch': 0.78} 78%|███████▊ | 9644/12313 [7:13:29<1:55:53, 2.61s/it] 78%|███████▊ | 9645/12313 [7:13:31<1:58:11, 2.66s/it] {'loss': 0.4938, 'grad_norm': 8.865782780347201, 'learning_rate': 5.908198247371289e-07, 'epoch': 
0.78} 78%|███████▊ | 9645/12313 [7:13:31<1:58:11, 2.66s/it] 78%|███████▊ | 9646/12313 [7:13:34<1:57:36, 2.65s/it] {'loss': 0.4084, 'grad_norm': 3.951135802126336, 'learning_rate': 5.903953270616486e-07, 'epoch': 0.78} 78%|███████▊ | 9646/12313 [7:13:34<1:57:36, 2.65s/it] 78%|███████▊ | 9647/12313 [7:13:37<1:57:52, 2.65s/it] {'loss': 0.5784, 'grad_norm': 6.755133249113319, 'learning_rate': 5.899709615207055e-07, 'epoch': 0.78} 78%|███████▊ | 9647/12313 [7:13:37<1:57:52, 2.65s/it] 78%|███████▊ | 9648/12313 [7:13:40<2:00:30, 2.71s/it] {'loss': 0.5064, 'grad_norm': 7.946028938098305, 'learning_rate': 5.895467281436637e-07, 'epoch': 0.78} 78%|███████▊ | 9648/12313 [7:13:40<2:00:30, 2.71s/it] 78%|███████▊ | 9649/12313 [7:13:42<1:59:30, 2.69s/it] {'loss': 0.6636, 'grad_norm': 5.014225943190868, 'learning_rate': 5.891226269598768e-07, 'epoch': 0.78} 78%|███████▊ | 9649/12313 [7:13:42<1:59:30, 2.69s/it] 78%|███████▊ | 9650/12313 [7:13:45<1:59:13, 2.69s/it] {'loss': 0.4543, 'grad_norm': 5.034789109316721, 'learning_rate': 5.886986579986917e-07, 'epoch': 0.78} 78%|███████▊ | 9650/12313 [7:13:45<1:59:13, 2.69s/it] 78%|███████▊ | 9651/12313 [7:13:47<1:57:16, 2.64s/it] {'loss': 0.5141, 'grad_norm': 5.031634092156704, 'learning_rate': 5.882748212894441e-07, 'epoch': 0.78} 78%|███████▊ | 9651/12313 [7:13:47<1:57:16, 2.64s/it] 78%|███████▊ | 9652/12313 [7:13:50<1:55:23, 2.60s/it] {'loss': 0.5764, 'grad_norm': 3.651635786477034, 'learning_rate': 5.878511168614601e-07, 'epoch': 0.78} 78%|███████▊ | 9652/12313 [7:13:50<1:55:23, 2.60s/it] 78%|███████▊ | 9653/12313 [7:13:53<2:02:06, 2.75s/it] {'loss': 0.4339, 'grad_norm': 4.577152340868677, 'learning_rate': 5.874275447440599e-07, 'epoch': 0.78} 78%|███████▊ | 9653/12313 [7:13:53<2:02:06, 2.75s/it] 78%|███████▊ | 9654/12313 [7:13:55<1:57:36, 2.65s/it] {'loss': 0.5523, 'grad_norm': 6.2520933191135875, 'learning_rate': 5.870041049665507e-07, 'epoch': 0.78} 78%|███████▊ | 9654/12313 [7:13:55<1:57:36, 2.65s/it] 78%|███████▊ | 9655/12313 
[7:13:58<1:59:48, 2.70s/it] {'loss': 0.4017, 'grad_norm': 4.385286574382671, 'learning_rate': 5.86580797558233e-07, 'epoch': 0.78} 78%|███████▊ | 9655/12313 [7:13:58<1:59:48, 2.70s/it] 78%|███████▊ | 9656/12313 [7:14:01<1:58:25, 2.67s/it] {'loss': 0.4594, 'grad_norm': 3.6357368477226233, 'learning_rate': 5.861576225483984e-07, 'epoch': 0.78} 78%|███████▊ | 9656/12313 [7:14:01<1:58:25, 2.67s/it] 78%|███████▊ | 9657/12313 [7:14:04<1:59:26, 2.70s/it] {'loss': 0.3793, 'grad_norm': 5.476827776987912, 'learning_rate': 5.857345799663272e-07, 'epoch': 0.78} 78%|███████▊ | 9657/12313 [7:14:04<1:59:26, 2.70s/it] 78%|███████▊ | 9658/12313 [7:14:06<1:58:22, 2.68s/it] {'loss': 0.4516, 'grad_norm': 5.500782599304935, 'learning_rate': 5.853116698412913e-07, 'epoch': 0.78} 78%|███████▊ | 9658/12313 [7:14:06<1:58:22, 2.68s/it] 78%|███████▊ | 9659/12313 [7:14:09<2:00:08, 2.72s/it] {'loss': 0.5161, 'grad_norm': 4.882146338011125, 'learning_rate': 5.848888922025553e-07, 'epoch': 0.78} 78%|███████▊ | 9659/12313 [7:14:09<2:00:08, 2.72s/it] 78%|███████▊ | 9660/12313 [7:14:12<1:58:13, 2.67s/it] {'loss': 0.4623, 'grad_norm': 5.737418334120377, 'learning_rate': 5.844662470793716e-07, 'epoch': 0.78} 78%|███████▊ | 9660/12313 [7:14:12<1:58:13, 2.67s/it] 78%|███████▊ | 9661/12313 [7:14:15<2:02:47, 2.78s/it] {'loss': 0.5734, 'grad_norm': 5.3911366434680685, 'learning_rate': 5.840437345009859e-07, 'epoch': 0.78} 78%|███████▊ | 9661/12313 [7:14:15<2:02:47, 2.78s/it] 78%|███████▊ | 9662/12313 [7:14:17<2:02:26, 2.77s/it] {'loss': 0.5312, 'grad_norm': 4.290906281542307, 'learning_rate': 5.83621354496634e-07, 'epoch': 0.78} 78%|███████▊ | 9662/12313 [7:14:17<2:02:26, 2.77s/it] 78%|███████▊ | 9663/12313 [7:14:20<2:04:14, 2.81s/it] {'loss': 0.4219, 'grad_norm': 5.746160856248301, 'learning_rate': 5.831991070955426e-07, 'epoch': 0.78} 78%|███████▊ | 9663/12313 [7:14:20<2:04:14, 2.81s/it] 78%|███████▊ | 9664/12313 [7:14:23<2:01:13, 2.75s/it] {'loss': 0.5044, 'grad_norm': 5.04902095963468, 
'learning_rate': 5.827769923269283e-07, 'epoch': 0.78} 78%|███████▊ | 9664/12313 [7:14:23<2:01:13, 2.75s/it] 78%|███████▊ | 9665/12313 [7:14:25<1:57:21, 2.66s/it] {'loss': 0.4453, 'grad_norm': 10.70114453686239, 'learning_rate': 5.823550102199985e-07, 'epoch': 0.78} 78%|███████▊ | 9665/12313 [7:14:25<1:57:21, 2.66s/it] 79%|███████▊ | 9666/12313 [7:14:28<1:59:35, 2.71s/it] {'loss': 0.4819, 'grad_norm': 4.150840098492769, 'learning_rate': 5.819331608039538e-07, 'epoch': 0.79} 79%|███████▊ | 9666/12313 [7:14:28<1:59:35, 2.71s/it] 79%|███████▊ | 9667/12313 [7:14:31<1:57:51, 2.67s/it] {'loss': 0.5268, 'grad_norm': 7.526924883890027, 'learning_rate': 5.815114441079825e-07, 'epoch': 0.79} 79%|███████▊ | 9667/12313 [7:14:31<1:57:51, 2.67s/it] 79%|███████▊ | 9668/12313 [7:14:33<1:56:49, 2.65s/it] {'loss': 0.6169, 'grad_norm': 6.803056853476825, 'learning_rate': 5.810898601612657e-07, 'epoch': 0.79} 79%|███████▊ | 9668/12313 [7:14:33<1:56:49, 2.65s/it] 79%|███████▊ | 9669/12313 [7:14:36<1:56:59, 2.65s/it] {'loss': 0.4574, 'grad_norm': 6.656336422437745, 'learning_rate': 5.806684089929756e-07, 'epoch': 0.79} 79%|███████▊ | 9669/12313 [7:14:36<1:56:59, 2.65s/it] 79%|███████▊ | 9670/12313 [7:14:39<1:57:18, 2.66s/it] {'loss': 0.5343, 'grad_norm': 4.099463565565569, 'learning_rate': 5.802470906322738e-07, 'epoch': 0.79} 79%|███████▊ | 9670/12313 [7:14:39<1:57:18, 2.66s/it] 79%|███████▊ | 9671/12313 [7:14:41<1:57:18, 2.66s/it] {'loss': 0.4658, 'grad_norm': 9.804778473842848, 'learning_rate': 5.798259051083124e-07, 'epoch': 0.79} 79%|███████▊ | 9671/12313 [7:14:41<1:57:18, 2.66s/it] 79%|███████▊ | 9672/12313 [7:14:44<1:57:32, 2.67s/it] {'loss': 0.3633, 'grad_norm': 5.831632381423531, 'learning_rate': 5.794048524502366e-07, 'epoch': 0.79} 79%|███████▊ | 9672/12313 [7:14:44<1:57:32, 2.67s/it] 79%|███████▊ | 9673/12313 [7:14:47<1:56:56, 2.66s/it] {'loss': 0.5674, 'grad_norm': 4.512592207772697, 'learning_rate': 5.789839326871799e-07, 'epoch': 0.79} 79%|███████▊ | 9673/12313 
[7:14:47<1:56:56, 2.66s/it] 79%|███████▊ | 9674/12313 [7:14:49<1:57:28, 2.67s/it] {'loss': 0.4566, 'grad_norm': 7.058643124685738, 'learning_rate': 5.785631458482679e-07, 'epoch': 0.79} 79%|███████▊ | 9674/12313 [7:14:49<1:57:28, 2.67s/it] 79%|███████▊ | 9675/12313 [7:14:52<1:57:38, 2.68s/it] {'loss': 0.5513, 'grad_norm': 9.677711342457231, 'learning_rate': 5.781424919626183e-07, 'epoch': 0.79} 79%|███████▊ | 9675/12313 [7:14:52<1:57:38, 2.68s/it] 79%|███████▊ | 9676/12313 [7:14:55<1:57:38, 2.68s/it] {'loss': 0.4374, 'grad_norm': 7.654772369572434, 'learning_rate': 5.777219710593365e-07, 'epoch': 0.79} 79%|███████▊ | 9676/12313 [7:14:55<1:57:38, 2.68s/it] 79%|███████▊ | 9677/12313 [7:14:57<1:56:03, 2.64s/it] {'loss': 0.518, 'grad_norm': 5.412971281189123, 'learning_rate': 5.773015831675204e-07, 'epoch': 0.79} 79%|███████▊ | 9677/12313 [7:14:57<1:56:03, 2.64s/it] 79%|███████▊ | 9678/12313 [7:15:00<1:56:18, 2.65s/it] {'loss': 0.4422, 'grad_norm': 7.042856826106027, 'learning_rate': 5.768813283162597e-07, 'epoch': 0.79} 79%|███████▊ | 9678/12313 [7:15:00<1:56:18, 2.65s/it] 79%|███████▊ | 9679/12313 [7:15:03<1:57:36, 2.68s/it] {'loss': 0.3727, 'grad_norm': 8.697761416414153, 'learning_rate': 5.764612065346328e-07, 'epoch': 0.79} 79%|███████▊ | 9679/12313 [7:15:03<1:57:36, 2.68s/it] 79%|███████▊ | 9680/12313 [7:15:05<1:54:21, 2.61s/it] {'loss': 0.3518, 'grad_norm': 11.951299771554737, 'learning_rate': 5.760412178517099e-07, 'epoch': 0.79} 79%|███████▊ | 9680/12313 [7:15:05<1:54:21, 2.61s/it] 79%|███████▊ | 9681/12313 [7:15:08<1:56:22, 2.65s/it] {'loss': 0.7013, 'grad_norm': 5.170910615421707, 'learning_rate': 5.75621362296552e-07, 'epoch': 0.79} 79%|███████▊ | 9681/12313 [7:15:08<1:56:22, 2.65s/it] 79%|███████▊ | 9682/12313 [7:15:11<2:01:01, 2.76s/it] {'loss': 0.4973, 'grad_norm': 15.391790408942455, 'learning_rate': 5.752016398982122e-07, 'epoch': 0.79} 79%|███████▊ | 9682/12313 [7:15:11<2:01:01, 2.76s/it] 79%|███████▊ | 9683/12313 [7:15:14<1:58:19, 2.70s/it] {'loss': 
0.5486, 'grad_norm': 6.4992455503524145, 'learning_rate': 5.747820506857318e-07, 'epoch': 0.79} 79%|███████▊ | 9683/12313 [7:15:14<1:58:19, 2.70s/it] 79%|███████▊ | 9684/12313 [7:15:16<1:57:42, 2.69s/it] {'loss': 0.434, 'grad_norm': 5.514244182216503, 'learning_rate': 5.74362594688144e-07, 'epoch': 0.79} 79%|███████▊ | 9684/12313 [7:15:16<1:57:42, 2.69s/it] 79%|███████▊ | 9685/12313 [7:15:19<1:57:44, 2.69s/it] {'loss': 0.4048, 'grad_norm': 4.203294166143777, 'learning_rate': 5.739432719344737e-07, 'epoch': 0.79} 79%|███████▊ | 9685/12313 [7:15:19<1:57:44, 2.69s/it] 79%|███████▊ | 9686/12313 [7:15:22<1:58:46, 2.71s/it] {'loss': 0.4279, 'grad_norm': 5.401361035345732, 'learning_rate': 5.73524082453735e-07, 'epoch': 0.79} 79%|███████▊ | 9686/12313 [7:15:22<1:58:46, 2.71s/it] 79%|███████▊ | 9687/12313 [7:15:24<1:59:30, 2.73s/it] {'loss': 0.4841, 'grad_norm': 4.158513777957124, 'learning_rate': 5.731050262749341e-07, 'epoch': 0.79} 79%|███████▊ | 9687/12313 [7:15:24<1:59:30, 2.73s/it] 79%|███████▊ | 9688/12313 [7:15:27<2:00:26, 2.75s/it] {'loss': 0.3681, 'grad_norm': 7.879580004180549, 'learning_rate': 5.726861034270681e-07, 'epoch': 0.79} 79%|███████▊ | 9688/12313 [7:15:27<2:00:26, 2.75s/it] 79%|███████▊ | 9689/12313 [7:15:30<1:59:06, 2.72s/it] {'loss': 0.4581, 'grad_norm': 3.9835235897565107, 'learning_rate': 5.722673139391236e-07, 'epoch': 0.79} 79%|███████▊ | 9689/12313 [7:15:30<1:59:06, 2.72s/it] 79%|███████▊ | 9690/12313 [7:15:32<1:57:06, 2.68s/it] {'loss': 0.4241, 'grad_norm': 5.506433493815765, 'learning_rate': 5.718486578400775e-07, 'epoch': 0.79} 79%|███████▊ | 9690/12313 [7:15:32<1:57:06, 2.68s/it] 79%|███████▊ | 9691/12313 [7:15:35<1:56:21, 2.66s/it] {'loss': 0.6751, 'grad_norm': 6.423378795375947, 'learning_rate': 5.714301351589008e-07, 'epoch': 0.79} 79%|███████▊ | 9691/12313 [7:15:35<1:56:21, 2.66s/it] 79%|███████▊ | 9692/12313 [7:15:38<1:55:12, 2.64s/it] {'loss': 0.6218, 'grad_norm': 4.6037563920362095, 'learning_rate': 5.710117459245518e-07, 'epoch': 
0.79} 79%|███████▊ | 9692/12313 [7:15:38<1:55:12, 2.64s/it] 79%|███████▊ | 9693/12313 [7:15:40<1:56:10, 2.66s/it] {'loss': 0.609, 'grad_norm': 5.535444733765645, 'learning_rate': 5.705934901659804e-07, 'epoch': 0.79} 79%|███████▊ | 9693/12313 [7:15:40<1:56:10, 2.66s/it] 79%|███████▊ | 9694/12313 [7:15:43<2:01:06, 2.77s/it] {'loss': 0.3255, 'grad_norm': 6.072040161253787, 'learning_rate': 5.70175367912128e-07, 'epoch': 0.79} 79%|███████▊ | 9694/12313 [7:15:43<2:01:06, 2.77s/it] 79%|███████▊ | 9695/12313 [7:15:46<2:04:28, 2.85s/it] {'loss': 0.3883, 'grad_norm': 3.930200276565603, 'learning_rate': 5.697573791919275e-07, 'epoch': 0.79} 79%|███████▊ | 9695/12313 [7:15:46<2:04:28, 2.85s/it] 79%|███████▊ | 9696/12313 [7:15:49<1:58:57, 2.73s/it] {'loss': 0.469, 'grad_norm': 13.405961625131877, 'learning_rate': 5.693395240343e-07, 'epoch': 0.79} 79%|███████▊ | 9696/12313 [7:15:49<1:58:57, 2.73s/it] 79%|███████▉ | 9697/12313 [7:15:51<1:57:02, 2.68s/it] {'loss': 0.4996, 'grad_norm': 3.8674362850461357, 'learning_rate': 5.689218024681603e-07, 'epoch': 0.79} 79%|███████▉ | 9697/12313 [7:15:51<1:57:02, 2.68s/it] 79%|███████▉ | 9698/12313 [7:15:54<1:53:46, 2.61s/it] {'loss': 0.3813, 'grad_norm': 6.411118676070839, 'learning_rate': 5.685042145224118e-07, 'epoch': 0.79} 79%|███████▉ | 9698/12313 [7:15:54<1:53:46, 2.61s/it] 79%|███████▉ | 9699/12313 [7:15:57<1:56:43, 2.68s/it] {'loss': 0.449, 'grad_norm': 3.36119530692789, 'learning_rate': 5.680867602259485e-07, 'epoch': 0.79} 79%|███████▉ | 9699/12313 [7:15:57<1:56:43, 2.68s/it] 79%|███████▉ | 9700/12313 [7:15:59<1:56:29, 2.67s/it] {'loss': 0.5444, 'grad_norm': 5.637861341121901, 'learning_rate': 5.676694396076568e-07, 'epoch': 0.79} 79%|███████▉ | 9700/12313 [7:15:59<1:56:29, 2.67s/it] 79%|███████▉ | 9701/12313 [7:16:02<1:57:37, 2.70s/it] {'loss': 0.4141, 'grad_norm': 8.011959392935337, 'learning_rate': 5.672522526964141e-07, 'epoch': 0.79} 79%|███████▉ | 9701/12313 [7:16:02<1:57:37, 2.70s/it] 79%|███████▉ | 9702/12313 
[7:16:05<1:55:45, 2.66s/it] {'loss': 0.5489, 'grad_norm': 4.254934260559165, 'learning_rate': 5.668351995210866e-07, 'epoch': 0.79} 79%|███████▉ | 9702/12313 [7:16:05<1:55:45, 2.66s/it] 79%|███████▉ | 9703/12313 [7:16:08<2:00:28, 2.77s/it] {'loss': 0.4893, 'grad_norm': 4.2266389347699995, 'learning_rate': 5.664182801105314e-07, 'epoch': 0.79} 79%|███████▉ | 9703/12313 [7:16:08<2:00:28, 2.77s/it] 79%|███████▉ | 9704/12313 [7:16:11<2:01:14, 2.79s/it] {'loss': 0.5584, 'grad_norm': 9.12782592389075, 'learning_rate': 5.660014944935985e-07, 'epoch': 0.79} 79%|███████▉ | 9704/12313 [7:16:11<2:01:14, 2.79s/it] 79%|███████▉ | 9705/12313 [7:16:13<1:58:42, 2.73s/it] {'loss': 0.4832, 'grad_norm': 4.637756752610576, 'learning_rate': 5.655848426991267e-07, 'epoch': 0.79} 79%|███████▉ | 9705/12313 [7:16:13<1:58:42, 2.73s/it] 79%|███████▉ | 9706/12313 [7:16:16<2:00:22, 2.77s/it] {'loss': 0.3528, 'grad_norm': 5.685502111243551, 'learning_rate': 5.651683247559445e-07, 'epoch': 0.79} 79%|███████▉ | 9706/12313 [7:16:16<2:00:22, 2.77s/it] 79%|███████▉ | 9707/12313 [7:16:19<1:57:21, 2.70s/it] {'loss': 0.3939, 'grad_norm': 6.879039707986046, 'learning_rate': 5.647519406928758e-07, 'epoch': 0.79} 79%|███████▉ | 9707/12313 [7:16:19<1:57:21, 2.70s/it] 79%|███████▉ | 9708/12313 [7:16:21<1:57:53, 2.72s/it] {'loss': 0.6541, 'grad_norm': 4.090096342166239, 'learning_rate': 5.643356905387307e-07, 'epoch': 0.79} 79%|███████▉ | 9708/12313 [7:16:21<1:57:53, 2.72s/it] 79%|███████▉ | 9709/12313 [7:16:24<1:57:40, 2.71s/it] {'loss': 0.4791, 'grad_norm': 5.705199569403093, 'learning_rate': 5.639195743223105e-07, 'epoch': 0.79} 79%|███████▉ | 9709/12313 [7:16:24<1:57:40, 2.71s/it] 79%|███████▉ | 9710/12313 [7:16:27<1:54:55, 2.65s/it] {'loss': 0.4197, 'grad_norm': 6.503352654339194, 'learning_rate': 5.635035920724102e-07, 'epoch': 0.79} 79%|███████▉ | 9710/12313 [7:16:27<1:54:55, 2.65s/it] 79%|███████▉ | 9711/12313 [7:16:29<1:54:06, 2.63s/it] {'loss': 0.6805, 'grad_norm': 3.9110515788245204, 
'learning_rate': 5.630877438178126e-07, 'epoch': 0.79} 79%|███████▉ | 9711/12313 [7:16:29<1:54:06, 2.63s/it] 79%|███████▉ | 9712/12313 [7:16:32<1:53:08, 2.61s/it] {'loss': 0.4074, 'grad_norm': 6.1493240468982275, 'learning_rate': 5.626720295872911e-07, 'epoch': 0.79} 79%|███████▉ | 9712/12313 [7:16:32<1:53:08, 2.61s/it] 79%|███████▉ | 9713/12313 [7:16:34<1:54:18, 2.64s/it] {'loss': 0.4565, 'grad_norm': 3.943608994312866, 'learning_rate': 5.622564494096122e-07, 'epoch': 0.79} 79%|███████▉ | 9713/12313 [7:16:34<1:54:18, 2.64s/it] 79%|███████▉ | 9714/12313 [7:16:37<1:52:01, 2.59s/it] {'loss': 0.5792, 'grad_norm': 7.779376580705911, 'learning_rate': 5.618410033135325e-07, 'epoch': 0.79} 79%|███████▉ | 9714/12313 [7:16:37<1:52:01, 2.59s/it] 79%|███████▉ | 9715/12313 [7:16:39<1:50:45, 2.56s/it] {'loss': 0.3237, 'grad_norm': 3.3797672800758134, 'learning_rate': 5.614256913277968e-07, 'epoch': 0.79} 79%|███████▉ | 9715/12313 [7:16:39<1:50:45, 2.56s/it] 79%|███████▉ | 9716/12313 [7:16:42<1:48:33, 2.51s/it] {'loss': 0.4209, 'grad_norm': 5.330674659642353, 'learning_rate': 5.610105134811444e-07, 'epoch': 0.79} 79%|███████▉ | 9716/12313 [7:16:42<1:48:33, 2.51s/it] 79%|███████▉ | 9717/12313 [7:16:44<1:50:29, 2.55s/it] {'loss': 0.4035, 'grad_norm': 4.322221219959529, 'learning_rate': 5.605954698023023e-07, 'epoch': 0.79} 79%|███████▉ | 9717/12313 [7:16:44<1:50:29, 2.55s/it] 79%|███████▉ | 9718/12313 [7:16:47<1:53:03, 2.61s/it] {'loss': 0.3763, 'grad_norm': 9.488626438377107, 'learning_rate': 5.601805603199889e-07, 'epoch': 0.79} 79%|███████▉ | 9718/12313 [7:16:47<1:53:03, 2.61s/it] 79%|███████▉ | 9719/12313 [7:16:51<2:03:28, 2.86s/it] {'loss': 0.5129, 'grad_norm': 5.681079603298455, 'learning_rate': 5.597657850629145e-07, 'epoch': 0.79} 79%|███████▉ | 9719/12313 [7:16:51<2:03:28, 2.86s/it] 79%|███████▉ | 9720/12313 [7:16:53<1:58:28, 2.74s/it] {'loss': 0.4432, 'grad_norm': 4.177871819811436, 'learning_rate': 5.593511440597799e-07, 'epoch': 0.79} 79%|███████▉ | 9720/12313 
[7:16:53<1:58:28, 2.74s/it] 79%|███████▉ | 9721/12313 [7:16:56<2:01:23, 2.81s/it] {'loss': 0.4561, 'grad_norm': 7.924561057142348, 'learning_rate': 5.589366373392754e-07, 'epoch': 0.79} 79%|███████▉ | 9721/12313 [7:16:56<2:01:23, 2.81s/it] 79%|███████▉ | 9722/12313 [7:16:59<1:56:38, 2.70s/it] {'loss': 0.5691, 'grad_norm': 3.32290128813931, 'learning_rate': 5.58522264930082e-07, 'epoch': 0.79} 79%|███████▉ | 9722/12313 [7:16:59<1:56:38, 2.70s/it] 79%|███████▉ | 9723/12313 [7:17:01<1:56:46, 2.71s/it] {'loss': 0.5209, 'grad_norm': 7.189607322217405, 'learning_rate': 5.581080268608733e-07, 'epoch': 0.79} 79%|███████▉ | 9723/12313 [7:17:01<1:56:46, 2.71s/it] 79%|███████▉ | 9724/12313 [7:17:04<1:56:19, 2.70s/it] {'loss': 0.533, 'grad_norm': 6.847085179945522, 'learning_rate': 5.576939231603118e-07, 'epoch': 0.79} 79%|███████▉ | 9724/12313 [7:17:04<1:56:19, 2.70s/it] 79%|███████▉ | 9725/12313 [7:17:06<1:54:00, 2.64s/it] {'loss': 0.3767, 'grad_norm': 5.441399177870424, 'learning_rate': 5.572799538570506e-07, 'epoch': 0.79} 79%|███████▉ | 9725/12313 [7:17:06<1:54:00, 2.64s/it] 79%|███████▉ | 9726/12313 [7:17:09<1:53:47, 2.64s/it] {'loss': 0.5119, 'grad_norm': 5.540677770063936, 'learning_rate': 5.56866118979735e-07, 'epoch': 0.79} 79%|███████▉ | 9726/12313 [7:17:09<1:53:47, 2.64s/it] 79%|███████▉ | 9727/12313 [7:17:12<1:54:34, 2.66s/it] {'loss': 0.5606, 'grad_norm': 8.524810246072764, 'learning_rate': 5.564524185570008e-07, 'epoch': 0.79} 79%|███████▉ | 9727/12313 [7:17:12<1:54:34, 2.66s/it] 79%|███████▉ | 9728/12313 [7:17:14<1:54:44, 2.66s/it] {'loss': 0.6541, 'grad_norm': 3.4073724271339327, 'learning_rate': 5.560388526174723e-07, 'epoch': 0.79} 79%|███████▉ | 9728/12313 [7:17:14<1:54:44, 2.66s/it] 79%|███████▉ | 9729/12313 [7:17:17<1:53:19, 2.63s/it] {'loss': 0.481, 'grad_norm': 4.045724757251306, 'learning_rate': 5.556254211897677e-07, 'epoch': 0.79} 79%|███████▉ | 9729/12313 [7:17:17<1:53:19, 2.63s/it] 79%|███████▉ | 9730/12313 [7:17:20<1:53:38, 2.64s/it] {'loss': 
0.456, 'grad_norm': 6.886529740016336, 'learning_rate': 5.552121243024935e-07, 'epoch': 0.79} 79%|███████▉ | 9730/12313 [7:17:20<1:53:38, 2.64s/it] 79%|███████▉ | 9731/12313 [7:17:22<1:52:52, 2.62s/it] {'loss': 0.4138, 'grad_norm': 4.136880017146439, 'learning_rate': 5.54798961984247e-07, 'epoch': 0.79} 79%|███████▉ | 9731/12313 [7:17:22<1:52:52, 2.62s/it] 79%|███████▉ | 9732/12313 [7:17:25<1:52:03, 2.60s/it] {'loss': 0.4179, 'grad_norm': 5.315316660423206, 'learning_rate': 5.543859342636177e-07, 'epoch': 0.79} 79%|███████▉ | 9732/12313 [7:17:25<1:52:03, 2.60s/it] 79%|███████▉ | 9733/12313 [7:17:28<1:58:34, 2.76s/it] {'loss': 0.4608, 'grad_norm': 9.808582227554735, 'learning_rate': 5.539730411691851e-07, 'epoch': 0.79} 79%|███████▉ | 9733/12313 [7:17:28<1:58:34, 2.76s/it] 79%|███████▉ | 9734/12313 [7:17:31<2:01:45, 2.83s/it] {'loss': 0.4465, 'grad_norm': 3.7605530791600725, 'learning_rate': 5.535602827295189e-07, 'epoch': 0.79} 79%|███████▉ | 9734/12313 [7:17:31<2:01:45, 2.83s/it] 79%|███████▉ | 9735/12313 [7:17:33<1:58:41, 2.76s/it] {'loss': 0.3428, 'grad_norm': 5.688896710047547, 'learning_rate': 5.53147658973179e-07, 'epoch': 0.79} 79%|███████▉ | 9735/12313 [7:17:34<1:58:41, 2.76s/it] 79%|███████▉ | 9736/12313 [7:17:36<1:56:19, 2.71s/it] {'loss': 0.4684, 'grad_norm': 11.37494567313138, 'learning_rate': 5.527351699287184e-07, 'epoch': 0.79} 79%|███████▉ | 9736/12313 [7:17:36<1:56:19, 2.71s/it] 79%|███████▉ | 9737/12313 [7:17:39<1:53:07, 2.63s/it] {'loss': 0.5216, 'grad_norm': 6.335210866254778, 'learning_rate': 5.523228156246782e-07, 'epoch': 0.79} 79%|███████▉ | 9737/12313 [7:17:39<1:53:07, 2.63s/it] 79%|███████▉ | 9738/12313 [7:17:41<1:52:14, 2.62s/it] {'loss': 0.6721, 'grad_norm': 3.543537249807746, 'learning_rate': 5.519105960895904e-07, 'epoch': 0.79} 79%|███████▉ | 9738/12313 [7:17:41<1:52:14, 2.62s/it] 79%|███████▉ | 9739/12313 [7:17:45<2:04:10, 2.89s/it] {'loss': 0.5059, 'grad_norm': 3.746923394479714, 'learning_rate': 5.514985113519794e-07, 'epoch': 
0.79} 79%|███████▉ | 9739/12313 [7:17:45<2:04:10, 2.89s/it] 79%|███████▉ | 9740/12313 [7:17:48<2:03:47, 2.89s/it] {'loss': 0.5028, 'grad_norm': 3.243317942178259, 'learning_rate': 5.510865614403599e-07, 'epoch': 0.79} 79%|███████▉ | 9740/12313 [7:17:48<2:03:47, 2.89s/it] 79%|███████▉ | 9741/12313 [7:17:50<2:04:43, 2.91s/it] {'loss': 0.506, 'grad_norm': 5.231091301479268, 'learning_rate': 5.506747463832348e-07, 'epoch': 0.79} 79%|███████▉ | 9741/12313 [7:17:50<2:04:43, 2.91s/it] 79%|███████▉ | 9742/12313 [7:17:53<2:02:55, 2.87s/it] {'loss': 0.4508, 'grad_norm': 3.037936622624663, 'learning_rate': 5.502630662091016e-07, 'epoch': 0.79} 79%|███████▉ | 9742/12313 [7:17:53<2:02:55, 2.87s/it] 79%|███████▉ | 9743/12313 [7:17:56<2:01:03, 2.83s/it] {'loss': 0.5612, 'grad_norm': 5.011070638322092, 'learning_rate': 5.498515209464453e-07, 'epoch': 0.79} 79%|███████▉ | 9743/12313 [7:17:56<2:01:03, 2.83s/it] 79%|███████▉ | 9744/12313 [7:17:59<1:59:59, 2.80s/it] {'loss': 0.4245, 'grad_norm': 57.93538906369309, 'learning_rate': 5.49440110623742e-07, 'epoch': 0.79} 79%|███████▉ | 9744/12313 [7:17:59<1:59:59, 2.80s/it] 79%|███████▉ | 9745/12313 [7:18:01<1:58:25, 2.77s/it] {'loss': 0.4482, 'grad_norm': 4.391572013936799, 'learning_rate': 5.490288352694598e-07, 'epoch': 0.79} 79%|███████▉ | 9745/12313 [7:18:01<1:58:25, 2.77s/it] 79%|███████▉ | 9746/12313 [7:18:04<1:55:50, 2.71s/it] {'loss': 0.6039, 'grad_norm': 4.558873733467807, 'learning_rate': 5.486176949120575e-07, 'epoch': 0.79} 79%|███████▉ | 9746/12313 [7:18:04<1:55:50, 2.71s/it] 79%|███████▉ | 9747/12313 [7:18:07<1:56:48, 2.73s/it] {'loss': 0.467, 'grad_norm': 7.119700732402204, 'learning_rate': 5.482066895799825e-07, 'epoch': 0.79} 79%|███████▉ | 9747/12313 [7:18:07<1:56:48, 2.73s/it] 79%|███████▉ | 9748/12313 [7:18:09<1:53:21, 2.65s/it] {'loss': 0.589, 'grad_norm': 9.302916508581008, 'learning_rate': 5.477958193016758e-07, 'epoch': 0.79} 79%|███████▉ | 9748/12313 [7:18:09<1:53:21, 2.65s/it] 79%|███████▉ | 9749/12313 
[7:18:12<1:54:21, 2.68s/it] {'loss': 0.5026, 'grad_norm': 3.810258704896879, 'learning_rate': 5.473850841055664e-07, 'epoch': 0.79} 79%|███████▉ | 9749/12313 [7:18:12<1:54:21, 2.68s/it] 79%|███████▉ | 9750/12313 [7:18:15<1:58:06, 2.76s/it] {'loss': 0.4808, 'grad_norm': 3.970336831775221, 'learning_rate': 5.469744840200741e-07, 'epoch': 0.79} 79%|███████▉ | 9750/12313 [7:18:15<1:58:06, 2.76s/it] 79%|███████▉ | 9751/12313 [7:18:18<1:56:13, 2.72s/it] {'loss': 0.5038, 'grad_norm': 4.344838984297026, 'learning_rate': 5.465640190736124e-07, 'epoch': 0.79} 79%|███████▉ | 9751/12313 [7:18:18<1:56:13, 2.72s/it] 79%|███████▉ | 9752/12313 [7:18:20<1:53:37, 2.66s/it] {'loss': 0.4581, 'grad_norm': 4.912118996595724, 'learning_rate': 5.461536892945812e-07, 'epoch': 0.79} 79%|███████▉ | 9752/12313 [7:18:20<1:53:37, 2.66s/it] 79%|███████▉ | 9753/12313 [7:18:23<1:51:54, 2.62s/it] {'loss': 0.4009, 'grad_norm': 5.652749892181998, 'learning_rate': 5.457434947113749e-07, 'epoch': 0.79} 79%|███████▉ | 9753/12313 [7:18:23<1:51:54, 2.62s/it] 79%|███████▉ | 9754/12313 [7:18:25<1:51:28, 2.61s/it] {'loss': 0.4287, 'grad_norm': 17.57147989196119, 'learning_rate': 5.453334353523754e-07, 'epoch': 0.79} 79%|███████▉ | 9754/12313 [7:18:25<1:51:28, 2.61s/it] 79%|███████▉ | 9755/12313 [7:18:28<1:56:23, 2.73s/it] {'loss': 0.4907, 'grad_norm': 3.898615950581783, 'learning_rate': 5.449235112459577e-07, 'epoch': 0.79} 79%|███████▉ | 9755/12313 [7:18:28<1:56:23, 2.73s/it] 79%|███████▉ | 9756/12313 [7:18:31<1:58:02, 2.77s/it] {'loss': 0.6232, 'grad_norm': 6.661559536211671, 'learning_rate': 5.445137224204861e-07, 'epoch': 0.79} 79%|███████▉ | 9756/12313 [7:18:31<1:58:02, 2.77s/it] 79%|███████▉ | 9757/12313 [7:18:34<1:55:56, 2.72s/it] {'loss': 0.3485, 'grad_norm': 4.6703720755130265, 'learning_rate': 5.441040689043148e-07, 'epoch': 0.79} 79%|███████▉ | 9757/12313 [7:18:34<1:55:56, 2.72s/it] 79%|███████▉ | 9758/12313 [7:18:37<1:57:10, 2.75s/it] {'loss': 0.485, 'grad_norm': 7.3100697341525995, 
'learning_rate': 5.436945507257907e-07, 'epoch': 0.79} 79%|███████▉ | 9758/12313 [7:18:37<1:57:10, 2.75s/it] 79%|███████▉ | 9759/12313 [7:18:39<1:54:13, 2.68s/it] {'loss': 0.5487, 'grad_norm': 5.877076789096071, 'learning_rate': 5.432851679132506e-07, 'epoch': 0.79} 79%|███████▉ | 9759/12313 [7:18:39<1:54:13, 2.68s/it] 79%|███████▉ | 9760/12313 [7:18:42<1:51:27, 2.62s/it] {'loss': 0.8112, 'grad_norm': 6.67706951439948, 'learning_rate': 5.428759204950204e-07, 'epoch': 0.79} 79%|███████▉ | 9760/12313 [7:18:42<1:51:27, 2.62s/it] 79%|███████▉ | 9761/12313 [7:18:44<1:50:36, 2.60s/it] {'loss': 0.5156, 'grad_norm': 5.439821280458846, 'learning_rate': 5.424668084994195e-07, 'epoch': 0.79} 79%|███████▉ | 9761/12313 [7:18:44<1:50:36, 2.60s/it] 79%|███████▉ | 9762/12313 [7:18:47<1:54:21, 2.69s/it] {'loss': 0.3297, 'grad_norm': 4.572295130779405, 'learning_rate': 5.420578319547551e-07, 'epoch': 0.79} 79%|███████▉ | 9762/12313 [7:18:47<1:54:21, 2.69s/it] 79%|███████▉ | 9763/12313 [7:18:50<1:53:39, 2.67s/it] {'loss': 0.5123, 'grad_norm': 5.618034848926441, 'learning_rate': 5.416489908893258e-07, 'epoch': 0.79} 79%|███████▉ | 9763/12313 [7:18:50<1:53:39, 2.67s/it] 79%|███████▉ | 9764/12313 [7:18:52<1:52:11, 2.64s/it] {'loss': 0.3917, 'grad_norm': 5.221377474566307, 'learning_rate': 5.412402853314227e-07, 'epoch': 0.79} 79%|███████▉ | 9764/12313 [7:18:52<1:52:11, 2.64s/it] 79%|███████▉ | 9765/12313 [7:18:55<1:50:46, 2.61s/it] {'loss': 0.6027, 'grad_norm': 10.356233466887792, 'learning_rate': 5.408317153093245e-07, 'epoch': 0.79} 79%|███████▉ | 9765/12313 [7:18:55<1:50:46, 2.61s/it] 79%|███████▉ | 9766/12313 [7:18:58<1:54:29, 2.70s/it] {'loss': 0.5062, 'grad_norm': 4.822402093871006, 'learning_rate': 5.404232808513027e-07, 'epoch': 0.79} 79%|███████▉ | 9766/12313 [7:18:58<1:54:29, 2.70s/it] 79%|███████▉ | 9767/12313 [7:19:00<1:56:41, 2.75s/it] {'loss': 0.4948, 'grad_norm': 4.342762770694148, 'learning_rate': 5.400149819856199e-07, 'epoch': 0.79} 79%|███████▉ | 9767/12313 
[7:19:00<1:56:41, 2.75s/it] 79%|███████▉ | 9768/12313 [7:19:03<1:56:03, 2.74s/it] {'loss': 0.4032, 'grad_norm': 5.561331791979438, 'learning_rate': 5.396068187405273e-07, 'epoch': 0.79} 79%|███████▉ | 9768/12313 [7:19:03<1:56:03, 2.74s/it] 79%|███████▉ | 9769/12313 [7:19:06<1:57:02, 2.76s/it] {'loss': 0.613, 'grad_norm': 8.670840918114324, 'learning_rate': 5.391987911442667e-07, 'epoch': 0.79} 79%|███████▉ | 9769/12313 [7:19:06<1:57:02, 2.76s/it] 79%|███████▉ | 9770/12313 [7:19:09<1:56:51, 2.76s/it] {'loss': 0.3999, 'grad_norm': 4.613030257259587, 'learning_rate': 5.387908992250731e-07, 'epoch': 0.79} 79%|███████▉ | 9770/12313 [7:19:09<1:56:51, 2.76s/it] 79%|███████▉ | 9771/12313 [7:19:11<1:54:36, 2.71s/it] {'loss': 0.4021, 'grad_norm': 5.329616352899454, 'learning_rate': 5.383831430111691e-07, 'epoch': 0.79} 79%|███████▉ | 9771/12313 [7:19:11<1:54:36, 2.71s/it] 79%|███████▉ | 9772/12313 [7:19:14<1:55:00, 2.72s/it] {'loss': 0.6117, 'grad_norm': 5.051235158011703, 'learning_rate': 5.379755225307707e-07, 'epoch': 0.79} 79%|███████▉ | 9772/12313 [7:19:14<1:55:00, 2.72s/it] 79%|███████▉ | 9773/12313 [7:19:17<1:52:34, 2.66s/it] {'loss': 0.5466, 'grad_norm': 4.393658723431004, 'learning_rate': 5.375680378120812e-07, 'epoch': 0.79} 79%|███████▉ | 9773/12313 [7:19:17<1:52:34, 2.66s/it] 79%|███████▉ | 9774/12313 [7:19:19<1:53:41, 2.69s/it] {'loss': 0.4675, 'grad_norm': 8.092758743122241, 'learning_rate': 5.371606888832984e-07, 'epoch': 0.79} 79%|███████▉ | 9774/12313 [7:19:19<1:53:41, 2.69s/it] 79%|███████▉ | 9775/12313 [7:19:22<1:51:02, 2.62s/it] {'loss': 0.5704, 'grad_norm': 4.672078958128614, 'learning_rate': 5.367534757726079e-07, 'epoch': 0.79} 79%|███████▉ | 9775/12313 [7:19:22<1:51:02, 2.62s/it] 79%|███████▉ | 9776/12313 [7:19:24<1:49:15, 2.58s/it] {'loss': 0.5743, 'grad_norm': 5.017160554885112, 'learning_rate': 5.363463985081854e-07, 'epoch': 0.79} 79%|███████▉ | 9776/12313 [7:19:24<1:49:15, 2.58s/it] 79%|███████▉ | 9777/12313 [7:19:27<1:55:11, 2.73s/it] {'loss': 
0.5656, 'grad_norm': 3.292901108068312, 'learning_rate': 5.359394571182e-07, 'epoch': 0.79} 79%|███████▉ | 9777/12313 [7:19:27<1:55:11, 2.73s/it] 79%|███████▉ | 9778/12313 [7:19:30<1:54:18, 2.71s/it] {'loss': 0.484, 'grad_norm': 5.959333091031521, 'learning_rate': 5.355326516308102e-07, 'epoch': 0.79} 79%|███████▉ | 9778/12313 [7:19:30<1:54:18, 2.71s/it] 79%|███████▉ | 9779/12313 [7:19:33<1:51:29, 2.64s/it] {'loss': 0.5468, 'grad_norm': 3.555927641182116, 'learning_rate': 5.351259820741633e-07, 'epoch': 0.79} 79%|███████▉ | 9779/12313 [7:19:33<1:51:29, 2.64s/it] 79%|███████▉ | 9780/12313 [7:19:35<1:52:17, 2.66s/it] {'loss': 0.5981, 'grad_norm': 9.33015466014825, 'learning_rate': 5.347194484764001e-07, 'epoch': 0.79} 79%|███████▉ | 9780/12313 [7:19:35<1:52:17, 2.66s/it] 79%|███████▉ | 9781/12313 [7:19:38<1:51:39, 2.65s/it] {'loss': 0.5224, 'grad_norm': 5.675486153355801, 'learning_rate': 5.343130508656502e-07, 'epoch': 0.79} 79%|███████▉ | 9781/12313 [7:19:38<1:51:39, 2.65s/it] 79%|███████▉ | 9782/12313 [7:19:40<1:50:56, 2.63s/it] {'loss': 0.4188, 'grad_norm': 4.362949089225925, 'learning_rate': 5.339067892700331e-07, 'epoch': 0.79} 79%|███████▉ | 9782/12313 [7:19:40<1:50:56, 2.63s/it] 79%|███████▉ | 9783/12313 [7:19:43<1:55:04, 2.73s/it] {'loss': 0.4768, 'grad_norm': 4.529916181314969, 'learning_rate': 5.335006637176612e-07, 'epoch': 0.79} 79%|███████▉ | 9783/12313 [7:19:43<1:55:04, 2.73s/it] 79%|███████▉ | 9784/12313 [7:19:47<2:00:39, 2.86s/it] {'loss': 0.4384, 'grad_norm': 6.84665053039972, 'learning_rate': 5.330946742366356e-07, 'epoch': 0.79} 79%|███████▉ | 9784/12313 [7:19:47<2:00:39, 2.86s/it] 79%|███████▉ | 9785/12313 [7:19:49<2:01:06, 2.87s/it] {'loss': 0.3712, 'grad_norm': 5.58529418435716, 'learning_rate': 5.326888208550485e-07, 'epoch': 0.79} 79%|███████▉ | 9785/12313 [7:19:49<2:01:06, 2.87s/it] 79%|███████▉ | 9786/12313 [7:19:52<2:02:22, 2.91s/it] {'loss': 0.5674, 'grad_norm': 5.963815630595506, 'learning_rate': 5.32283103600984e-07, 'epoch': 0.79} 
79%|███████▉ | 9786/12313 [7:19:52<2:02:22, 2.91s/it] 79%|███████▉ | 9787/12313 [7:19:55<2:01:14, 2.88s/it] {'loss': 0.5324, 'grad_norm': 5.054597490597483, 'learning_rate': 5.318775225025147e-07, 'epoch': 0.79} 79%|███████▉ | 9787/12313 [7:19:55<2:01:14, 2.88s/it] 79%|███████▉ | 9788/12313 [7:19:58<1:57:38, 2.80s/it] {'loss': 0.4342, 'grad_norm': 5.080349599593879, 'learning_rate': 5.314720775877046e-07, 'epoch': 0.79} 79%|███████▉ | 9788/12313 [7:19:58<1:57:38, 2.80s/it] 80%|███████▉ | 9789/12313 [7:20:01<1:57:57, 2.80s/it] {'loss': 0.3672, 'grad_norm': 4.048899362069842, 'learning_rate': 5.31066768884608e-07, 'epoch': 0.8} 80%|███████▉ | 9789/12313 [7:20:01<1:57:57, 2.80s/it] 80%|███████▉ | 9790/12313 [7:20:03<1:55:32, 2.75s/it] {'loss': 0.5476, 'grad_norm': 33.20617438804645, 'learning_rate': 5.306615964212705e-07, 'epoch': 0.8} 80%|███████▉ | 9790/12313 [7:20:03<1:55:32, 2.75s/it] 80%|███████▉ | 9791/12313 [7:20:06<1:54:13, 2.72s/it] {'loss': 0.6486, 'grad_norm': 7.450168497762338, 'learning_rate': 5.302565602257285e-07, 'epoch': 0.8} 80%|███████▉ | 9791/12313 [7:20:06<1:54:13, 2.72s/it] 80%|███████▉ | 9792/12313 [7:20:09<1:57:11, 2.79s/it] {'loss': 0.3788, 'grad_norm': 7.689105107124658, 'learning_rate': 5.298516603260071e-07, 'epoch': 0.8} 80%|███████▉ | 9792/12313 [7:20:09<1:57:11, 2.79s/it] 80%|███████▉ | 9793/12313 [7:20:11<1:53:38, 2.71s/it] {'loss': 0.4255, 'grad_norm': 4.276239675857061, 'learning_rate': 5.294468967501248e-07, 'epoch': 0.8} 80%|███████▉ | 9793/12313 [7:20:11<1:53:38, 2.71s/it] 80%|███████▉ | 9794/12313 [7:20:14<1:49:57, 2.62s/it] {'loss': 0.5787, 'grad_norm': 5.366407144652406, 'learning_rate': 5.29042269526088e-07, 'epoch': 0.8} 80%|███████▉ | 9794/12313 [7:20:14<1:49:57, 2.62s/it] 80%|███████▉ | 9795/12313 [7:20:17<1:51:28, 2.66s/it] {'loss': 0.6228, 'grad_norm': 3.8340051916962503, 'learning_rate': 5.286377786818944e-07, 'epoch': 0.8} 80%|███████▉ | 9795/12313 [7:20:17<1:51:28, 2.66s/it] 80%|███████▉ | 9796/12313 [7:20:19<1:50:55, 
2.64s/it] {'loss': 0.5358, 'grad_norm': 4.50626750141108, 'learning_rate': 5.282334242455339e-07, 'epoch': 0.8} 80%|███████▉ | 9796/12313 [7:20:19<1:50:55, 2.64s/it] 80%|███████▉ | 9797/12313 [7:20:22<1:49:56, 2.62s/it] {'loss': 0.5658, 'grad_norm': 6.044552408083624, 'learning_rate': 5.278292062449844e-07, 'epoch': 0.8} 80%|███████▉ | 9797/12313 [7:20:22<1:49:56, 2.62s/it] 80%|███████▉ | 9798/12313 [7:20:24<1:48:12, 2.58s/it] {'loss': 0.5208, 'grad_norm': 4.404044522106774, 'learning_rate': 5.274251247082163e-07, 'epoch': 0.8} 80%|███████▉ | 9798/12313 [7:20:24<1:48:12, 2.58s/it] 80%|███████▉ | 9799/12313 [7:20:27<1:51:19, 2.66s/it] {'loss': 0.3324, 'grad_norm': 7.333443652190031, 'learning_rate': 5.270211796631905e-07, 'epoch': 0.8} 80%|███████▉ | 9799/12313 [7:20:27<1:51:19, 2.66s/it] 80%|███████▉ | 9800/12313 [7:20:30<1:52:36, 2.69s/it] {'loss': 0.5734, 'grad_norm': 4.223546024424326, 'learning_rate': 5.266173711378572e-07, 'epoch': 0.8} 80%|███████▉ | 9800/12313 [7:20:30<1:52:36, 2.69s/it] 80%|███████▉ | 9801/12313 [7:20:32<1:51:09, 2.66s/it] {'loss': 0.385, 'grad_norm': 5.341487545794119, 'learning_rate': 5.262136991601572e-07, 'epoch': 0.8} 80%|███████▉ | 9801/12313 [7:20:32<1:51:09, 2.66s/it] 80%|███████▉ | 9802/12313 [7:20:35<1:53:40, 2.72s/it] {'loss': 0.4812, 'grad_norm': 4.803777217199065, 'learning_rate': 5.258101637580238e-07, 'epoch': 0.8} 80%|███████▉ | 9802/12313 [7:20:35<1:53:40, 2.72s/it] 80%|███████▉ | 9803/12313 [7:20:38<1:53:22, 2.71s/it] {'loss': 0.4336, 'grad_norm': 6.650426970575839, 'learning_rate': 5.254067649593781e-07, 'epoch': 0.8} 80%|███████▉ | 9803/12313 [7:20:38<1:53:22, 2.71s/it] 80%|███████▉ | 9804/12313 [7:20:41<1:53:19, 2.71s/it] {'loss': 0.3946, 'grad_norm': 5.441981893530827, 'learning_rate': 5.250035027921338e-07, 'epoch': 0.8} 80%|███████▉ | 9804/12313 [7:20:41<1:53:19, 2.71s/it] 80%|███████▉ | 9805/12313 [7:20:43<1:49:34, 2.62s/it] {'loss': 0.4776, 'grad_norm': 5.107486747720455, 'learning_rate': 5.246003772841953e-07, 
'epoch': 0.8} 80%|███████▉ | 9805/12313 [7:20:43<1:49:34, 2.62s/it] 80%|███████▉ | 9806/12313 [7:20:46<1:52:40, 2.70s/it] {'loss': 0.328, 'grad_norm': 3.5626105502835497, 'learning_rate': 5.24197388463456e-07, 'epoch': 0.8} 80%|███████▉ | 9806/12313 [7:20:46<1:52:40, 2.70s/it] 80%|███████▉ | 9807/12313 [7:20:49<1:51:41, 2.67s/it] {'loss': 0.4275, 'grad_norm': 6.074463700483569, 'learning_rate': 5.237945363578006e-07, 'epoch': 0.8} 80%|███████▉ | 9807/12313 [7:20:49<1:51:41, 2.67s/it] 80%|███████▉ | 9808/12313 [7:20:51<1:49:48, 2.63s/it] {'loss': 0.3317, 'grad_norm': 12.182536694266675, 'learning_rate': 5.233918209951039e-07, 'epoch': 0.8} 80%|███████▉ | 9808/12313 [7:20:51<1:49:48, 2.63s/it] 80%|███████▉ | 9809/12313 [7:20:54<1:47:20, 2.57s/it] {'loss': 0.4446, 'grad_norm': 6.106295220054585, 'learning_rate': 5.229892424032326e-07, 'epoch': 0.8} 80%|███████▉ | 9809/12313 [7:20:54<1:47:20, 2.57s/it] 80%|███████▉ | 9810/12313 [7:20:56<1:48:11, 2.59s/it] {'loss': 0.4815, 'grad_norm': 4.589038843159026, 'learning_rate': 5.225868006100421e-07, 'epoch': 0.8} 80%|███████▉ | 9810/12313 [7:20:56<1:48:11, 2.59s/it] 80%|███████▉ | 9811/12313 [7:20:59<1:52:57, 2.71s/it] {'loss': 0.3585, 'grad_norm': 13.701424714841433, 'learning_rate': 5.221844956433794e-07, 'epoch': 0.8} 80%|███████▉ | 9811/12313 [7:20:59<1:52:57, 2.71s/it] 80%|███████▉ | 9812/12313 [7:21:02<1:51:52, 2.68s/it] {'loss': 0.426, 'grad_norm': 4.331772567167185, 'learning_rate': 5.21782327531083e-07, 'epoch': 0.8} 80%|███████▉ | 9812/12313 [7:21:02<1:51:52, 2.68s/it] 80%|███████▉ | 9813/12313 [7:21:04<1:51:02, 2.67s/it] {'loss': 0.5894, 'grad_norm': 5.814768311032915, 'learning_rate': 5.213802963009798e-07, 'epoch': 0.8} 80%|███████▉ | 9813/12313 [7:21:04<1:51:02, 2.67s/it] 80%|███████▉ | 9814/12313 [7:21:07<1:53:05, 2.72s/it] {'loss': 0.401, 'grad_norm': 4.7885091116888505, 'learning_rate': 5.209784019808877e-07, 'epoch': 0.8} 80%|███████▉ | 9814/12313 [7:21:07<1:53:05, 2.72s/it] 80%|███████▉ | 9815/12313 
[7:21:10<1:51:17, 2.67s/it] {'loss': 0.4015, 'grad_norm': 3.7476316440320403, 'learning_rate': 5.205766445986174e-07, 'epoch': 0.8} 80%|███████▉ | 9815/12313 [7:21:10<1:51:17, 2.67s/it] 80%|███████▉ | 9816/12313 [7:21:13<1:53:29, 2.73s/it] {'loss': 0.3485, 'grad_norm': 4.916810203123699, 'learning_rate': 5.201750241819664e-07, 'epoch': 0.8} 80%|███████▉ | 9816/12313 [7:21:13<1:53:29, 2.73s/it] 80%|███████▉ | 9817/12313 [7:21:16<1:55:31, 2.78s/it] {'loss': 0.4305, 'grad_norm': 4.506048343346604, 'learning_rate': 5.197735407587257e-07, 'epoch': 0.8} 80%|███████▉ | 9817/12313 [7:21:16<1:55:31, 2.78s/it] 80%|███████▉ | 9818/12313 [7:21:18<1:54:11, 2.75s/it] {'loss': 0.5148, 'grad_norm': 3.516529711458559, 'learning_rate': 5.193721943566762e-07, 'epoch': 0.8} 80%|███████▉ | 9818/12313 [7:21:18<1:54:11, 2.75s/it] 80%|███████▉ | 9819/12313 [7:21:21<1:54:57, 2.77s/it] {'loss': 0.4342, 'grad_norm': 4.681381707890613, 'learning_rate': 5.189709850035887e-07, 'epoch': 0.8} 80%|███████▉ | 9819/12313 [7:21:21<1:54:57, 2.77s/it] 80%|███████▉ | 9820/12313 [7:21:24<1:55:17, 2.77s/it] {'loss': 0.3261, 'grad_norm': 5.7757941641398665, 'learning_rate': 5.185699127272243e-07, 'epoch': 0.8} 80%|███████▉ | 9820/12313 [7:21:24<1:55:17, 2.77s/it] 80%|███████▉ | 9821/12313 [7:21:27<1:53:33, 2.73s/it] {'loss': 0.5548, 'grad_norm': 3.0498451518722196, 'learning_rate': 5.181689775553355e-07, 'epoch': 0.8} 80%|███████▉ | 9821/12313 [7:21:27<1:53:33, 2.73s/it] 80%|███████▉ | 9822/12313 [7:21:29<1:51:18, 2.68s/it] {'loss': 0.3438, 'grad_norm': 7.950302605138208, 'learning_rate': 5.17768179515665e-07, 'epoch': 0.8} 80%|███████▉ | 9822/12313 [7:21:29<1:51:18, 2.68s/it] 80%|███████▉ | 9823/12313 [7:21:32<1:51:53, 2.70s/it] {'loss': 0.4751, 'grad_norm': 6.18464348555484, 'learning_rate': 5.173675186359451e-07, 'epoch': 0.8} 80%|███████▉ | 9823/12313 [7:21:32<1:51:53, 2.70s/it] 80%|███████▉ | 9824/12313 [7:21:35<1:52:39, 2.72s/it] {'loss': 0.3245, 'grad_norm': 7.611226667202139, 'learning_rate': 
5.169669949438996e-07, 'epoch': 0.8} 80%|███████▉ | 9824/12313 [7:21:35<1:52:39, 2.72s/it] 80%|███████▉ | 9825/12313 [7:21:37<1:50:37, 2.67s/it] {'loss': 0.4096, 'grad_norm': 6.064790144719224, 'learning_rate': 5.165666084672439e-07, 'epoch': 0.8} 80%|███████▉ | 9825/12313 [7:21:37<1:50:37, 2.67s/it] 80%|███████▉ | 9826/12313 [7:21:40<1:50:08, 2.66s/it] {'loss': 0.6826, 'grad_norm': 4.770411960870846, 'learning_rate': 5.161663592336815e-07, 'epoch': 0.8} 80%|███████▉ | 9826/12313 [7:21:40<1:50:08, 2.66s/it] 80%|███████▉ | 9827/12313 [7:21:43<1:52:40, 2.72s/it] {'loss': 0.5609, 'grad_norm': 7.597577840704956, 'learning_rate': 5.157662472709075e-07, 'epoch': 0.8} 80%|███████▉ | 9827/12313 [7:21:43<1:52:40, 2.72s/it] 80%|███████▉ | 9828/12313 [7:21:45<1:52:22, 2.71s/it] {'loss': 0.4814, 'grad_norm': 8.243596223621436, 'learning_rate': 5.153662726066083e-07, 'epoch': 0.8} 80%|███████▉ | 9828/12313 [7:21:45<1:52:22, 2.71s/it] 80%|███████▉ | 9829/12313 [7:21:48<1:51:59, 2.71s/it] {'loss': 0.5136, 'grad_norm': 9.676688199336681, 'learning_rate': 5.149664352684586e-07, 'epoch': 0.8} 80%|███████▉ | 9829/12313 [7:21:48<1:51:59, 2.71s/it] 80%|███████▉ | 9830/12313 [7:21:51<1:52:18, 2.71s/it] {'loss': 0.3747, 'grad_norm': 5.795108480675768, 'learning_rate': 5.14566735284126e-07, 'epoch': 0.8} 80%|███████▉ | 9830/12313 [7:21:51<1:52:18, 2.71s/it] 80%|███████▉ | 9831/12313 [7:21:53<1:48:49, 2.63s/it] {'loss': 0.5808, 'grad_norm': 6.471676661405976, 'learning_rate': 5.141671726812683e-07, 'epoch': 0.8} 80%|███████▉ | 9831/12313 [7:21:53<1:48:49, 2.63s/it] 80%|███████▉ | 9832/12313 [7:21:56<1:52:26, 2.72s/it] {'loss': 0.5517, 'grad_norm': 4.438123959791211, 'learning_rate': 5.137677474875324e-07, 'epoch': 0.8} 80%|███████▉ | 9832/12313 [7:21:56<1:52:26, 2.72s/it] 80%|███████▉ | 9833/12313 [7:21:59<1:54:00, 2.76s/it] {'loss': 0.3537, 'grad_norm': 7.424549295374417, 'learning_rate': 5.133684597305557e-07, 'epoch': 0.8} 80%|███████▉ | 9833/12313 [7:21:59<1:54:00, 2.76s/it] 
80%|███████▉ | 9834/12313 [7:22:02<1:53:14, 2.74s/it] {'loss': 0.3372, 'grad_norm': 5.275736853110713, 'learning_rate': 5.129693094379684e-07, 'epoch': 0.8} 80%|███████▉ | 9834/12313 [7:22:02<1:53:14, 2.74s/it] 80%|███████▉ | 9835/12313 [7:22:04<1:52:14, 2.72s/it] {'loss': 0.3272, 'grad_norm': 3.2450613994463056, 'learning_rate': 5.125702966373883e-07, 'epoch': 0.8} 80%|███████▉ | 9835/12313 [7:22:04<1:52:14, 2.72s/it] 80%|███████▉ | 9836/12313 [7:22:07<1:50:44, 2.68s/it] {'loss': 0.4256, 'grad_norm': 12.566077839285736, 'learning_rate': 5.121714213564249e-07, 'epoch': 0.8} 80%|███████▉ | 9836/12313 [7:22:07<1:50:44, 2.68s/it] 80%|███████▉ | 9837/12313 [7:22:10<1:50:51, 2.69s/it] {'loss': 0.4961, 'grad_norm': 3.457305067918944, 'learning_rate': 5.117726836226786e-07, 'epoch': 0.8} 80%|███████▉ | 9837/12313 [7:22:10<1:50:51, 2.69s/it] 80%|███████▉ | 9838/12313 [7:22:12<1:49:40, 2.66s/it] {'loss': 0.3549, 'grad_norm': 5.900089910314316, 'learning_rate': 5.113740834637407e-07, 'epoch': 0.8} 80%|███████▉ | 9838/12313 [7:22:12<1:49:40, 2.66s/it] 80%|███████▉ | 9839/12313 [7:22:15<1:50:34, 2.68s/it] {'loss': 0.469, 'grad_norm': 8.54709344532498, 'learning_rate': 5.109756209071908e-07, 'epoch': 0.8} 80%|███████▉ | 9839/12313 [7:22:15<1:50:34, 2.68s/it] 80%|███████▉ | 9840/12313 [7:22:18<1:50:42, 2.69s/it] {'loss': 0.5095, 'grad_norm': 3.7791884771682147, 'learning_rate': 5.105772959806021e-07, 'epoch': 0.8} 80%|███████▉ | 9840/12313 [7:22:18<1:50:42, 2.69s/it] 80%|███████▉ | 9841/12313 [7:22:20<1:48:08, 2.62s/it] {'loss': 0.5591, 'grad_norm': 6.060813178732931, 'learning_rate': 5.101791087115354e-07, 'epoch': 0.8} 80%|███████▉ | 9841/12313 [7:22:20<1:48:08, 2.62s/it] 80%|███████▉ | 9842/12313 [7:22:23<1:50:37, 2.69s/it] {'loss': 0.3985, 'grad_norm': 3.1018063396354605, 'learning_rate': 5.097810591275429e-07, 'epoch': 0.8} 80%|███████▉ | 9842/12313 [7:22:23<1:50:37, 2.69s/it] 80%|███████▉ | 9843/12313 [7:22:26<1:51:00, 2.70s/it] {'loss': 0.4773, 'grad_norm': 
5.327968286749991, 'learning_rate': 5.093831472561681e-07, 'epoch': 0.8} 80%|███████▉ | 9843/12313 [7:22:26<1:51:00, 2.70s/it] 80%|███████▉ | 9844/12313 [7:22:28<1:51:52, 2.72s/it] {'loss': 0.4411, 'grad_norm': 7.284888223378119, 'learning_rate': 5.089853731249448e-07, 'epoch': 0.8} 80%|███████▉ | 9844/12313 [7:22:28<1:51:52, 2.72s/it] 80%|███████▉ | 9845/12313 [7:22:31<1:52:38, 2.74s/it] {'loss': 0.4432, 'grad_norm': 9.03698387870564, 'learning_rate': 5.085877367613964e-07, 'epoch': 0.8} 80%|███████▉ | 9845/12313 [7:22:31<1:52:38, 2.74s/it] 80%|███████▉ | 9846/12313 [7:22:34<1:52:19, 2.73s/it] {'loss': 0.5223, 'grad_norm': 4.655253921964928, 'learning_rate': 5.081902381930365e-07, 'epoch': 0.8} 80%|███████▉ | 9846/12313 [7:22:34<1:52:19, 2.73s/it] 80%|███████▉ | 9847/12313 [7:22:37<1:50:55, 2.70s/it] {'loss': 0.3228, 'grad_norm': 6.893829661308056, 'learning_rate': 5.077928774473714e-07, 'epoch': 0.8} 80%|███████▉ | 9847/12313 [7:22:37<1:50:55, 2.70s/it] 80%|███████▉ | 9848/12313 [7:22:39<1:51:16, 2.71s/it] {'loss': 0.4231, 'grad_norm': 4.268542033494732, 'learning_rate': 5.073956545518949e-07, 'epoch': 0.8} 80%|███████▉ | 9848/12313 [7:22:39<1:51:16, 2.71s/it] 80%|███████▉ | 9849/12313 [7:22:42<1:51:56, 2.73s/it] {'loss': 0.4657, 'grad_norm': 14.574424300264582, 'learning_rate': 5.069985695340931e-07, 'epoch': 0.8} 80%|███████▉ | 9849/12313 [7:22:42<1:51:56, 2.73s/it] 80%|███████▉ | 9850/12313 [7:22:45<1:53:05, 2.75s/it] {'loss': 0.6228, 'grad_norm': 5.388314959320381, 'learning_rate': 5.066016224214435e-07, 'epoch': 0.8} 80%|███████▉ | 9850/12313 [7:22:45<1:53:05, 2.75s/it] 80%|████████ | 9851/12313 [7:22:48<1:53:10, 2.76s/it] {'loss': 0.4077, 'grad_norm': 5.9704183797863575, 'learning_rate': 5.062048132414116e-07, 'epoch': 0.8} 80%|████████ | 9851/12313 [7:22:48<1:53:10, 2.76s/it] 80%|████████ | 9852/12313 [7:22:50<1:52:25, 2.74s/it] {'loss': 0.394, 'grad_norm': 7.054787471974678, 'learning_rate': 5.058081420214538e-07, 'epoch': 0.8} 80%|████████ | 9852/12313 
[7:22:50<1:52:25, 2.74s/it] 80%|████████ | 9853/12313 [7:22:53<1:50:52, 2.70s/it] {'loss': 0.5202, 'grad_norm': 9.477648614150743, 'learning_rate': 5.054116087890196e-07, 'epoch': 0.8} 80%|████████ | 9853/12313 [7:22:53<1:50:52, 2.70s/it] 80%|████████ | 9854/12313 [7:22:56<1:50:10, 2.69s/it] {'loss': 0.4712, 'grad_norm': 5.337615316923885, 'learning_rate': 5.050152135715453e-07, 'epoch': 0.8} 80%|████████ | 9854/12313 [7:22:56<1:50:10, 2.69s/it] 80%|████████ | 9855/12313 [7:22:58<1:52:22, 2.74s/it] {'loss': 0.4629, 'grad_norm': 4.045630006380179, 'learning_rate': 5.046189563964595e-07, 'epoch': 0.8} 80%|████████ | 9855/12313 [7:22:58<1:52:22, 2.74s/it] 80%|████████ | 9856/12313 [7:23:01<1:49:33, 2.68s/it] {'loss': 0.4685, 'grad_norm': 4.540887627720557, 'learning_rate': 5.042228372911815e-07, 'epoch': 0.8} 80%|████████ | 9856/12313 [7:23:01<1:49:33, 2.68s/it] 80%|████████ | 9857/12313 [7:23:04<1:49:48, 2.68s/it] {'loss': 0.4574, 'grad_norm': 5.474722039970322, 'learning_rate': 5.038268562831214e-07, 'epoch': 0.8} 80%|████████ | 9857/12313 [7:23:04<1:49:48, 2.68s/it] 80%|████████ | 9858/12313 [7:23:06<1:50:28, 2.70s/it] {'loss': 0.6693, 'grad_norm': 4.262273306813277, 'learning_rate': 5.034310133996772e-07, 'epoch': 0.8} 80%|████████ | 9858/12313 [7:23:06<1:50:28, 2.70s/it] 80%|████████ | 9859/12313 [7:23:09<1:51:43, 2.73s/it] {'loss': 0.4987, 'grad_norm': 4.349294849404306, 'learning_rate': 5.030353086682413e-07, 'epoch': 0.8} 80%|████████ | 9859/12313 [7:23:09<1:51:43, 2.73s/it] 80%|████████ | 9860/12313 [7:23:12<1:52:31, 2.75s/it] {'loss': 0.5335, 'grad_norm': 3.749623526864484, 'learning_rate': 5.02639742116193e-07, 'epoch': 0.8} 80%|████████ | 9860/12313 [7:23:12<1:52:31, 2.75s/it] 80%|████████ | 9861/12313 [7:23:15<1:54:23, 2.80s/it] {'loss': 0.5024, 'grad_norm': 3.6511910926652926, 'learning_rate': 5.022443137709032e-07, 'epoch': 0.8} 80%|████████ | 9861/12313 [7:23:15<1:54:23, 2.80s/it] 80%|████████ | 9862/12313 [7:23:17<1:51:05, 2.72s/it] {'loss': 0.7358, 
'grad_norm': 7.167220329931862, 'learning_rate': 5.018490236597337e-07, 'epoch': 0.8} 80%|████████ | 9862/12313 [7:23:17<1:51:05, 2.72s/it] 80%|████████ | 9863/12313 [7:23:20<1:46:51, 2.62s/it] {'loss': 0.5198, 'grad_norm': 4.748703110219761, 'learning_rate': 5.014538718100373e-07, 'epoch': 0.8} 80%|████████ | 9863/12313 [7:23:20<1:46:51, 2.62s/it] 80%|████████ | 9864/12313 [7:23:22<1:46:16, 2.60s/it] {'loss': 0.5499, 'grad_norm': 6.53063347027075, 'learning_rate': 5.01058858249156e-07, 'epoch': 0.8} 80%|████████ | 9864/12313 [7:23:22<1:46:16, 2.60s/it] 80%|████████ | 9865/12313 [7:23:25<1:50:27, 2.71s/it] {'loss': 0.4342, 'grad_norm': 4.349589259638556, 'learning_rate': 5.006639830044219e-07, 'epoch': 0.8} 80%|████████ | 9865/12313 [7:23:25<1:50:27, 2.71s/it] 80%|████████ | 9866/12313 [7:23:28<1:51:34, 2.74s/it] {'loss': 0.5895, 'grad_norm': 4.143371415401599, 'learning_rate': 5.002692461031591e-07, 'epoch': 0.8} 80%|████████ | 9866/12313 [7:23:28<1:51:34, 2.74s/it] 80%|████████ | 9867/12313 [7:23:31<1:50:19, 2.71s/it] {'loss': 0.5336, 'grad_norm': 7.7254724395274055, 'learning_rate': 4.998746475726815e-07, 'epoch': 0.8} 80%|████████ | 9867/12313 [7:23:31<1:50:19, 2.71s/it] 80%|████████ | 9868/12313 [7:23:34<1:50:07, 2.70s/it] {'loss': 0.6204, 'grad_norm': 5.175670084157273, 'learning_rate': 4.994801874402918e-07, 'epoch': 0.8} 80%|████████ | 9868/12313 [7:23:34<1:50:07, 2.70s/it] 80%|████████ | 9869/12313 [7:23:37<1:54:21, 2.81s/it] {'loss': 0.3911, 'grad_norm': 6.735892978669596, 'learning_rate': 4.990858657332856e-07, 'epoch': 0.8} 80%|████████ | 9869/12313 [7:23:37<1:54:21, 2.81s/it] 80%|████████ | 9870/12313 [7:23:39<1:54:11, 2.80s/it] {'loss': 0.5035, 'grad_norm': 5.104643492256398, 'learning_rate': 4.986916824789484e-07, 'epoch': 0.8} 80%|████████ | 9870/12313 [7:23:39<1:54:11, 2.80s/it] 80%|████████ | 9871/12313 [7:23:42<1:54:50, 2.82s/it] {'loss': 0.4066, 'grad_norm': 3.5531328145211476, 'learning_rate': 4.982976377045546e-07, 'epoch': 0.8} 80%|████████ | 
9871/12313 [7:23:42<1:54:50, 2.82s/it] 80%|████████ | 9872/12313 [7:23:45<1:55:05, 2.83s/it] {'loss': 0.7253, 'grad_norm': 4.292790999663058, 'learning_rate': 4.979037314373708e-07, 'epoch': 0.8} 80%|████████ | 9872/12313 [7:23:45<1:55:05, 2.83s/it] 80%|████████ | 9873/12313 [7:23:48<1:53:07, 2.78s/it] {'loss': 0.5173, 'grad_norm': 3.9126112690178934, 'learning_rate': 4.975099637046529e-07, 'epoch': 0.8} 80%|████████ | 9873/12313 [7:23:48<1:53:07, 2.78s/it] 80%|████████ | 9874/12313 [7:23:51<1:54:01, 2.81s/it] {'loss': 0.546, 'grad_norm': 4.952119344921789, 'learning_rate': 4.971163345336469e-07, 'epoch': 0.8} 80%|████████ | 9874/12313 [7:23:51<1:54:01, 2.81s/it] 80%|████████ | 9875/12313 [7:23:53<1:54:36, 2.82s/it] {'loss': 0.5014, 'grad_norm': 4.007491175094282, 'learning_rate': 4.967228439515903e-07, 'epoch': 0.8} 80%|████████ | 9875/12313 [7:23:53<1:54:36, 2.82s/it] 80%|████████ | 9876/12313 [7:23:56<1:52:07, 2.76s/it] {'loss': 0.4807, 'grad_norm': 7.4996807066549405, 'learning_rate': 4.963294919857115e-07, 'epoch': 0.8} 80%|████████ | 9876/12313 [7:23:56<1:52:07, 2.76s/it] 80%|████████ | 9877/12313 [7:23:59<1:52:38, 2.77s/it] {'loss': 0.3412, 'grad_norm': 3.8718307958429308, 'learning_rate': 4.959362786632274e-07, 'epoch': 0.8} 80%|████████ | 9877/12313 [7:23:59<1:52:38, 2.77s/it] 80%|████████ | 9878/12313 [7:24:01<1:50:00, 2.71s/it] {'loss': 0.4632, 'grad_norm': 14.309366650475116, 'learning_rate': 4.955432040113459e-07, 'epoch': 0.8} 80%|████████ | 9878/12313 [7:24:01<1:50:00, 2.71s/it] 80%|████████ | 9879/12313 [7:24:04<1:47:04, 2.64s/it] {'loss': 0.3375, 'grad_norm': 6.072346159252271, 'learning_rate': 4.95150268057267e-07, 'epoch': 0.8} 80%|████████ | 9879/12313 [7:24:04<1:47:04, 2.64s/it] 80%|████████ | 9880/12313 [7:24:07<1:48:45, 2.68s/it] {'loss': 0.4548, 'grad_norm': 3.9060655122044956, 'learning_rate': 4.947574708281788e-07, 'epoch': 0.8} 80%|████████ | 9880/12313 [7:24:07<1:48:45, 2.68s/it] 80%|████████ | 9881/12313 [7:24:09<1:44:51, 2.59s/it] 
{'loss': 0.5799, 'grad_norm': 4.543113575494821, 'learning_rate': 4.943648123512607e-07, 'epoch': 0.8} 80%|████████ | 9881/12313 [7:24:09<1:44:51, 2.59s/it] 80%|████████ | 9882/12313 [7:24:12<1:46:25, 2.63s/it] {'loss': 0.3684, 'grad_norm': 8.154618850928907, 'learning_rate': 4.939722926536825e-07, 'epoch': 0.8} 80%|████████ | 9882/12313 [7:24:12<1:46:25, 2.63s/it] 80%|████████ | 9883/12313 [7:24:15<1:50:55, 2.74s/it] {'loss': 0.5121, 'grad_norm': 3.3302480665945673, 'learning_rate': 4.935799117626058e-07, 'epoch': 0.8} 80%|████████ | 9883/12313 [7:24:15<1:50:55, 2.74s/it] 80%|████████ | 9884/12313 [7:24:17<1:48:06, 2.67s/it] {'loss': 0.4584, 'grad_norm': 5.010095824925431, 'learning_rate': 4.931876697051797e-07, 'epoch': 0.8} 80%|████████ | 9884/12313 [7:24:17<1:48:06, 2.67s/it] 80%|████████ | 9885/12313 [7:24:20<1:47:47, 2.66s/it] {'loss': 0.5904, 'grad_norm': 8.290730780611321, 'learning_rate': 4.927955665085466e-07, 'epoch': 0.8} 80%|████████ | 9885/12313 [7:24:20<1:47:47, 2.66s/it] 80%|████████ | 9886/12313 [7:24:23<1:50:15, 2.73s/it] {'loss': 0.6501, 'grad_norm': 6.530421557967707, 'learning_rate': 4.924036021998372e-07, 'epoch': 0.8} 80%|████████ | 9886/12313 [7:24:23<1:50:15, 2.73s/it] 80%|████████ | 9887/12313 [7:24:25<1:45:59, 2.62s/it] {'loss': 0.4631, 'grad_norm': 3.933371390397304, 'learning_rate': 4.92011776806173e-07, 'epoch': 0.8} 80%|████████ | 9887/12313 [7:24:25<1:45:59, 2.62s/it] 80%|████████ | 9888/12313 [7:24:28<1:44:54, 2.60s/it] {'loss': 0.4694, 'grad_norm': 4.630936993671216, 'learning_rate': 4.916200903546664e-07, 'epoch': 0.8} 80%|████████ | 9888/12313 [7:24:28<1:44:54, 2.60s/it] 80%|████████ | 9889/12313 [7:24:31<1:50:50, 2.74s/it] {'loss': 0.4291, 'grad_norm': 8.920541461832672, 'learning_rate': 4.912285428724214e-07, 'epoch': 0.8} 80%|████████ | 9889/12313 [7:24:31<1:50:50, 2.74s/it] 80%|████████ | 9890/12313 [7:24:34<1:51:42, 2.77s/it] {'loss': 0.5715, 'grad_norm': 4.683558801065991, 'learning_rate': 4.908371343865289e-07, 'epoch': 
0.8} 80%|████████ | 9890/12313 [7:24:34<1:51:42, 2.77s/it] 80%|████████ | 9891/12313 [7:24:37<1:53:18, 2.81s/it] {'loss': 0.6187, 'grad_norm': 4.064081828276734, 'learning_rate': 4.904458649240742e-07, 'epoch': 0.8} 80%|████████ | 9891/12313 [7:24:37<1:53:18, 2.81s/it] 80%|████████ | 9892/12313 [7:24:39<1:51:11, 2.76s/it] {'loss': 0.4852, 'grad_norm': 13.844068111838718, 'learning_rate': 4.900547345121304e-07, 'epoch': 0.8} 80%|████████ | 9892/12313 [7:24:39<1:51:11, 2.76s/it] 80%|████████ | 9893/12313 [7:24:42<1:50:53, 2.75s/it] {'loss': 0.5241, 'grad_norm': 7.334487582827237, 'learning_rate': 4.896637431777607e-07, 'epoch': 0.8} 80%|████████ | 9893/12313 [7:24:42<1:50:53, 2.75s/it] 80%|████████ | 9894/12313 [7:24:45<1:50:09, 2.73s/it] {'loss': 0.4231, 'grad_norm': 4.075794059330378, 'learning_rate': 4.89272890948021e-07, 'epoch': 0.8} 80%|████████ | 9894/12313 [7:24:45<1:50:09, 2.73s/it] 80%|████████ | 9895/12313 [7:24:48<1:55:06, 2.86s/it] {'loss': 0.3873, 'grad_norm': 3.1839585809732913, 'learning_rate': 4.88882177849955e-07, 'epoch': 0.8} 80%|████████ | 9895/12313 [7:24:48<1:55:06, 2.86s/it] 80%|████████ | 9896/12313 [7:24:50<1:52:07, 2.78s/it] {'loss': 0.4194, 'grad_norm': 3.9421773033440233, 'learning_rate': 4.884916039105994e-07, 'epoch': 0.8} 80%|████████ | 9896/12313 [7:24:50<1:52:07, 2.78s/it] 80%|████████ | 9897/12313 [7:24:53<1:50:47, 2.75s/it] {'loss': 0.3529, 'grad_norm': 4.937157672903017, 'learning_rate': 4.881011691569781e-07, 'epoch': 0.8} 80%|████████ | 9897/12313 [7:24:53<1:50:47, 2.75s/it] 80%|████████ | 9898/12313 [7:24:56<1:50:52, 2.75s/it] {'loss': 0.5727, 'grad_norm': 9.136038936527344, 'learning_rate': 4.877108736161091e-07, 'epoch': 0.8} 80%|████████ | 9898/12313 [7:24:56<1:50:52, 2.75s/it] 80%|████████ | 9899/12313 [7:24:58<1:49:07, 2.71s/it] {'loss': 0.608, 'grad_norm': 4.5845637025408, 'learning_rate': 4.873207173149974e-07, 'epoch': 0.8} 80%|████████ | 9899/12313 [7:24:58<1:49:07, 2.71s/it] 80%|████████ | 9900/12313 [7:25:01<1:48:04, 
2.69s/it] {'loss': 0.471, 'grad_norm': 5.642715408471168, 'learning_rate': 4.869307002806397e-07, 'epoch': 0.8} 80%|████████ | 9900/12313 [7:25:01<1:48:04, 2.69s/it] 80%|████████ | 9901/12313 [7:25:04<1:49:04, 2.71s/it] {'loss': 0.51, 'grad_norm': 6.2233329502733294, 'learning_rate': 4.865408225400234e-07, 'epoch': 0.8} 80%|████████ | 9901/12313 [7:25:04<1:49:04, 2.71s/it] 80%|████████ | 9902/12313 [7:25:06<1:48:04, 2.69s/it] {'loss': 0.3892, 'grad_norm': 7.993507725893066, 'learning_rate': 4.861510841201266e-07, 'epoch': 0.8} 80%|████████ | 9902/12313 [7:25:06<1:48:04, 2.69s/it] 80%|████████ | 9903/12313 [7:25:09<1:48:44, 2.71s/it] {'loss': 0.5246, 'grad_norm': 4.96226956095677, 'learning_rate': 4.857614850479161e-07, 'epoch': 0.8} 80%|████████ | 9903/12313 [7:25:09<1:48:44, 2.71s/it] 80%|████████ | 9904/12313 [7:25:12<1:47:36, 2.68s/it] {'loss': 0.4751, 'grad_norm': 11.603584675552659, 'learning_rate': 4.853720253503514e-07, 'epoch': 0.8} 80%|████████ | 9904/12313 [7:25:12<1:47:36, 2.68s/it] 80%|████████ | 9905/12313 [7:25:15<1:50:38, 2.76s/it] {'loss': 0.5307, 'grad_norm': 4.342808900661657, 'learning_rate': 4.849827050543801e-07, 'epoch': 0.8} 80%|████████ | 9905/12313 [7:25:15<1:50:38, 2.76s/it] 80%|████████ | 9906/12313 [7:25:17<1:49:43, 2.73s/it] {'loss': 0.4751, 'grad_norm': 6.60599795238213, 'learning_rate': 4.845935241869409e-07, 'epoch': 0.8} 80%|████████ | 9906/12313 [7:25:17<1:49:43, 2.73s/it] 80%|████████ | 9907/12313 [7:25:20<1:46:50, 2.66s/it] {'loss': 0.511, 'grad_norm': 4.155426698257137, 'learning_rate': 4.842044827749632e-07, 'epoch': 0.8} 80%|████████ | 9907/12313 [7:25:20<1:46:50, 2.66s/it] 80%|████████ | 9908/12313 [7:25:23<1:48:06, 2.70s/it] {'loss': 0.6412, 'grad_norm': 5.569956854051911, 'learning_rate': 4.838155808453676e-07, 'epoch': 0.8} 80%|████████ | 9908/12313 [7:25:23<1:48:06, 2.70s/it] 80%|████████ | 9909/12313 [7:25:25<1:44:41, 2.61s/it] {'loss': 0.3876, 'grad_norm': 8.778391998701604, 'learning_rate': 4.834268184250626e-07, 
'epoch': 0.8} 80%|████████ | 9909/12313 [7:25:25<1:44:41, 2.61s/it] 80%|████████ | 9910/12313 [7:25:28<1:44:47, 2.62s/it] {'loss': 0.438, 'grad_norm': 6.809131934976907, 'learning_rate': 4.830381955409497e-07, 'epoch': 0.8} 80%|████████ | 9910/12313 [7:25:28<1:44:47, 2.62s/it] 80%|████████ | 9911/12313 [7:25:30<1:45:01, 2.62s/it] {'loss': 0.5043, 'grad_norm': 6.2072199104772885, 'learning_rate': 4.826497122199191e-07, 'epoch': 0.8} 80%|████████ | 9911/12313 [7:25:30<1:45:01, 2.62s/it] 81%|████████ | 9912/12313 [7:25:33<1:43:31, 2.59s/it] {'loss': 0.4799, 'grad_norm': 9.779692232356227, 'learning_rate': 4.822613684888519e-07, 'epoch': 0.81} 81%|████████ | 9912/12313 [7:25:33<1:43:31, 2.59s/it] 81%|████████ | 9913/12313 [7:25:36<1:44:00, 2.60s/it] {'loss': 0.5888, 'grad_norm': 5.12858071171813, 'learning_rate': 4.818731643746186e-07, 'epoch': 0.81} 81%|████████ | 9913/12313 [7:25:36<1:44:00, 2.60s/it] 81%|████████ | 9914/12313 [7:25:38<1:43:37, 2.59s/it] {'loss': 0.551, 'grad_norm': 3.86454745630488, 'learning_rate': 4.814850999040816e-07, 'epoch': 0.81} 81%|████████ | 9914/12313 [7:25:38<1:43:37, 2.59s/it] 81%|████████ | 9915/12313 [7:25:41<1:43:52, 2.60s/it] {'loss': 0.4416, 'grad_norm': 4.189470752357414, 'learning_rate': 4.810971751040932e-07, 'epoch': 0.81} 81%|████████ | 9915/12313 [7:25:41<1:43:52, 2.60s/it] 81%|████████ | 9916/12313 [7:25:43<1:41:32, 2.54s/it] {'loss': 0.3583, 'grad_norm': 8.768793347385872, 'learning_rate': 4.80709390001495e-07, 'epoch': 0.81} 81%|████████ | 9916/12313 [7:25:43<1:41:32, 2.54s/it] 81%|████████ | 9917/12313 [7:25:46<1:42:16, 2.56s/it] {'loss': 0.5113, 'grad_norm': 5.814244672682726, 'learning_rate': 4.803217446231206e-07, 'epoch': 0.81} 81%|████████ | 9917/12313 [7:25:46<1:42:16, 2.56s/it] 81%|████████ | 9918/12313 [7:25:48<1:43:10, 2.58s/it] {'loss': 0.4861, 'grad_norm': 4.585923445910094, 'learning_rate': 4.799342389957925e-07, 'epoch': 0.81} 81%|████████ | 9918/12313 [7:25:48<1:43:10, 2.58s/it] 81%|████████ | 9919/12313 
[7:25:51<1:44:28, 2.62s/it] {'loss': 0.5333, 'grad_norm': 5.831696175264736, 'learning_rate': 4.795468731463232e-07, 'epoch': 0.81} 81%|████████ | 9919/12313 [7:25:51<1:44:28, 2.62s/it] 81%|████████ | 9920/12313 [7:25:54<1:48:45, 2.73s/it] {'loss': 0.5757, 'grad_norm': 4.199244928525552, 'learning_rate': 4.791596471015175e-07, 'epoch': 0.81} 81%|████████ | 9920/12313 [7:25:54<1:48:45, 2.73s/it] 81%|████████ | 9921/12313 [7:25:57<1:47:04, 2.69s/it] {'loss': 0.5091, 'grad_norm': 4.550642371506264, 'learning_rate': 4.787725608881694e-07, 'epoch': 0.81} 81%|████████ | 9921/12313 [7:25:57<1:47:04, 2.69s/it] 81%|████████ | 9922/12313 [7:25:59<1:48:00, 2.71s/it] {'loss': 0.3853, 'grad_norm': 6.904305095897743, 'learning_rate': 4.783856145330624e-07, 'epoch': 0.81} 81%|████████ | 9922/12313 [7:25:59<1:48:00, 2.71s/it] 81%|████████ | 9923/12313 [7:26:02<1:50:21, 2.77s/it] {'loss': 0.547, 'grad_norm': 5.386276895757226, 'learning_rate': 4.779988080629722e-07, 'epoch': 0.81} 81%|████████ | 9923/12313 [7:26:02<1:50:21, 2.77s/it] 81%|████████ | 9924/12313 [7:26:05<1:48:33, 2.73s/it] {'loss': 0.6014, 'grad_norm': 5.592802881169542, 'learning_rate': 4.776121415046634e-07, 'epoch': 0.81} 81%|████████ | 9924/12313 [7:26:05<1:48:33, 2.73s/it] 81%|████████ | 9925/12313 [7:26:08<1:49:49, 2.76s/it] {'loss': 0.5031, 'grad_norm': 6.5204999954371825, 'learning_rate': 4.772256148848903e-07, 'epoch': 0.81} 81%|████████ | 9925/12313 [7:26:08<1:49:49, 2.76s/it] 81%|████████ | 9926/12313 [7:26:10<1:49:00, 2.74s/it] {'loss': 0.4754, 'grad_norm': 4.641348135597619, 'learning_rate': 4.768392282303999e-07, 'epoch': 0.81} 81%|████████ | 9926/12313 [7:26:10<1:49:00, 2.74s/it] 81%|████████ | 9927/12313 [7:26:13<1:45:33, 2.65s/it] {'loss': 0.6335, 'grad_norm': 6.611205228762905, 'learning_rate': 4.7645298156792667e-07, 'epoch': 0.81} 81%|████████ | 9927/12313 [7:26:13<1:45:33, 2.65s/it] 81%|████████ | 9928/12313 [7:26:16<1:45:23, 2.65s/it] {'loss': 0.5242, 'grad_norm': 4.080391885424033, 
'learning_rate': 4.7606687492419785e-07, 'epoch': 0.81} 81%|████████ | 9928/12313 [7:26:16<1:45:23, 2.65s/it] 81%|████████ | 9929/12313 [7:26:19<1:50:55, 2.79s/it] {'loss': 0.4403, 'grad_norm': 12.801912551363564, 'learning_rate': 4.7568090832593033e-07, 'epoch': 0.81} 81%|████████ | 9929/12313 [7:26:19<1:50:55, 2.79s/it] 81%|████████ | 9930/12313 [7:26:22<1:51:08, 2.80s/it] {'loss': 0.505, 'grad_norm': 6.118657536304109, 'learning_rate': 4.752950817998303e-07, 'epoch': 0.81} 81%|████████ | 9930/12313 [7:26:22<1:51:08, 2.80s/it] 81%|████████ | 9931/12313 [7:26:24<1:49:55, 2.77s/it] {'loss': 0.4619, 'grad_norm': 5.362943206753638, 'learning_rate': 4.7490939537259527e-07, 'epoch': 0.81} 81%|████████ | 9931/12313 [7:26:24<1:49:55, 2.77s/it] 81%|████████ | 9932/12313 [7:26:27<1:48:21, 2.73s/it] {'loss': 0.3267, 'grad_norm': 6.975033167992144, 'learning_rate': 4.745238490709117e-07, 'epoch': 0.81} 81%|████████ | 9932/12313 [7:26:27<1:48:21, 2.73s/it] 81%|████████ | 9933/12313 [7:26:30<1:49:56, 2.77s/it] {'loss': 0.4316, 'grad_norm': 4.258954432113065, 'learning_rate': 4.741384429214579e-07, 'epoch': 0.81} 81%|████████ | 9933/12313 [7:26:30<1:49:56, 2.77s/it] 81%|████████ | 9934/12313 [7:26:33<1:54:59, 2.90s/it] {'loss': 0.5411, 'grad_norm': 3.8337908412938972, 'learning_rate': 4.7375317695090295e-07, 'epoch': 0.81} 81%|████████ | 9934/12313 [7:26:33<1:54:59, 2.90s/it] 81%|████████ | 9935/12313 [7:26:36<1:51:29, 2.81s/it] {'loss': 0.4551, 'grad_norm': 3.8911807697106213, 'learning_rate': 4.7336805118590375e-07, 'epoch': 0.81} 81%|████████ | 9935/12313 [7:26:36<1:51:29, 2.81s/it] 81%|████████ | 9936/12313 [7:26:38<1:49:07, 2.75s/it] {'loss': 0.5383, 'grad_norm': 3.9612535233371764, 'learning_rate': 4.729830656531101e-07, 'epoch': 0.81} 81%|████████ | 9936/12313 [7:26:38<1:49:07, 2.75s/it] 81%|████████ | 9937/12313 [7:26:41<1:51:25, 2.81s/it] {'loss': 0.6442, 'grad_norm': 4.951917020944689, 'learning_rate': 4.725982203791607e-07, 'epoch': 0.81} 81%|████████ | 9937/12313 
[7:26:41<1:51:25, 2.81s/it] 81%|████████ | 9938/12313 [7:26:44<1:48:29, 2.74s/it] {'loss': 0.4088, 'grad_norm': 9.313306419072807, 'learning_rate': 4.7221351539068374e-07, 'epoch': 0.81} 81%|████████ | 9938/12313 [7:26:44<1:48:29, 2.74s/it] 81%|████████ | 9939/12313 [7:26:46<1:48:10, 2.73s/it] {'loss': 0.4883, 'grad_norm': 5.792989399998622, 'learning_rate': 4.7182895071430036e-07, 'epoch': 0.81} 81%|████████ | 9939/12313 [7:26:46<1:48:10, 2.73s/it] 81%|████████ | 9940/12313 [7:26:49<1:46:44, 2.70s/it] {'loss': 0.522, 'grad_norm': 6.612743860989568, 'learning_rate': 4.7144452637661875e-07, 'epoch': 0.81} 81%|████████ | 9940/12313 [7:26:49<1:46:44, 2.70s/it] 81%|████████ | 9941/12313 [7:26:52<1:45:41, 2.67s/it] {'loss': 0.5033, 'grad_norm': 5.3380622522214916, 'learning_rate': 4.7106024240424014e-07, 'epoch': 0.81} 81%|████████ | 9941/12313 [7:26:52<1:45:41, 2.67s/it] 81%|████████ | 9942/12313 [7:26:54<1:46:33, 2.70s/it] {'loss': 0.4336, 'grad_norm': 10.643012463139256, 'learning_rate': 4.706760988237555e-07, 'epoch': 0.81} 81%|████████ | 9942/12313 [7:26:54<1:46:33, 2.70s/it] 81%|████████ | 9943/12313 [7:26:57<1:45:07, 2.66s/it] {'loss': 0.4817, 'grad_norm': 6.794117107098659, 'learning_rate': 4.702920956617446e-07, 'epoch': 0.81} 81%|████████ | 9943/12313 [7:26:57<1:45:07, 2.66s/it] 81%|████████ | 9944/12313 [7:27:00<1:46:20, 2.69s/it] {'loss': 0.3929, 'grad_norm': 10.634630209323328, 'learning_rate': 4.6990823294477795e-07, 'epoch': 0.81} 81%|████████ | 9944/12313 [7:27:00<1:46:20, 2.69s/it] 81%|████████ | 9945/12313 [7:27:03<1:48:06, 2.74s/it] {'loss': 0.4227, 'grad_norm': 6.387003003213154, 'learning_rate': 4.695245106994181e-07, 'epoch': 0.81} 81%|████████ | 9945/12313 [7:27:03<1:48:06, 2.74s/it] 81%|████████ | 9946/12313 [7:27:05<1:45:02, 2.66s/it] {'loss': 0.4674, 'grad_norm': 5.885749845817612, 'learning_rate': 4.691409289522156e-07, 'epoch': 0.81} 81%|████████ | 9946/12313 [7:27:05<1:45:02, 2.66s/it] 81%|████████ | 9947/12313 [7:27:08<1:45:44, 2.68s/it] 
{'loss': 0.6015, 'grad_norm': 5.744010803243097, 'learning_rate': 4.6875748772971244e-07, 'epoch': 0.81} 81%|████████ | 9947/12313 [7:27:08<1:45:44, 2.68s/it] 81%|████████ | 9948/12313 [7:27:11<1:48:32, 2.75s/it] {'loss': 0.4788, 'grad_norm': 3.2133616441041704, 'learning_rate': 4.683741870584413e-07, 'epoch': 0.81} 81%|████████ | 9948/12313 [7:27:11<1:48:32, 2.75s/it] 81%|████████ | 9949/12313 [7:27:13<1:45:10, 2.67s/it] {'loss': 0.5901, 'grad_norm': 6.807257172917265, 'learning_rate': 4.679910269649246e-07, 'epoch': 0.81} 81%|████████ | 9949/12313 [7:27:13<1:45:10, 2.67s/it] 81%|████████ | 9950/12313 [7:27:16<1:47:42, 2.73s/it] {'loss': 0.5877, 'grad_norm': 3.9285075156101823, 'learning_rate': 4.676080074756745e-07, 'epoch': 0.81} 81%|████████ | 9950/12313 [7:27:16<1:47:42, 2.73s/it] 81%|████████ | 9951/12313 [7:27:19<1:46:41, 2.71s/it] {'loss': 0.5472, 'grad_norm': 5.276676978089501, 'learning_rate': 4.6722512861719304e-07, 'epoch': 0.81} 81%|████████ | 9951/12313 [7:27:19<1:46:41, 2.71s/it] 81%|████████ | 9952/12313 [7:27:21<1:44:54, 2.67s/it] {'loss': 0.6121, 'grad_norm': 5.815263402068283, 'learning_rate': 4.6684239041597524e-07, 'epoch': 0.81} 81%|████████ | 9952/12313 [7:27:21<1:44:54, 2.67s/it] 81%|████████ | 9953/12313 [7:27:24<1:45:12, 2.67s/it] {'loss': 0.3672, 'grad_norm': 8.366540950724946, 'learning_rate': 4.6645979289850316e-07, 'epoch': 0.81} 81%|████████ | 9953/12313 [7:27:24<1:45:12, 2.67s/it] 81%|████████ | 9954/12313 [7:27:27<1:44:58, 2.67s/it] {'loss': 0.4365, 'grad_norm': 4.915630857560634, 'learning_rate': 4.66077336091251e-07, 'epoch': 0.81} 81%|████████ | 9954/12313 [7:27:27<1:44:58, 2.67s/it] 81%|████████ | 9955/12313 [7:27:29<1:44:32, 2.66s/it] {'loss': 0.3652, 'grad_norm': 14.896314047238715, 'learning_rate': 4.6569502002068336e-07, 'epoch': 0.81} 81%|████████ | 9955/12313 [7:27:29<1:44:32, 2.66s/it] 81%|████████ | 9956/12313 [7:27:32<1:47:06, 2.73s/it] {'loss': 0.4285, 'grad_norm': 5.45782242106025, 'learning_rate': 
4.6531284471325375e-07, 'epoch': 0.81} 81%|████████ | 9956/12313 [7:27:32<1:47:06, 2.73s/it] 81%|████████ | 9957/12313 [7:27:35<1:47:11, 2.73s/it] {'loss': 0.5137, 'grad_norm': 4.550708998274069, 'learning_rate': 4.649308101954064e-07, 'epoch': 0.81} 81%|████████ | 9957/12313 [7:27:35<1:47:11, 2.73s/it] 81%|████████ | 9958/12313 [7:27:38<1:47:08, 2.73s/it] {'loss': 0.4994, 'grad_norm': 4.5179592494433125, 'learning_rate': 4.645489164935774e-07, 'epoch': 0.81} 81%|████████ | 9958/12313 [7:27:38<1:47:08, 2.73s/it] 81%|████████ | 9959/12313 [7:27:40<1:47:17, 2.73s/it] {'loss': 0.4727, 'grad_norm': 8.186932272267061, 'learning_rate': 4.641671636341899e-07, 'epoch': 0.81} 81%|████████ | 9959/12313 [7:27:40<1:47:17, 2.73s/it] 81%|████████ | 9960/12313 [7:27:43<1:44:25, 2.66s/it] {'loss': 0.4331, 'grad_norm': 7.7788457747580635, 'learning_rate': 4.637855516436604e-07, 'epoch': 0.81} 81%|████████ | 9960/12313 [7:27:43<1:44:25, 2.66s/it] 81%|████████ | 9961/12313 [7:27:46<1:44:46, 2.67s/it] {'loss': 0.4042, 'grad_norm': 9.523483750496569, 'learning_rate': 4.634040805483947e-07, 'epoch': 0.81} 81%|████████ | 9961/12313 [7:27:46<1:44:46, 2.67s/it] 81%|████████ | 9962/12313 [7:27:48<1:45:43, 2.70s/it] {'loss': 0.6089, 'grad_norm': 4.262182822939964, 'learning_rate': 4.6302275037478804e-07, 'epoch': 0.81} 81%|████████ | 9962/12313 [7:27:48<1:45:43, 2.70s/it] 81%|████████ | 9963/12313 [7:27:51<1:45:21, 2.69s/it] {'loss': 0.6344, 'grad_norm': 5.01145075636782, 'learning_rate': 4.6264156114922605e-07, 'epoch': 0.81} 81%|████████ | 9963/12313 [7:27:51<1:45:21, 2.69s/it] 81%|████████ | 9964/12313 [7:27:54<1:45:40, 2.70s/it] {'loss': 0.3984, 'grad_norm': 6.573486459689827, 'learning_rate': 4.622605128980862e-07, 'epoch': 0.81} 81%|████████ | 9964/12313 [7:27:54<1:45:40, 2.70s/it] 81%|████████ | 9965/12313 [7:27:56<1:46:35, 2.72s/it] {'loss': 0.3601, 'grad_norm': 6.380655374815469, 'learning_rate': 4.61879605647734e-07, 'epoch': 0.81} 81%|████████ | 9965/12313 [7:27:56<1:46:35, 
2.72s/it] 81%|████████ | 9966/12313 [7:27:59<1:44:37, 2.67s/it] {'loss': 0.626, 'grad_norm': 11.839284886283753, 'learning_rate': 4.6149883942452595e-07, 'epoch': 0.81} 81%|████████ | 9966/12313 [7:27:59<1:44:37, 2.67s/it] 81%|████████ | 9967/12313 [7:28:02<1:42:18, 2.62s/it] {'loss': 0.7225, 'grad_norm': 6.373110585879932, 'learning_rate': 4.6111821425480956e-07, 'epoch': 0.81} 81%|████████ | 9967/12313 [7:28:02<1:42:18, 2.62s/it] 81%|████████ | 9968/12313 [7:28:04<1:43:04, 2.64s/it] {'loss': 0.5471, 'grad_norm': 7.7580649386331455, 'learning_rate': 4.6073773016492267e-07, 'epoch': 0.81} 81%|████████ | 9968/12313 [7:28:04<1:43:04, 2.64s/it] 81%|████████ | 9969/12313 [7:28:07<1:43:32, 2.65s/it] {'loss': 0.5152, 'grad_norm': 4.824098525076564, 'learning_rate': 4.603573871811923e-07, 'epoch': 0.81} 81%|████████ | 9969/12313 [7:28:07<1:43:32, 2.65s/it] 81%|████████ | 9970/12313 [7:28:10<1:45:44, 2.71s/it] {'loss': 0.5088, 'grad_norm': 2.881153799640226, 'learning_rate': 4.5997718532993535e-07, 'epoch': 0.81} 81%|████████ | 9970/12313 [7:28:10<1:45:44, 2.71s/it] 81%|████████ | 9971/12313 [7:28:12<1:44:26, 2.68s/it] {'loss': 0.4321, 'grad_norm': 6.514585014658824, 'learning_rate': 4.5959712463746144e-07, 'epoch': 0.81} 81%|████████ | 9971/12313 [7:28:12<1:44:26, 2.68s/it] 81%|████████ | 9972/12313 [7:28:15<1:44:39, 2.68s/it] {'loss': 0.3917, 'grad_norm': 7.85731483603247, 'learning_rate': 4.5921720513006697e-07, 'epoch': 0.81} 81%|████████ | 9972/12313 [7:28:15<1:44:39, 2.68s/it] 81%|████████ | 9973/12313 [7:28:18<1:42:24, 2.63s/it] {'loss': 0.5726, 'grad_norm': 5.1598024248620495, 'learning_rate': 4.588374268340412e-07, 'epoch': 0.81} 81%|████████ | 9973/12313 [7:28:18<1:42:24, 2.63s/it] 81%|████████ | 9974/12313 [7:28:20<1:43:09, 2.65s/it] {'loss': 0.5283, 'grad_norm': 4.512666934916995, 'learning_rate': 4.584577897756634e-07, 'epoch': 0.81} 81%|████████ | 9974/12313 [7:28:20<1:43:09, 2.65s/it] 81%|████████ | 9975/12313 [7:28:23<1:40:31, 2.58s/it] {'loss': 0.4524, 
'grad_norm': 8.411566127602082, 'learning_rate': 4.58078293981202e-07, 'epoch': 0.81} 81%|████████ | 9975/12313 [7:28:23<1:40:31, 2.58s/it] 81%|████████ | 9976/12313 [7:28:25<1:41:09, 2.60s/it] {'loss': 0.4705, 'grad_norm': 8.15739766766053, 'learning_rate': 4.5769893947691517e-07, 'epoch': 0.81} 81%|████████ | 9976/12313 [7:28:25<1:41:09, 2.60s/it] 81%|████████ | 9977/12313 [7:28:28<1:40:50, 2.59s/it] {'loss': 0.4252, 'grad_norm': 6.8035924289379235, 'learning_rate': 4.5731972628905357e-07, 'epoch': 0.81} 81%|████████ | 9977/12313 [7:28:28<1:40:50, 2.59s/it] 81%|████████ | 9978/12313 [7:28:31<1:42:14, 2.63s/it] {'loss': 0.5061, 'grad_norm': 4.227519769293889, 'learning_rate': 4.5694065444385564e-07, 'epoch': 0.81} 81%|████████ | 9978/12313 [7:28:31<1:42:14, 2.63s/it] 81%|████████ | 9979/12313 [7:28:33<1:42:06, 2.63s/it] {'loss': 0.4785, 'grad_norm': 6.581692909390787, 'learning_rate': 4.5656172396755156e-07, 'epoch': 0.81} 81%|████████ | 9979/12313 [7:28:33<1:42:06, 2.63s/it] 81%|████████ | 9980/12313 [7:28:36<1:44:18, 2.68s/it] {'loss': 0.5214, 'grad_norm': 4.606311339058521, 'learning_rate': 4.561829348863622e-07, 'epoch': 0.81} 81%|████████ | 9980/12313 [7:28:36<1:44:18, 2.68s/it] 81%|████████ | 9981/12313 [7:28:39<1:43:28, 2.66s/it] {'loss': 0.2991, 'grad_norm': 6.277335985489802, 'learning_rate': 4.55804287226497e-07, 'epoch': 0.81} 81%|████████ | 9981/12313 [7:28:39<1:43:28, 2.66s/it] 81%|████████ | 9982/12313 [7:28:41<1:42:26, 2.64s/it] {'loss': 0.4301, 'grad_norm': 5.904104493248545, 'learning_rate': 4.5542578101415576e-07, 'epoch': 0.81} 81%|████████ | 9982/12313 [7:28:41<1:42:26, 2.64s/it] 81%|████████ | 9983/12313 [7:28:44<1:43:55, 2.68s/it] {'loss': 0.6069, 'grad_norm': 3.728461827087679, 'learning_rate': 4.550474162755303e-07, 'epoch': 0.81} 81%|████████ | 9983/12313 [7:28:44<1:43:55, 2.68s/it] 81%|████████ | 9984/12313 [7:28:47<1:42:28, 2.64s/it] {'loss': 0.4381, 'grad_norm': 4.995689618250689, 'learning_rate': 4.546691930368008e-07, 'epoch': 0.81} 
81%|████████ | 9984/12313 [7:28:47<1:42:28, 2.64s/it] 81%|████████ | 9985/12313 [7:28:49<1:41:40, 2.62s/it] {'loss': 0.5178, 'grad_norm': 6.939274659064955, 'learning_rate': 4.5429111132413773e-07, 'epoch': 0.81} 81%|████████ | 9985/12313 [7:28:49<1:41:40, 2.62s/it] 81%|████████ | 9986/12313 [7:28:52<1:40:53, 2.60s/it] {'loss': 0.3963, 'grad_norm': 6.822025362372183, 'learning_rate': 4.539131711637032e-07, 'epoch': 0.81} 81%|████████ | 9986/12313 [7:28:52<1:40:53, 2.60s/it] 81%|████████ | 9987/12313 [7:28:54<1:38:51, 2.55s/it] {'loss': 0.504, 'grad_norm': 6.644364425158026, 'learning_rate': 4.535353725816488e-07, 'epoch': 0.81} 81%|████████ | 9987/12313 [7:28:54<1:38:51, 2.55s/it] 81%|████████ | 9988/12313 [7:28:56<1:37:19, 2.51s/it] {'loss': 0.4743, 'grad_norm': 5.480802203291707, 'learning_rate': 4.5315771560411617e-07, 'epoch': 0.81} 81%|████████ | 9988/12313 [7:28:56<1:37:19, 2.51s/it] 81%|████████ | 9989/12313 [7:28:59<1:39:52, 2.58s/it] {'loss': 0.4656, 'grad_norm': 5.912237232094158, 'learning_rate': 4.5278020025723596e-07, 'epoch': 0.81} 81%|████████ | 9989/12313 [7:28:59<1:39:52, 2.58s/it] 81%|████████ | 9990/12313 [7:29:02<1:42:58, 2.66s/it] {'loss': 0.4281, 'grad_norm': 5.0776992221582935, 'learning_rate': 4.524028265671318e-07, 'epoch': 0.81} 81%|████████ | 9990/12313 [7:29:02<1:42:58, 2.66s/it] 81%|████████ | 9991/12313 [7:29:05<1:45:52, 2.74s/it] {'loss': 0.4505, 'grad_norm': 5.3270257232616025, 'learning_rate': 4.5202559455991473e-07, 'epoch': 0.81} 81%|████████ | 9991/12313 [7:29:05<1:45:52, 2.74s/it] 81%|████████ | 9992/12313 [7:29:08<1:44:06, 2.69s/it] {'loss': 0.4437, 'grad_norm': 6.190730407880878, 'learning_rate': 4.516485042616878e-07, 'epoch': 0.81} 81%|████████ | 9992/12313 [7:29:08<1:44:06, 2.69s/it] 81%|████████ | 9993/12313 [7:29:10<1:45:19, 2.72s/it] {'loss': 0.4011, 'grad_norm': 5.542928858101199, 'learning_rate': 4.512715556985442e-07, 'epoch': 0.81} 81%|████████ | 9993/12313 [7:29:10<1:45:19, 2.72s/it] 81%|████████ | 9994/12313 
[7:29:13<1:43:33, 2.68s/it] {'loss': 0.4349, 'grad_norm': 7.642739194941673, 'learning_rate': 4.508947488965662e-07, 'epoch': 0.81} 81%|████████ | 9994/12313 [7:29:13<1:43:33, 2.68s/it] 81%|████████ | 9995/12313 [7:29:15<1:40:35, 2.60s/it] {'loss': 0.5984, 'grad_norm': 7.8072925191375395, 'learning_rate': 4.505180838818263e-07, 'epoch': 0.81} 81%|████████ | 9995/12313 [7:29:15<1:40:35, 2.60s/it] 81%|████████ | 9996/12313 [7:29:18<1:41:16, 2.62s/it] {'loss': 0.4646, 'grad_norm': 4.656078397528526, 'learning_rate': 4.501415606803888e-07, 'epoch': 0.81} 81%|████████ | 9996/12313 [7:29:18<1:41:16, 2.62s/it] 81%|████████ | 9997/12313 [7:29:21<1:42:15, 2.65s/it] {'loss': 0.4135, 'grad_norm': 4.525381591463549, 'learning_rate': 4.4976517931830637e-07, 'epoch': 0.81} 81%|████████ | 9997/12313 [7:29:21<1:42:15, 2.65s/it] 81%|████████ | 9998/12313 [7:29:23<1:41:30, 2.63s/it] {'loss': 0.5532, 'grad_norm': 5.890129159202458, 'learning_rate': 4.4938893982162253e-07, 'epoch': 0.81} 81%|████████ | 9998/12313 [7:29:23<1:41:30, 2.63s/it] 81%|████████ | 9999/12313 [7:29:26<1:40:34, 2.61s/it] {'loss': 0.5935, 'grad_norm': 6.903081683256655, 'learning_rate': 4.4901284221637113e-07, 'epoch': 0.81} 81%|████████ | 9999/12313 [7:29:26<1:40:34, 2.61s/it] 81%|████████ | 10000/12313 [7:29:29<1:40:42, 2.61s/it] {'loss': 0.4778, 'grad_norm': 19.18119241326282, 'learning_rate': 4.48636886528577e-07, 'epoch': 0.81} 81%|████████ | 10000/12313 [7:29:29<1:40:42, 2.61s/it] 81%|████████ | 10001/12313 [7:29:31<1:40:36, 2.61s/it] {'loss': 0.69, 'grad_norm': 6.157895091798137, 'learning_rate': 4.482610727842532e-07, 'epoch': 0.81} 81%|████████ | 10001/12313 [7:29:31<1:40:36, 2.61s/it] 81%|████████ | 10002/12313 [7:29:34<1:40:04, 2.60s/it] {'loss': 0.4685, 'grad_norm': 5.599909239938712, 'learning_rate': 4.47885401009405e-07, 'epoch': 0.81} 81%|████████ | 10002/12313 [7:29:34<1:40:04, 2.60s/it] 81%|████████ | 10003/12313 [7:29:36<1:42:13, 2.66s/it] {'loss': 0.5125, 'grad_norm': 6.484452032525161, 
'learning_rate': 4.475098712300263e-07, 'epoch': 0.81} 81%|████████ | 10003/12313 [7:29:36<1:42:13, 2.66s/it] 81%|████████ | 10004/12313 [7:29:39<1:42:33, 2.67s/it] {'loss': 0.3792, 'grad_norm': 8.535162150811436, 'learning_rate': 4.4713448347210114e-07, 'epoch': 0.81} 81%|████████ | 10004/12313 [7:29:39<1:42:33, 2.67s/it] 81%|████████▏ | 10005/12313 [7:29:42<1:42:39, 2.67s/it] {'loss': 0.4935, 'grad_norm': 13.187357588897884, 'learning_rate': 4.4675923776160533e-07, 'epoch': 0.81} 81%|████████▏ | 10005/12313 [7:29:42<1:42:39, 2.67s/it] 81%|████████▏ | 10006/12313 [7:29:45<1:42:29, 2.67s/it] {'loss': 0.6142, 'grad_norm': 7.720879346905562, 'learning_rate': 4.463841341245043e-07, 'epoch': 0.81} 81%|████████▏ | 10006/12313 [7:29:45<1:42:29, 2.67s/it] 81%|████████▏ | 10007/12313 [7:29:47<1:42:57, 2.68s/it] {'loss': 0.436, 'grad_norm': 4.332760377433552, 'learning_rate': 4.460091725867524e-07, 'epoch': 0.81} 81%|████████▏ | 10007/12313 [7:29:47<1:42:57, 2.68s/it] 81%|████████▏ | 10008/12313 [7:29:50<1:40:07, 2.61s/it] {'loss': 0.4141, 'grad_norm': 6.514614871896753, 'learning_rate': 4.456343531742946e-07, 'epoch': 0.81} 81%|████████▏ | 10008/12313 [7:29:50<1:40:07, 2.61s/it] 81%|████████▏ | 10009/12313 [7:29:52<1:40:48, 2.63s/it] {'loss': 0.4301, 'grad_norm': 5.3062919238625605, 'learning_rate': 4.4525967591306757e-07, 'epoch': 0.81} 81%|████████▏ | 10009/12313 [7:29:52<1:40:48, 2.63s/it] 81%|████████▏ | 10010/12313 [7:29:55<1:37:44, 2.55s/it] {'loss': 0.5738, 'grad_norm': 9.134817655594835, 'learning_rate': 4.448851408289964e-07, 'epoch': 0.81} 81%|████████▏ | 10010/12313 [7:29:55<1:37:44, 2.55s/it] 81%|████████▏ | 10011/12313 [7:29:58<1:45:57, 2.76s/it] {'loss': 0.304, 'grad_norm': 4.873683627050256, 'learning_rate': 4.4451074794799627e-07, 'epoch': 0.81} 81%|████████▏ | 10011/12313 [7:29:58<1:45:57, 2.76s/it] 81%|████████▏ | 10012/12313 [7:30:01<1:44:49, 2.73s/it] {'loss': 0.3923, 'grad_norm': 3.4718602879812437, 'learning_rate': 4.4413649729597386e-07, 'epoch': 
0.81} 81%|████████▏ | 10012/12313 [7:30:01<1:44:49, 2.73s/it] 81%|████████▏ | 10013/12313 [7:30:04<1:50:05, 2.87s/it] {'loss': 0.4372, 'grad_norm': 4.570485265053012, 'learning_rate': 4.43762388898826e-07, 'epoch': 0.81} 81%|████████▏ | 10013/12313 [7:30:04<1:50:05, 2.87s/it] 81%|████████▏ | 10014/12313 [7:30:07<1:50:07, 2.87s/it] {'loss': 0.4041, 'grad_norm': 3.9431671288156664, 'learning_rate': 4.4338842278243784e-07, 'epoch': 0.81} 81%|████████▏ | 10014/12313 [7:30:07<1:50:07, 2.87s/it] 81%|████████▏ | 10015/12313 [7:30:09<1:44:45, 2.74s/it] {'loss': 0.4428, 'grad_norm': 9.625205916829707, 'learning_rate': 4.4301459897268695e-07, 'epoch': 0.81} 81%|████████▏ | 10015/12313 [7:30:09<1:44:45, 2.74s/it] 81%|████████▏ | 10016/12313 [7:30:12<1:46:49, 2.79s/it] {'loss': 0.5457, 'grad_norm': 7.135399774793007, 'learning_rate': 4.426409174954391e-07, 'epoch': 0.81} 81%|████████▏ | 10016/12313 [7:30:12<1:46:49, 2.79s/it] 81%|████████▏ | 10017/12313 [7:30:15<1:44:27, 2.73s/it] {'loss': 0.4478, 'grad_norm': 7.241634226766735, 'learning_rate': 4.4226737837655106e-07, 'epoch': 0.81} 81%|████████▏ | 10017/12313 [7:30:15<1:44:27, 2.73s/it] 81%|████████▏ | 10018/12313 [7:30:17<1:44:00, 2.72s/it] {'loss': 0.4813, 'grad_norm': 5.474906261114699, 'learning_rate': 4.418939816418699e-07, 'epoch': 0.81} 81%|████████▏ | 10018/12313 [7:30:17<1:44:00, 2.72s/it] 81%|████████▏ | 10019/12313 [7:30:20<1:41:55, 2.67s/it] {'loss': 0.5229, 'grad_norm': 7.248288071920974, 'learning_rate': 4.4152072731723336e-07, 'epoch': 0.81} 81%|████████▏ | 10019/12313 [7:30:20<1:41:55, 2.67s/it] 81%|████████▏ | 10020/12313 [7:30:22<1:41:16, 2.65s/it] {'loss': 0.4383, 'grad_norm': 4.68601406629194, 'learning_rate': 4.411476154284683e-07, 'epoch': 0.81} 81%|████████▏ | 10020/12313 [7:30:22<1:41:16, 2.65s/it] 81%|████████▏ | 10021/12313 [7:30:25<1:43:24, 2.71s/it] {'loss': 0.434, 'grad_norm': 8.548985633110162, 'learning_rate': 4.407746460013912e-07, 'epoch': 0.81} 81%|████████▏ | 10021/12313 [7:30:25<1:43:24, 
2.71s/it] 81%|████████▏ | 10022/12313 [7:30:28<1:42:50, 2.69s/it] {'loss': 0.6293, 'grad_norm': 3.598258827038912, 'learning_rate': 4.404018190618109e-07, 'epoch': 0.81} 81%|████████▏ | 10022/12313 [7:30:28<1:42:50, 2.69s/it] 81%|████████▏ | 10023/12313 [7:30:31<1:42:59, 2.70s/it] {'loss': 0.6338, 'grad_norm': 4.9162560687761525, 'learning_rate': 4.4002913463552457e-07, 'epoch': 0.81} 81%|████████▏ | 10023/12313 [7:30:31<1:42:59, 2.70s/it] 81%|████████▏ | 10024/12313 [7:30:33<1:43:16, 2.71s/it] {'loss': 0.4134, 'grad_norm': 5.176333776815743, 'learning_rate': 4.39656592748319e-07, 'epoch': 0.81} 81%|████████▏ | 10024/12313 [7:30:33<1:43:16, 2.71s/it] 81%|████████▏ | 10025/12313 [7:30:36<1:40:21, 2.63s/it] {'loss': 0.4122, 'grad_norm': 5.309773965001737, 'learning_rate': 4.392841934259731e-07, 'epoch': 0.81} 81%|████████▏ | 10025/12313 [7:30:36<1:40:21, 2.63s/it] 81%|████████▏ | 10026/12313 [7:30:38<1:38:59, 2.60s/it] {'loss': 0.5778, 'grad_norm': 4.204661866235209, 'learning_rate': 4.3891193669425567e-07, 'epoch': 0.81} 81%|████████▏ | 10026/12313 [7:30:38<1:38:59, 2.60s/it] 81%|████████▏ | 10027/12313 [7:30:41<1:38:10, 2.58s/it] {'loss': 0.5514, 'grad_norm': 5.654465245751199, 'learning_rate': 4.3853982257892335e-07, 'epoch': 0.81} 81%|████████▏ | 10027/12313 [7:30:41<1:38:10, 2.58s/it] 81%|████████▏ | 10028/12313 [7:30:43<1:37:48, 2.57s/it] {'loss': 0.537, 'grad_norm': 4.506512234014459, 'learning_rate': 4.3816785110572554e-07, 'epoch': 0.81} 81%|████████▏ | 10028/12313 [7:30:43<1:37:48, 2.57s/it] 81%|████████▏ | 10029/12313 [7:30:47<1:45:52, 2.78s/it] {'loss': 0.4989, 'grad_norm': 3.728172094291727, 'learning_rate': 4.3779602230040075e-07, 'epoch': 0.81} 81%|████████▏ | 10029/12313 [7:30:47<1:45:52, 2.78s/it] 81%|████████▏ | 10030/12313 [7:30:49<1:44:41, 2.75s/it] {'loss': 0.4539, 'grad_norm': 7.112854813373821, 'learning_rate': 4.3742433618867623e-07, 'epoch': 0.81} 81%|████████▏ | 10030/12313 [7:30:49<1:44:41, 2.75s/it] 81%|████████▏ | 10031/12313 
[7:30:52<1:42:27, 2.69s/it] {'loss': 0.6694, 'grad_norm': 4.9529582527943745, 'learning_rate': 4.370527927962717e-07, 'epoch': 0.81} 81%|████████▏ | 10031/12313 [7:30:52<1:42:27, 2.69s/it] 81%|████████▏ | 10032/12313 [7:30:55<1:41:55, 2.68s/it] {'loss': 0.5285, 'grad_norm': 5.619899087759572, 'learning_rate': 4.366813921488966e-07, 'epoch': 0.81} 81%|████████▏ | 10032/12313 [7:30:55<1:41:55, 2.68s/it] 81%|████████▏ | 10033/12313 [7:30:57<1:40:48, 2.65s/it] {'loss': 0.4412, 'grad_norm': 5.707386464605084, 'learning_rate': 4.363101342722484e-07, 'epoch': 0.81} 81%|████████▏ | 10033/12313 [7:30:57<1:40:48, 2.65s/it] 81%|████████▏ | 10034/12313 [7:31:00<1:42:53, 2.71s/it] {'loss': 0.5749, 'grad_norm': 4.86389548489138, 'learning_rate': 4.359390191920176e-07, 'epoch': 0.81} 81%|████████▏ | 10034/12313 [7:31:00<1:42:53, 2.71s/it] 81%|████████▏ | 10035/12313 [7:31:03<1:41:34, 2.68s/it] {'loss': 0.3602, 'grad_norm': 5.994879397108235, 'learning_rate': 4.35568046933883e-07, 'epoch': 0.81} 81%|████████▏ | 10035/12313 [7:31:03<1:41:34, 2.68s/it] 82%|████████▏ | 10036/12313 [7:31:05<1:42:26, 2.70s/it] {'loss': 0.448, 'grad_norm': 3.0944128092213523, 'learning_rate': 4.3519721752351305e-07, 'epoch': 0.82} 82%|████████▏ | 10036/12313 [7:31:05<1:42:26, 2.70s/it] 82%|████████▏ | 10037/12313 [7:31:08<1:44:08, 2.75s/it] {'loss': 0.5155, 'grad_norm': 6.293183554065846, 'learning_rate': 4.3482653098656764e-07, 'epoch': 0.82} 82%|████████▏ | 10037/12313 [7:31:08<1:44:08, 2.75s/it] 82%|████████▏ | 10038/12313 [7:31:11<1:43:34, 2.73s/it] {'loss': 0.5044, 'grad_norm': 6.726079260676008, 'learning_rate': 4.3445598734869725e-07, 'epoch': 0.82} 82%|████████▏ | 10038/12313 [7:31:11<1:43:34, 2.73s/it] 82%|████████▏ | 10039/12313 [7:31:14<1:41:50, 2.69s/it] {'loss': 0.3846, 'grad_norm': 5.929353123479193, 'learning_rate': 4.340855866355409e-07, 'epoch': 0.82} 82%|████████▏ | 10039/12313 [7:31:14<1:41:50, 2.69s/it] 82%|████████▏ | 10040/12313 [7:31:16<1:44:17, 2.75s/it] {'loss': 0.4431, 
'grad_norm': 4.346247885330788, 'learning_rate': 4.3371532887272747e-07, 'epoch': 0.82} 82%|████████▏ | 10040/12313 [7:31:16<1:44:17, 2.75s/it] 82%|████████▏ | 10041/12313 [7:31:19<1:40:34, 2.66s/it] {'loss': 0.4179, 'grad_norm': 6.564210241083292, 'learning_rate': 4.333452140858782e-07, 'epoch': 0.82} 82%|████████▏ | 10041/12313 [7:31:19<1:40:34, 2.66s/it] 82%|████████▏ | 10042/12313 [7:31:22<1:42:04, 2.70s/it] {'loss': 0.4109, 'grad_norm': 6.160516432704893, 'learning_rate': 4.3297524230060257e-07, 'epoch': 0.82} 82%|████████▏ | 10042/12313 [7:31:22<1:42:04, 2.70s/it] 82%|████████▏ | 10043/12313 [7:31:24<1:40:10, 2.65s/it] {'loss': 0.4727, 'grad_norm': 4.046212520398475, 'learning_rate': 4.326054135425001e-07, 'epoch': 0.82} 82%|████████▏ | 10043/12313 [7:31:24<1:40:10, 2.65s/it] 82%|████████▏ | 10044/12313 [7:31:27<1:40:53, 2.67s/it] {'loss': 0.5261, 'grad_norm': 4.21989184407862, 'learning_rate': 4.322357278371614e-07, 'epoch': 0.82} 82%|████████▏ | 10044/12313 [7:31:27<1:40:53, 2.67s/it] 82%|████████▏ | 10045/12313 [7:31:30<1:41:20, 2.68s/it] {'loss': 0.4124, 'grad_norm': 5.065472356667533, 'learning_rate': 4.3186618521016745e-07, 'epoch': 0.82} 82%|████████▏ | 10045/12313 [7:31:30<1:41:20, 2.68s/it] 82%|████████▏ | 10046/12313 [7:31:32<1:40:51, 2.67s/it] {'loss': 0.439, 'grad_norm': 6.317472489620383, 'learning_rate': 4.314967856870872e-07, 'epoch': 0.82} 82%|████████▏ | 10046/12313 [7:31:32<1:40:51, 2.67s/it] 82%|████████▏ | 10047/12313 [7:31:35<1:43:29, 2.74s/it] {'loss': 0.2964, 'grad_norm': 6.622990894266353, 'learning_rate': 4.31127529293483e-07, 'epoch': 0.82} 82%|████████▏ | 10047/12313 [7:31:35<1:43:29, 2.74s/it] 82%|████████▏ | 10048/12313 [7:31:38<1:43:13, 2.73s/it] {'loss': 0.4928, 'grad_norm': 5.343326462409636, 'learning_rate': 4.3075841605490414e-07, 'epoch': 0.82} 82%|████████▏ | 10048/12313 [7:31:38<1:43:13, 2.73s/it] 82%|████████▏ | 10049/12313 [7:31:40<1:40:03, 2.65s/it] {'loss': 0.3974, 'grad_norm': 7.146378891333632, 'learning_rate': 
4.3038944599689105e-07, 'epoch': 0.82} 82%|████████▏ | 10049/12313 [7:31:40<1:40:03, 2.65s/it] 82%|████████▏ | 10050/12313 [7:31:43<1:38:32, 2.61s/it] {'loss': 0.3021, 'grad_norm': 4.603775445290206, 'learning_rate': 4.300206191449749e-07, 'epoch': 0.82} 82%|████████▏ | 10050/12313 [7:31:43<1:38:32, 2.61s/it] 82%|████████▏ | 10051/12313 [7:31:46<1:39:23, 2.64s/it] {'loss': 0.4456, 'grad_norm': 6.382432355081491, 'learning_rate': 4.2965193552467753e-07, 'epoch': 0.82} 82%|████████▏ | 10051/12313 [7:31:46<1:39:23, 2.64s/it] 82%|████████▏ | 10052/12313 [7:31:48<1:39:44, 2.65s/it] {'loss': 0.3823, 'grad_norm': 5.191619470887934, 'learning_rate': 4.292833951615083e-07, 'epoch': 0.82} 82%|████████▏ | 10052/12313 [7:31:48<1:39:44, 2.65s/it] 82%|████████▏ | 10053/12313 [7:31:51<1:40:42, 2.67s/it] {'loss': 0.6601, 'grad_norm': 3.9888610071010224, 'learning_rate': 4.289149980809698e-07, 'epoch': 0.82} 82%|████████▏ | 10053/12313 [7:31:51<1:40:42, 2.67s/it] 82%|████████▏ | 10054/12313 [7:31:54<1:44:08, 2.77s/it] {'loss': 0.4692, 'grad_norm': 8.500802989835256, 'learning_rate': 4.2854674430855224e-07, 'epoch': 0.82} 82%|████████▏ | 10054/12313 [7:31:54<1:44:08, 2.77s/it] 82%|████████▏ | 10055/12313 [7:31:57<1:43:17, 2.74s/it] {'loss': 0.4196, 'grad_norm': 4.251250937241956, 'learning_rate': 4.281786338697369e-07, 'epoch': 0.82} 82%|████████▏ | 10055/12313 [7:31:57<1:43:17, 2.74s/it] 82%|████████▏ | 10056/12313 [7:31:59<1:43:39, 2.76s/it] {'loss': 0.7008, 'grad_norm': 5.93357022678489, 'learning_rate': 4.278106667899945e-07, 'epoch': 0.82} 82%|████████▏ | 10056/12313 [7:31:59<1:43:39, 2.76s/it] 82%|████████▏ | 10057/12313 [7:32:02<1:43:40, 2.76s/it] {'loss': 0.4552, 'grad_norm': 4.588729939434148, 'learning_rate': 4.274428430947872e-07, 'epoch': 0.82} 82%|████████▏ | 10057/12313 [7:32:02<1:43:40, 2.76s/it] 82%|████████▏ | 10058/12313 [7:32:05<1:42:07, 2.72s/it] {'loss': 0.3992, 'grad_norm': 6.368147028406131, 'learning_rate': 4.270751628095668e-07, 'epoch': 0.82} 82%|████████▏ 
| 10058/12313 [7:32:05<1:42:07, 2.72s/it] 82%|████████▏ | 10059/12313 [7:32:08<1:42:33, 2.73s/it] {'loss': 0.4615, 'grad_norm': 6.6517461490094565, 'learning_rate': 4.2670762595977356e-07, 'epoch': 0.82} 82%|████████▏ | 10059/12313 [7:32:08<1:42:33, 2.73s/it] 82%|████████▏ | 10060/12313 [7:32:10<1:40:34, 2.68s/it] {'loss': 0.5174, 'grad_norm': 8.295784988993061, 'learning_rate': 4.2634023257084074e-07, 'epoch': 0.82} 82%|████████▏ | 10060/12313 [7:32:10<1:40:34, 2.68s/it] 82%|████████▏ | 10061/12313 [7:32:13<1:37:16, 2.59s/it] {'loss': 0.3388, 'grad_norm': 5.523298295622445, 'learning_rate': 4.259729826681891e-07, 'epoch': 0.82} 82%|████████▏ | 10061/12313 [7:32:13<1:37:16, 2.59s/it] 82%|████████▏ | 10062/12313 [7:32:15<1:38:35, 2.63s/it] {'loss': 0.4553, 'grad_norm': 3.8441813036836137, 'learning_rate': 4.2560587627722973e-07, 'epoch': 0.82} 82%|████████▏ | 10062/12313 [7:32:15<1:38:35, 2.63s/it] 82%|████████▏ | 10063/12313 [7:32:18<1:37:46, 2.61s/it] {'loss': 0.5312, 'grad_norm': 6.951945297996189, 'learning_rate': 4.2523891342336506e-07, 'epoch': 0.82} 82%|████████▏ | 10063/12313 [7:32:18<1:37:46, 2.61s/it] 82%|████████▏ | 10064/12313 [7:32:20<1:37:45, 2.61s/it] {'loss': 0.3208, 'grad_norm': 4.871582562337795, 'learning_rate': 4.2487209413198784e-07, 'epoch': 0.82} 82%|████████▏ | 10064/12313 [7:32:20<1:37:45, 2.61s/it] 82%|████████▏ | 10065/12313 [7:32:23<1:38:49, 2.64s/it] {'loss': 0.4115, 'grad_norm': 7.346570602660871, 'learning_rate': 4.245054184284786e-07, 'epoch': 0.82} 82%|████████▏ | 10065/12313 [7:32:23<1:38:49, 2.64s/it] 82%|████████▏ | 10066/12313 [7:32:26<1:37:06, 2.59s/it] {'loss': 0.5406, 'grad_norm': 4.392641843655781, 'learning_rate': 4.2413888633821064e-07, 'epoch': 0.82} 82%|████████▏ | 10066/12313 [7:32:26<1:37:06, 2.59s/it] 82%|████████▏ | 10067/12313 [7:32:28<1:39:39, 2.66s/it] {'loss': 0.4598, 'grad_norm': 7.23578723028704, 'learning_rate': 4.237724978865454e-07, 'epoch': 0.82} 82%|████████▏ | 10067/12313 [7:32:28<1:39:39, 2.66s/it] 
82%|████████▏ | 10068/12313 [7:32:31<1:39:39, 2.66s/it] {'loss': 0.5109, 'grad_norm': 3.553356832932114, 'learning_rate': 4.234062530988342e-07, 'epoch': 0.82} 82%|████████▏ | 10068/12313 [7:32:31<1:39:39, 2.66s/it] 82%|████████▏ | 10069/12313 [7:32:34<1:40:28, 2.69s/it] {'loss': 0.6022, 'grad_norm': 5.0655040941198255, 'learning_rate': 4.2304015200042095e-07, 'epoch': 0.82} 82%|████████▏ | 10069/12313 [7:32:34<1:40:28, 2.69s/it] 82%|████████▏ | 10070/12313 [7:32:36<1:38:28, 2.63s/it] {'loss': 0.4315, 'grad_norm': 4.3230856397717226, 'learning_rate': 4.2267419461663626e-07, 'epoch': 0.82} 82%|████████▏ | 10070/12313 [7:32:36<1:38:28, 2.63s/it] 82%|████████▏ | 10071/12313 [7:32:39<1:36:38, 2.59s/it] {'loss': 0.489, 'grad_norm': 4.6434028931530715, 'learning_rate': 4.223083809728032e-07, 'epoch': 0.82} 82%|████████▏ | 10071/12313 [7:32:39<1:36:38, 2.59s/it] 82%|████████▏ | 10072/12313 [7:32:42<1:39:08, 2.65s/it] {'loss': 0.5609, 'grad_norm': 3.9428152943512673, 'learning_rate': 4.219427110942348e-07, 'epoch': 0.82} 82%|████████▏ | 10072/12313 [7:32:42<1:39:08, 2.65s/it] 82%|████████▏ | 10073/12313 [7:32:44<1:36:37, 2.59s/it] {'loss': 0.5247, 'grad_norm': 3.9140488822139243, 'learning_rate': 4.215771850062328e-07, 'epoch': 0.82} 82%|████████▏ | 10073/12313 [7:32:44<1:36:37, 2.59s/it] 82%|████████▏ | 10074/12313 [7:32:47<1:37:03, 2.60s/it] {'loss': 0.4456, 'grad_norm': 5.179289360939204, 'learning_rate': 4.2121180273408976e-07, 'epoch': 0.82} 82%|████████▏ | 10074/12313 [7:32:47<1:37:03, 2.60s/it] 82%|████████▏ | 10075/12313 [7:32:49<1:36:49, 2.60s/it] {'loss': 0.4248, 'grad_norm': 6.433424694858643, 'learning_rate': 4.2084656430308765e-07, 'epoch': 0.82} 82%|████████▏ | 10075/12313 [7:32:49<1:36:49, 2.60s/it] 82%|████████▏ | 10076/12313 [7:32:52<1:40:58, 2.71s/it] {'loss': 0.4796, 'grad_norm': 3.365422337217589, 'learning_rate': 4.204814697384993e-07, 'epoch': 0.82} 82%|████████▏ | 10076/12313 [7:32:52<1:40:58, 2.71s/it] 82%|████████▏ | 10077/12313 [7:32:55<1:40:21, 
2.69s/it] {'loss': 0.6421, 'grad_norm': 7.511774642841241, 'learning_rate': 4.2011651906558815e-07, 'epoch': 0.82} 82%|████████▏ | 10077/12313 [7:32:55<1:40:21, 2.69s/it] 82%|████████▏ | 10078/12313 [7:32:58<1:39:33, 2.67s/it] {'loss': 0.3865, 'grad_norm': 8.546191005512076, 'learning_rate': 4.1975171230960563e-07, 'epoch': 0.82} 82%|████████▏ | 10078/12313 [7:32:58<1:39:33, 2.67s/it] 82%|████████▏ | 10079/12313 [7:33:00<1:38:26, 2.64s/it] {'loss': 0.453, 'grad_norm': 4.417370933628543, 'learning_rate': 4.193870494957958e-07, 'epoch': 0.82} 82%|████████▏ | 10079/12313 [7:33:00<1:38:26, 2.64s/it] 82%|████████▏ | 10080/12313 [7:33:03<1:38:39, 2.65s/it] {'loss': 0.4526, 'grad_norm': 6.377398869498588, 'learning_rate': 4.190225306493906e-07, 'epoch': 0.82} 82%|████████▏ | 10080/12313 [7:33:03<1:38:39, 2.65s/it] 82%|████████▏ | 10081/12313 [7:33:05<1:34:58, 2.55s/it] {'loss': 0.5708, 'grad_norm': 4.132468182015889, 'learning_rate': 4.186581557956124e-07, 'epoch': 0.82} 82%|████████▏ | 10081/12313 [7:33:05<1:34:58, 2.55s/it] 82%|████████▏ | 10082/12313 [7:33:08<1:37:00, 2.61s/it] {'loss': 0.5129, 'grad_norm': 5.42105998479533, 'learning_rate': 4.1829392495967485e-07, 'epoch': 0.82} 82%|████████▏ | 10082/12313 [7:33:08<1:37:00, 2.61s/it] 82%|████████▏ | 10083/12313 [7:33:10<1:36:45, 2.60s/it] {'loss': 0.5831, 'grad_norm': 5.495216782158462, 'learning_rate': 4.1792983816677987e-07, 'epoch': 0.82} 82%|████████▏ | 10083/12313 [7:33:10<1:36:45, 2.60s/it] 82%|████████▏ | 10084/12313 [7:33:13<1:37:07, 2.61s/it] {'loss': 0.5072, 'grad_norm': 3.395888728224807, 'learning_rate': 4.175658954421208e-07, 'epoch': 0.82} 82%|████████▏ | 10084/12313 [7:33:13<1:37:07, 2.61s/it] 82%|████████▏ | 10085/12313 [7:33:16<1:40:05, 2.70s/it] {'loss': 0.5714, 'grad_norm': 5.212070130995405, 'learning_rate': 4.172020968108814e-07, 'epoch': 0.82} 82%|████████▏ | 10085/12313 [7:33:16<1:40:05, 2.70s/it] 82%|████████▏ | 10086/12313 [7:33:19<1:44:24, 2.81s/it] {'loss': 0.5516, 'grad_norm': 
5.41209526251742, 'learning_rate': 4.168384422982338e-07, 'epoch': 0.82} 82%|████████▏ | 10086/12313 [7:33:19<1:44:24, 2.81s/it] 82%|████████▏ | 10087/12313 [7:33:22<1:42:49, 2.77s/it] {'loss': 0.54, 'grad_norm': 5.766740901549195, 'learning_rate': 4.164749319293404e-07, 'epoch': 0.82} 82%|████████▏ | 10087/12313 [7:33:22<1:42:49, 2.77s/it] 82%|████████▏ | 10088/12313 [7:33:24<1:39:15, 2.68s/it] {'loss': 0.5872, 'grad_norm': 12.522728877358098, 'learning_rate': 4.1611156572935545e-07, 'epoch': 0.82} 82%|████████▏ | 10088/12313 [7:33:24<1:39:15, 2.68s/it] 82%|████████▏ | 10089/12313 [7:33:27<1:38:17, 2.65s/it] {'loss': 0.3578, 'grad_norm': 7.055171169509863, 'learning_rate': 4.1574834372342053e-07, 'epoch': 0.82} 82%|████████▏ | 10089/12313 [7:33:27<1:38:17, 2.65s/it] 82%|████████▏ | 10090/12313 [7:33:29<1:37:35, 2.63s/it] {'loss': 0.4256, 'grad_norm': 10.127142685814183, 'learning_rate': 4.153852659366697e-07, 'epoch': 0.82} 82%|████████▏ | 10090/12313 [7:33:29<1:37:35, 2.63s/it] 82%|████████▏ | 10091/12313 [7:33:32<1:39:44, 2.69s/it] {'loss': 0.5271, 'grad_norm': 6.833361098639403, 'learning_rate': 4.1502233239422624e-07, 'epoch': 0.82} 82%|████████▏ | 10091/12313 [7:33:32<1:39:44, 2.69s/it] 82%|████████▏ | 10092/12313 [7:33:35<1:39:42, 2.69s/it] {'loss': 0.4952, 'grad_norm': 3.7527822726714057, 'learning_rate': 4.14659543121203e-07, 'epoch': 0.82} 82%|████████▏ | 10092/12313 [7:33:35<1:39:42, 2.69s/it] 82%|████████▏ | 10093/12313 [7:33:38<1:39:15, 2.68s/it] {'loss': 0.4493, 'grad_norm': 8.853385017010872, 'learning_rate': 4.1429689814270284e-07, 'epoch': 0.82} 82%|████████▏ | 10093/12313 [7:33:38<1:39:15, 2.68s/it] 82%|████████▏ | 10094/12313 [7:33:40<1:37:21, 2.63s/it] {'loss': 0.4145, 'grad_norm': 4.804086833503905, 'learning_rate': 4.139343974838181e-07, 'epoch': 0.82} 82%|████████▏ | 10094/12313 [7:33:40<1:37:21, 2.63s/it] 82%|████████▏ | 10095/12313 [7:33:43<1:40:54, 2.73s/it] {'loss': 0.3967, 'grad_norm': 2.9666099986165473, 'learning_rate': 
4.135720411696334e-07, 'epoch': 0.82} 82%|████████▏ | 10095/12313 [7:33:43<1:40:54, 2.73s/it] 82%|████████▏ | 10096/12313 [7:33:46<1:40:39, 2.72s/it] {'loss': 0.3838, 'grad_norm': 6.498314301540783, 'learning_rate': 4.132098292252204e-07, 'epoch': 0.82} 82%|████████▏ | 10096/12313 [7:33:46<1:40:39, 2.72s/it] 82%|████████▏ | 10097/12313 [7:33:48<1:39:09, 2.68s/it] {'loss': 0.4869, 'grad_norm': 5.829495180353205, 'learning_rate': 4.128477616756432e-07, 'epoch': 0.82} 82%|████████▏ | 10097/12313 [7:33:48<1:39:09, 2.68s/it] 82%|████████▏ | 10098/12313 [7:33:51<1:37:45, 2.65s/it] {'loss': 0.4165, 'grad_norm': 5.33074506215202, 'learning_rate': 4.124858385459554e-07, 'epoch': 0.82} 82%|████████▏ | 10098/12313 [7:33:51<1:37:45, 2.65s/it] 82%|████████▏ | 10099/12313 [7:33:54<1:37:59, 2.66s/it] {'loss': 0.546, 'grad_norm': 4.833962665019804, 'learning_rate': 4.1212405986119975e-07, 'epoch': 0.82} 82%|████████▏ | 10099/12313 [7:33:54<1:37:59, 2.66s/it] 82%|████████▏ | 10100/12313 [7:33:56<1:38:33, 2.67s/it] {'loss': 0.4589, 'grad_norm': 4.555452441900101, 'learning_rate': 4.117624256464084e-07, 'epoch': 0.82} 82%|████████▏ | 10100/12313 [7:33:56<1:38:33, 2.67s/it] 82%|████████▏ | 10101/12313 [7:33:59<1:36:52, 2.63s/it] {'loss': 0.482, 'grad_norm': 3.2866233085274623, 'learning_rate': 4.114009359266061e-07, 'epoch': 0.82} 82%|████████▏ | 10101/12313 [7:33:59<1:36:52, 2.63s/it] 82%|████████▏ | 10102/12313 [7:34:02<1:39:23, 2.70s/it] {'loss': 0.3467, 'grad_norm': 5.633628638636074, 'learning_rate': 4.1103959072680446e-07, 'epoch': 0.82} 82%|████████▏ | 10102/12313 [7:34:02<1:39:23, 2.70s/it] 82%|████████▏ | 10103/12313 [7:34:04<1:38:14, 2.67s/it] {'loss': 0.5319, 'grad_norm': 2.6695213124316703, 'learning_rate': 4.106783900720074e-07, 'epoch': 0.82} 82%|████████▏ | 10103/12313 [7:34:04<1:38:14, 2.67s/it] 82%|████████▏ | 10104/12313 [7:34:07<1:37:03, 2.64s/it] {'loss': 0.4174, 'grad_norm': 6.398738658758001, 'learning_rate': 4.1031733398720906e-07, 'epoch': 0.82} 82%|████████▏ | 
10104/12313 [7:34:07<1:37:03, 2.64s/it] 82%|████████▏ | 10105/12313 [7:34:10<1:38:17, 2.67s/it] {'loss': 0.4107, 'grad_norm': 10.457002673876866, 'learning_rate': 4.099564224973915e-07, 'epoch': 0.82} 82%|████████▏ | 10105/12313 [7:34:10<1:38:17, 2.67s/it] 82%|████████▏ | 10106/12313 [7:34:12<1:35:00, 2.58s/it] {'loss': 0.5535, 'grad_norm': 3.806681811308465, 'learning_rate': 4.0959565562752767e-07, 'epoch': 0.82} 82%|████████▏ | 10106/12313 [7:34:12<1:35:00, 2.58s/it] 82%|████████▏ | 10107/12313 [7:34:15<1:37:20, 2.65s/it] {'loss': 0.5205, 'grad_norm': 3.11228162293575, 'learning_rate': 4.092350334025816e-07, 'epoch': 0.82} 82%|████████▏ | 10107/12313 [7:34:15<1:37:20, 2.65s/it] 82%|████████▏ | 10108/12313 [7:34:17<1:38:13, 2.67s/it] {'loss': 0.5066, 'grad_norm': 6.448242144302385, 'learning_rate': 4.0887455584750547e-07, 'epoch': 0.82} 82%|████████▏ | 10108/12313 [7:34:17<1:38:13, 2.67s/it] 82%|████████▏ | 10109/12313 [7:34:20<1:36:51, 2.64s/it] {'loss': 0.6172, 'grad_norm': 4.566144679361284, 'learning_rate': 4.0851422298724354e-07, 'epoch': 0.82} 82%|████████▏ | 10109/12313 [7:34:20<1:36:51, 2.64s/it] 82%|████████▏ | 10110/12313 [7:34:23<1:39:50, 2.72s/it] {'loss': 0.4025, 'grad_norm': 5.487995399366438, 'learning_rate': 4.081540348467278e-07, 'epoch': 0.82} 82%|████████▏ | 10110/12313 [7:34:23<1:39:50, 2.72s/it] 82%|████████▏ | 10111/12313 [7:34:26<1:40:46, 2.75s/it] {'loss': 0.4171, 'grad_norm': 5.156550511751783, 'learning_rate': 4.0779399145088247e-07, 'epoch': 0.82} 82%|████████▏ | 10111/12313 [7:34:26<1:40:46, 2.75s/it] 82%|████████▏ | 10112/12313 [7:34:29<1:41:43, 2.77s/it] {'loss': 0.4858, 'grad_norm': 5.693851091075354, 'learning_rate': 4.074340928246201e-07, 'epoch': 0.82} 82%|████████▏ | 10112/12313 [7:34:29<1:41:43, 2.77s/it] 82%|████████▏ | 10113/12313 [7:34:31<1:39:13, 2.71s/it] {'loss': 0.4029, 'grad_norm': 10.430588612793592, 'learning_rate': 4.0707433899284333e-07, 'epoch': 0.82} 82%|████████▏ | 10113/12313 [7:34:31<1:39:13, 2.71s/it] 
82%|████████▏ | 10114/12313 [7:34:35<1:49:28, 2.99s/it] {'loss': 0.4363, 'grad_norm': 6.3587031868115975, 'learning_rate': 4.067147299804458e-07, 'epoch': 0.82} 82%|████████▏ | 10114/12313 [7:34:35<1:49:28, 2.99s/it] 82%|████████▏ | 10115/12313 [7:34:37<1:43:16, 2.82s/it] {'loss': 0.5734, 'grad_norm': 6.255964758565154, 'learning_rate': 4.063552658123102e-07, 'epoch': 0.82} 82%|████████▏ | 10115/12313 [7:34:37<1:43:16, 2.82s/it] 82%|████████▏ | 10116/12313 [7:34:40<1:41:51, 2.78s/it] {'loss': 0.5963, 'grad_norm': 5.605733474613555, 'learning_rate': 4.0599594651330956e-07, 'epoch': 0.82} 82%|████████▏ | 10116/12313 [7:34:40<1:41:51, 2.78s/it] 82%|████████▏ | 10117/12313 [7:34:43<1:41:02, 2.76s/it] {'loss': 0.5508, 'grad_norm': 4.079641751194994, 'learning_rate': 4.0563677210830763e-07, 'epoch': 0.82} 82%|████████▏ | 10117/12313 [7:34:43<1:41:02, 2.76s/it] 82%|████████▏ | 10118/12313 [7:34:45<1:37:46, 2.67s/it] {'loss': 0.3412, 'grad_norm': 6.640081496102607, 'learning_rate': 4.0527774262215687e-07, 'epoch': 0.82} 82%|████████▏ | 10118/12313 [7:34:45<1:37:46, 2.67s/it] 82%|████████▏ | 10119/12313 [7:34:48<1:41:11, 2.77s/it] {'loss': 0.3926, 'grad_norm': 5.1973586672338845, 'learning_rate': 4.049188580796995e-07, 'epoch': 0.82} 82%|████████▏ | 10119/12313 [7:34:48<1:41:11, 2.77s/it] 82%|████████▏ | 10120/12313 [7:34:51<1:39:20, 2.72s/it] {'loss': 0.3678, 'grad_norm': 6.292394610949972, 'learning_rate': 4.0456011850576985e-07, 'epoch': 0.82} 82%|████████▏ | 10120/12313 [7:34:51<1:39:20, 2.72s/it] 82%|████████▏ | 10121/12313 [7:34:53<1:39:11, 2.71s/it] {'loss': 0.4533, 'grad_norm': 7.307142444437324, 'learning_rate': 4.0420152392518926e-07, 'epoch': 0.82} 82%|████████▏ | 10121/12313 [7:34:53<1:39:11, 2.71s/it] 82%|████████▏ | 10122/12313 [7:34:56<1:42:42, 2.81s/it] {'loss': 0.535, 'grad_norm': 4.1068923254830345, 'learning_rate': 4.038430743627714e-07, 'epoch': 0.82} 82%|████████▏ | 10122/12313 [7:34:56<1:42:42, 2.81s/it] 82%|████████▏ | 10123/12313 [7:34:59<1:41:02, 
2.77s/it] {'loss': 0.5241, 'grad_norm': 10.080633399311306, 'learning_rate': 4.0348476984331977e-07, 'epoch': 0.82} 82%|████████▏ | 10123/12313 [7:34:59<1:41:02, 2.77s/it] 82%|████████▏ | 10124/12313 [7:35:02<1:39:05, 2.72s/it] {'loss': 0.6826, 'grad_norm': 5.11781610289055, 'learning_rate': 4.031266103916262e-07, 'epoch': 0.82} 82%|████████▏ | 10124/12313 [7:35:02<1:39:05, 2.72s/it] 82%|████████▏ | 10125/12313 [7:35:04<1:39:00, 2.72s/it] {'loss': 0.5244, 'grad_norm': 10.246403450559914, 'learning_rate': 4.0276859603247317e-07, 'epoch': 0.82} 82%|████████▏ | 10125/12313 [7:35:04<1:39:00, 2.72s/it] 82%|████████▏ | 10126/12313 [7:35:07<1:35:21, 2.62s/it] {'loss': 0.5099, 'grad_norm': 5.888899275267595, 'learning_rate': 4.0241072679063437e-07, 'epoch': 0.82} 82%|████████▏ | 10126/12313 [7:35:07<1:35:21, 2.62s/it] 82%|████████▏ | 10127/12313 [7:35:10<1:37:43, 2.68s/it] {'loss': 0.4915, 'grad_norm': 4.555118993832531, 'learning_rate': 4.02053002690872e-07, 'epoch': 0.82} 82%|████████▏ | 10127/12313 [7:35:10<1:37:43, 2.68s/it] 82%|████████▏ | 10128/12313 [7:35:12<1:37:42, 2.68s/it] {'loss': 0.6126, 'grad_norm': 6.444960720915754, 'learning_rate': 4.016954237579382e-07, 'epoch': 0.82} 82%|████████▏ | 10128/12313 [7:35:12<1:37:42, 2.68s/it] 82%|████████▏ | 10129/12313 [7:35:15<1:37:17, 2.67s/it] {'loss': 0.4286, 'grad_norm': 5.028533921984391, 'learning_rate': 4.013379900165756e-07, 'epoch': 0.82} 82%|████████▏ | 10129/12313 [7:35:15<1:37:17, 2.67s/it] 82%|████████▏ | 10130/12313 [7:35:18<1:39:02, 2.72s/it] {'loss': 0.634, 'grad_norm': 9.57264830043398, 'learning_rate': 4.009807014915179e-07, 'epoch': 0.82} 82%|████████▏ | 10130/12313 [7:35:18<1:39:02, 2.72s/it] 82%|████████▏ | 10131/12313 [7:35:20<1:37:26, 2.68s/it] {'loss': 0.519, 'grad_norm': 3.535035189189592, 'learning_rate': 4.006235582074866e-07, 'epoch': 0.82} 82%|████████▏ | 10131/12313 [7:35:20<1:37:26, 2.68s/it] 82%|████████▏ | 10132/12313 [7:35:23<1:39:08, 2.73s/it] {'loss': 0.4938, 'grad_norm': 
5.20695512400172, 'learning_rate': 4.002665601891939e-07, 'epoch': 0.82} 82%|████████▏ | 10132/12313 [7:35:23<1:39:08, 2.73s/it] 82%|████████▏ | 10133/12313 [7:35:26<1:38:10, 2.70s/it] {'loss': 0.3337, 'grad_norm': 4.97816750794527, 'learning_rate': 3.9990970746134283e-07, 'epoch': 0.82} 82%|████████▏ | 10133/12313 [7:35:26<1:38:10, 2.70s/it] 82%|████████▏ | 10134/12313 [7:35:29<1:38:03, 2.70s/it] {'loss': 0.4538, 'grad_norm': 4.2871354698006945, 'learning_rate': 3.99553000048625e-07, 'epoch': 0.82} 82%|████████▏ | 10134/12313 [7:35:29<1:38:03, 2.70s/it] 82%|████████▏ | 10135/12313 [7:35:31<1:39:17, 2.74s/it] {'loss': 0.5731, 'grad_norm': 17.838493735891614, 'learning_rate': 3.991964379757232e-07, 'epoch': 0.82} 82%|████████▏ | 10135/12313 [7:35:31<1:39:17, 2.74s/it] 82%|████████▏ | 10136/12313 [7:35:34<1:38:19, 2.71s/it] {'loss': 0.5173, 'grad_norm': 5.170687504793261, 'learning_rate': 3.988400212673099e-07, 'epoch': 0.82} 82%|████████▏ | 10136/12313 [7:35:34<1:38:19, 2.71s/it] 82%|████████▏ | 10137/12313 [7:35:37<1:37:35, 2.69s/it] {'loss': 0.5226, 'grad_norm': 6.426083270641829, 'learning_rate': 3.9848374994804734e-07, 'epoch': 0.82} 82%|████████▏ | 10137/12313 [7:35:37<1:37:35, 2.69s/it] 82%|████████▏ | 10138/12313 [7:35:39<1:34:29, 2.61s/it] {'loss': 0.4277, 'grad_norm': 3.7713552353157476, 'learning_rate': 3.9812762404258605e-07, 'epoch': 0.82} 82%|████████▏ | 10138/12313 [7:35:39<1:34:29, 2.61s/it] 82%|████████▏ | 10139/12313 [7:35:42<1:33:08, 2.57s/it] {'loss': 0.4386, 'grad_norm': 6.423516094367269, 'learning_rate': 3.977716435755702e-07, 'epoch': 0.82} 82%|████████▏ | 10139/12313 [7:35:42<1:33:08, 2.57s/it] 82%|████████▏ | 10140/12313 [7:35:44<1:33:47, 2.59s/it] {'loss': 0.38, 'grad_norm': 5.7394083306414325, 'learning_rate': 3.9741580857163036e-07, 'epoch': 0.82} 82%|████████▏ | 10140/12313 [7:35:44<1:33:47, 2.59s/it] 82%|████████▏ | 10141/12313 [7:35:47<1:37:33, 2.70s/it] {'loss': 0.3539, 'grad_norm': 5.863968524558383, 'learning_rate': 
3.9706011905538827e-07, 'epoch': 0.82} 82%|████████▏ | 10141/12313 [7:35:47<1:37:33, 2.70s/it] 82%|████████▏ | 10142/12313 [7:35:50<1:41:34, 2.81s/it] {'loss': 0.6156, 'grad_norm': 7.197622618993387, 'learning_rate': 3.9670457505145643e-07, 'epoch': 0.82} 82%|████████▏ | 10142/12313 [7:35:50<1:41:34, 2.81s/it] 82%|████████▏ | 10143/12313 [7:35:53<1:40:14, 2.77s/it] {'loss': 0.5406, 'grad_norm': 9.307547582177493, 'learning_rate': 3.963491765844371e-07, 'epoch': 0.82} 82%|████████▏ | 10143/12313 [7:35:53<1:40:14, 2.77s/it] 82%|████████▏ | 10144/12313 [7:35:55<1:38:12, 2.72s/it] {'loss': 0.431, 'grad_norm': 3.8307158333938185, 'learning_rate': 3.959939236789212e-07, 'epoch': 0.82} 82%|████████▏ | 10144/12313 [7:35:55<1:38:12, 2.72s/it] 82%|████████▏ | 10145/12313 [7:35:58<1:35:51, 2.65s/it] {'loss': 0.4988, 'grad_norm': 5.112333570814876, 'learning_rate': 3.9563881635948984e-07, 'epoch': 0.82} 82%|████████▏ | 10145/12313 [7:35:58<1:35:51, 2.65s/it] 82%|████████▏ | 10146/12313 [7:36:01<1:35:36, 2.65s/it] {'loss': 0.6043, 'grad_norm': 5.863847427204812, 'learning_rate': 3.9528385465071594e-07, 'epoch': 0.82} 82%|████████▏ | 10146/12313 [7:36:01<1:35:36, 2.65s/it] 82%|████████▏ | 10147/12313 [7:36:03<1:35:19, 2.64s/it] {'loss': 0.431, 'grad_norm': 5.626678027585737, 'learning_rate': 3.949290385771595e-07, 'epoch': 0.82} 82%|████████▏ | 10147/12313 [7:36:03<1:35:19, 2.64s/it] 82%|████████▏ | 10148/12313 [7:36:06<1:33:58, 2.60s/it] {'loss': 0.5401, 'grad_norm': 4.081512795131598, 'learning_rate': 3.945743681633729e-07, 'epoch': 0.82} 82%|████████▏ | 10148/12313 [7:36:06<1:33:58, 2.60s/it] 82%|████████▏ | 10149/12313 [7:36:08<1:35:04, 2.64s/it] {'loss': 0.5523, 'grad_norm': 4.533747792384772, 'learning_rate': 3.9421984343389756e-07, 'epoch': 0.82} 82%|████████▏ | 10149/12313 [7:36:08<1:35:04, 2.64s/it] 82%|████████▏ | 10150/12313 [7:36:11<1:35:19, 2.64s/it] {'loss': 0.4706, 'grad_norm': 3.743850075034431, 'learning_rate': 3.9386546441326444e-07, 'epoch': 0.82} 
82%|████████▏ | 10150/12313 [7:36:11<1:35:19, 2.64s/it] 82%|████████▏ | 10151/12313 [7:36:14<1:35:15, 2.64s/it] {'loss': 0.6842, 'grad_norm': 23.913229954006653, 'learning_rate': 3.9351123112599393e-07, 'epoch': 0.82} 82%|████████▏ | 10151/12313 [7:36:14<1:35:15, 2.64s/it] 82%|████████▏ | 10152/12313 [7:36:17<1:36:22, 2.68s/it] {'loss': 0.377, 'grad_norm': 7.100866725720028, 'learning_rate': 3.931571435965986e-07, 'epoch': 0.82} 82%|████████▏ | 10152/12313 [7:36:17<1:36:22, 2.68s/it] 82%|████████▏ | 10153/12313 [7:36:19<1:34:55, 2.64s/it] {'loss': 0.5232, 'grad_norm': 8.290123785692964, 'learning_rate': 3.9280320184957864e-07, 'epoch': 0.82} 82%|████████▏ | 10153/12313 [7:36:19<1:34:55, 2.64s/it] 82%|████████▏ | 10154/12313 [7:36:21<1:32:35, 2.57s/it] {'loss': 0.4422, 'grad_norm': 4.89062234331338, 'learning_rate': 3.9244940590942413e-07, 'epoch': 0.82} 82%|████████▏ | 10154/12313 [7:36:21<1:32:35, 2.57s/it] 82%|████████▏ | 10155/12313 [7:36:24<1:31:11, 2.54s/it] {'loss': 0.2946, 'grad_norm': 7.550251223417641, 'learning_rate': 3.9209575580061663e-07, 'epoch': 0.82} 82%|████████▏ | 10155/12313 [7:36:24<1:31:11, 2.54s/it] 82%|████████▏ | 10156/12313 [7:36:26<1:30:53, 2.53s/it] {'loss': 0.6482, 'grad_norm': 10.05450235709694, 'learning_rate': 3.9174225154762766e-07, 'epoch': 0.82} 82%|████████▏ | 10156/12313 [7:36:26<1:30:53, 2.53s/it] 82%|████████▏ | 10157/12313 [7:36:29<1:32:32, 2.58s/it] {'loss': 0.5071, 'grad_norm': 5.365029681588859, 'learning_rate': 3.9138889317491656e-07, 'epoch': 0.82} 82%|████████▏ | 10157/12313 [7:36:29<1:32:32, 2.58s/it] 82%|████████▏ | 10158/12313 [7:36:32<1:37:35, 2.72s/it] {'loss': 0.5451, 'grad_norm': 5.609914610484028, 'learning_rate': 3.9103568070693485e-07, 'epoch': 0.82} 82%|████████▏ | 10158/12313 [7:36:32<1:37:35, 2.72s/it] 83%|████████▎ | 10159/12313 [7:36:35<1:36:17, 2.68s/it] {'loss': 0.532, 'grad_norm': 4.514680074383672, 'learning_rate': 3.906826141681225e-07, 'epoch': 0.83} 83%|████████▎ | 10159/12313 [7:36:35<1:36:17, 
2.68s/it] 83%|████████▎ | 10160/12313 [7:36:37<1:35:58, 2.67s/it] {'loss': 0.5132, 'grad_norm': 10.206347105991203, 'learning_rate': 3.903296935829093e-07, 'epoch': 0.83} 83%|████████▎ | 10160/12313 [7:36:37<1:35:58, 2.67s/it] 83%|████████▎ | 10161/12313 [7:36:41<1:41:14, 2.82s/it] {'loss': 0.4462, 'grad_norm': 6.858018717895062, 'learning_rate': 3.8997691897571577e-07, 'epoch': 0.83} 83%|████████▎ | 10161/12313 [7:36:41<1:41:14, 2.82s/it] 83%|████████▎ | 10162/12313 [7:36:43<1:38:24, 2.75s/it] {'loss': 0.5436, 'grad_norm': 5.173227278123777, 'learning_rate': 3.896242903709532e-07, 'epoch': 0.83} 83%|████████▎ | 10162/12313 [7:36:43<1:38:24, 2.75s/it] 83%|████████▎ | 10163/12313 [7:36:46<1:34:29, 2.64s/it] {'loss': 0.4657, 'grad_norm': 11.450687404556978, 'learning_rate': 3.8927180779302076e-07, 'epoch': 0.83} 83%|████████▎ | 10163/12313 [7:36:46<1:34:29, 2.64s/it] 83%|████████▎ | 10164/12313 [7:36:48<1:36:08, 2.68s/it] {'loss': 0.4993, 'grad_norm': 4.609261708122115, 'learning_rate': 3.889194712663075e-07, 'epoch': 0.83} 83%|████████▎ | 10164/12313 [7:36:48<1:36:08, 2.68s/it] 83%|████████▎ | 10165/12313 [7:36:51<1:35:03, 2.66s/it] {'loss': 0.4571, 'grad_norm': 3.5084080535675026, 'learning_rate': 3.885672808151947e-07, 'epoch': 0.83} 83%|████████▎ | 10165/12313 [7:36:51<1:35:03, 2.66s/it] 83%|████████▎ | 10166/12313 [7:36:54<1:35:44, 2.68s/it] {'loss': 0.3259, 'grad_norm': 21.30482869821724, 'learning_rate': 3.882152364640518e-07, 'epoch': 0.83} 83%|████████▎ | 10166/12313 [7:36:54<1:35:44, 2.68s/it] 83%|████████▎ | 10167/12313 [7:36:56<1:34:31, 2.64s/it] {'loss': 0.5129, 'grad_norm': 4.848731610369329, 'learning_rate': 3.878633382372371e-07, 'epoch': 0.83} 83%|████████▎ | 10167/12313 [7:36:56<1:34:31, 2.64s/it] 83%|████████▎ | 10168/12313 [7:36:59<1:35:32, 2.67s/it] {'loss': 0.4037, 'grad_norm': 4.713993311348796, 'learning_rate': 3.875115861591014e-07, 'epoch': 0.83} 83%|████████▎ | 10168/12313 [7:36:59<1:35:32, 2.67s/it] 83%|████████▎ | 10169/12313 
[7:37:02<1:35:07, 2.66s/it] {'loss': 0.2991, 'grad_norm': 6.885554383643488, 'learning_rate': 3.871599802539841e-07, 'epoch': 0.83} 83%|████████▎ | 10169/12313 [7:37:02<1:35:07, 2.66s/it] 83%|████████▎ | 10170/12313 [7:37:04<1:33:44, 2.62s/it] {'loss': 0.4264, 'grad_norm': 11.560653980725935, 'learning_rate': 3.868085205462135e-07, 'epoch': 0.83} 83%|████████▎ | 10170/12313 [7:37:04<1:33:44, 2.62s/it] 83%|████████▎ | 10171/12313 [7:37:07<1:34:03, 2.63s/it] {'loss': 0.4572, 'grad_norm': 18.94305982548915, 'learning_rate': 3.8645720706010997e-07, 'epoch': 0.83} 83%|████████▎ | 10171/12313 [7:37:07<1:34:03, 2.63s/it] 83%|████████▎ | 10172/12313 [7:37:09<1:32:20, 2.59s/it] {'loss': 0.4054, 'grad_norm': 4.3868517626424754, 'learning_rate': 3.8610603981998204e-07, 'epoch': 0.83} 83%|████████▎ | 10172/12313 [7:37:09<1:32:20, 2.59s/it] 83%|████████▎ | 10173/12313 [7:37:12<1:36:11, 2.70s/it] {'loss': 0.4782, 'grad_norm': 10.379799445814022, 'learning_rate': 3.85755018850128e-07, 'epoch': 0.83} 83%|████████▎ | 10173/12313 [7:37:12<1:36:11, 2.70s/it] 83%|████████▎ | 10174/12313 [7:37:15<1:36:32, 2.71s/it] {'loss': 0.4246, 'grad_norm': 5.464482264230086, 'learning_rate': 3.854041441748371e-07, 'epoch': 0.83} 83%|████████▎ | 10174/12313 [7:37:15<1:36:32, 2.71s/it] 83%|████████▎ | 10175/12313 [7:37:18<1:35:14, 2.67s/it] {'loss': 0.4936, 'grad_norm': 5.259022216933499, 'learning_rate': 3.8505341581838854e-07, 'epoch': 0.83} 83%|████████▎ | 10175/12313 [7:37:18<1:35:14, 2.67s/it] 83%|████████▎ | 10176/12313 [7:37:20<1:35:33, 2.68s/it] {'loss': 0.47, 'grad_norm': 3.834344996242316, 'learning_rate': 3.8470283380504987e-07, 'epoch': 0.83} 83%|████████▎ | 10176/12313 [7:37:20<1:35:33, 2.68s/it] 83%|████████▎ | 10177/12313 [7:37:23<1:35:57, 2.70s/it] {'loss': 0.4509, 'grad_norm': 9.45059948818576, 'learning_rate': 3.8435239815908077e-07, 'epoch': 0.83} 83%|████████▎ | 10177/12313 [7:37:23<1:35:57, 2.70s/it] 83%|████████▎ | 10178/12313 [7:37:26<1:34:15, 2.65s/it] {'loss': 0.3479, 
'grad_norm': 4.5939757403211585, 'learning_rate': 3.8400210890472883e-07, 'epoch': 0.83} 83%|████████▎ | 10178/12313 [7:37:26<1:34:15, 2.65s/it] 83%|████████▎ | 10179/12313 [7:37:28<1:32:26, 2.60s/it] {'loss': 0.3906, 'grad_norm': 5.823737388475345, 'learning_rate': 3.836519660662313e-07, 'epoch': 0.83} 83%|████████▎ | 10179/12313 [7:37:28<1:32:26, 2.60s/it] 83%|████████▎ | 10180/12313 [7:37:31<1:33:08, 2.62s/it] {'loss': 0.5363, 'grad_norm': 4.6299433684789095, 'learning_rate': 3.8330196966781723e-07, 'epoch': 0.83} 83%|████████▎ | 10180/12313 [7:37:31<1:33:08, 2.62s/it] 83%|████████▎ | 10181/12313 [7:37:33<1:35:06, 2.68s/it] {'loss': 0.433, 'grad_norm': 6.306388306208667, 'learning_rate': 3.829521197337052e-07, 'epoch': 0.83} 83%|████████▎ | 10181/12313 [7:37:33<1:35:06, 2.68s/it] 83%|████████▎ | 10182/12313 [7:37:36<1:33:59, 2.65s/it] {'loss': 0.5346, 'grad_norm': 5.650157408413531, 'learning_rate': 3.8260241628810203e-07, 'epoch': 0.83} 83%|████████▎ | 10182/12313 [7:37:36<1:33:59, 2.65s/it] 83%|████████▎ | 10183/12313 [7:37:39<1:35:13, 2.68s/it] {'loss': 0.5034, 'grad_norm': 4.139343726577053, 'learning_rate': 3.8225285935520493e-07, 'epoch': 0.83} 83%|████████▎ | 10183/12313 [7:37:39<1:35:13, 2.68s/it] 83%|████████▎ | 10184/12313 [7:37:41<1:35:01, 2.68s/it] {'loss': 0.4205, 'grad_norm': 4.27842292241429, 'learning_rate': 3.8190344895920246e-07, 'epoch': 0.83} 83%|████████▎ | 10184/12313 [7:37:42<1:35:01, 2.68s/it] 83%|████████▎ | 10185/12313 [7:37:44<1:34:27, 2.66s/it] {'loss': 0.5121, 'grad_norm': 4.60689072938466, 'learning_rate': 3.815541851242713e-07, 'epoch': 0.83} 83%|████████▎ | 10185/12313 [7:37:44<1:34:27, 2.66s/it] 83%|████████▎ | 10186/12313 [7:37:47<1:32:18, 2.60s/it] {'loss': 0.3954, 'grad_norm': 14.92261155989774, 'learning_rate': 3.812050678745785e-07, 'epoch': 0.83} 83%|████████▎ | 10186/12313 [7:37:47<1:32:18, 2.60s/it] 83%|████████▎ | 10187/12313 [7:37:49<1:34:26, 2.67s/it] {'loss': 0.5392, 'grad_norm': 4.53044644344942, 'learning_rate': 
3.808560972342812e-07, 'epoch': 0.83} 83%|████████▎ | 10187/12313 [7:37:49<1:34:26, 2.67s/it] 83%|████████▎ | 10188/12313 [7:37:52<1:35:24, 2.69s/it] {'loss': 0.5574, 'grad_norm': 5.540716412331475, 'learning_rate': 3.8050727322752726e-07, 'epoch': 0.83} 83%|████████▎ | 10188/12313 [7:37:52<1:35:24, 2.69s/it] 83%|████████▎ | 10189/12313 [7:37:55<1:34:15, 2.66s/it] {'loss': 0.3885, 'grad_norm': 8.417967136559637, 'learning_rate': 3.8015859587845233e-07, 'epoch': 0.83} 83%|████████▎ | 10189/12313 [7:37:55<1:34:15, 2.66s/it] 83%|████████▎ | 10190/12313 [7:37:57<1:34:55, 2.68s/it] {'loss': 0.4015, 'grad_norm': 53.802360428112415, 'learning_rate': 3.798100652111839e-07, 'epoch': 0.83} 83%|████████▎ | 10190/12313 [7:37:57<1:34:55, 2.68s/it] 83%|████████▎ | 10191/12313 [7:38:00<1:34:19, 2.67s/it] {'loss': 0.3768, 'grad_norm': 4.877120364803925, 'learning_rate': 3.7946168124983776e-07, 'epoch': 0.83} 83%|████████▎ | 10191/12313 [7:38:00<1:34:19, 2.67s/it] 83%|████████▎ | 10192/12313 [7:38:03<1:35:03, 2.69s/it] {'loss': 0.5635, 'grad_norm': 9.495267916803392, 'learning_rate': 3.791134440185201e-07, 'epoch': 0.83} 83%|████████▎ | 10192/12313 [7:38:03<1:35:03, 2.69s/it] 83%|████████▎ | 10193/12313 [7:38:06<1:35:31, 2.70s/it] {'loss': 0.4829, 'grad_norm': 5.733855636295819, 'learning_rate': 3.787653535413277e-07, 'epoch': 0.83} 83%|████████▎ | 10193/12313 [7:38:06<1:35:31, 2.70s/it] 83%|████████▎ | 10194/12313 [7:38:08<1:36:26, 2.73s/it] {'loss': 0.5228, 'grad_norm': 5.873682615843861, 'learning_rate': 3.784174098423465e-07, 'epoch': 0.83} 83%|████████▎ | 10194/12313 [7:38:08<1:36:26, 2.73s/it] 83%|████████▎ | 10195/12313 [7:38:11<1:35:14, 2.70s/it] {'loss': 0.5421, 'grad_norm': 4.833822246575063, 'learning_rate': 3.780696129456521e-07, 'epoch': 0.83} 83%|████████▎ | 10195/12313 [7:38:11<1:35:14, 2.70s/it] 83%|████████▎ | 10196/12313 [7:38:14<1:38:00, 2.78s/it] {'loss': 0.5473, 'grad_norm': 5.208263826542225, 'learning_rate': 3.7772196287531066e-07, 'epoch': 0.83} 
83%|████████▎ | 10196/12313 [7:38:14<1:38:00, 2.78s/it] 83%|████████▎ | 10197/12313 [7:38:16<1:34:51, 2.69s/it] {'loss': 0.4702, 'grad_norm': 8.378333393225612, 'learning_rate': 3.773744596553774e-07, 'epoch': 0.83} 83%|████████▎ | 10197/12313 [7:38:16<1:34:51, 2.69s/it] 83%|████████▎ | 10198/12313 [7:38:19<1:32:17, 2.62s/it] {'loss': 0.5652, 'grad_norm': 3.4663578157064583, 'learning_rate': 3.7702710330989765e-07, 'epoch': 0.83} 83%|████████▎ | 10198/12313 [7:38:19<1:32:17, 2.62s/it] 83%|████████▎ | 10199/12313 [7:38:22<1:32:34, 2.63s/it] {'loss': 0.4027, 'grad_norm': 6.827267196321197, 'learning_rate': 3.766798938629063e-07, 'epoch': 0.83} 83%|████████▎ | 10199/12313 [7:38:22<1:32:34, 2.63s/it] 83%|████████▎ | 10200/12313 [7:38:24<1:30:13, 2.56s/it] {'loss': 0.5462, 'grad_norm': 9.444190295092335, 'learning_rate': 3.7633283133842845e-07, 'epoch': 0.83} 83%|████████▎ | 10200/12313 [7:38:24<1:30:13, 2.56s/it] 83%|████████▎ | 10201/12313 [7:38:26<1:28:38, 2.52s/it] {'loss': 0.5488, 'grad_norm': 6.929524668677438, 'learning_rate': 3.7598591576048e-07, 'epoch': 0.83} 83%|████████▎ | 10201/12313 [7:38:26<1:28:38, 2.52s/it] 83%|████████▎ | 10202/12313 [7:38:29<1:30:34, 2.57s/it] {'loss': 0.556, 'grad_norm': 5.6038151431290935, 'learning_rate': 3.756391471530646e-07, 'epoch': 0.83} 83%|████████▎ | 10202/12313 [7:38:29<1:30:34, 2.57s/it] 83%|████████▎ | 10203/12313 [7:38:32<1:32:40, 2.64s/it] {'loss': 0.5079, 'grad_norm': 4.185345966650429, 'learning_rate': 3.7529252554017765e-07, 'epoch': 0.83} 83%|████████▎ | 10203/12313 [7:38:32<1:32:40, 2.64s/it] 83%|████████▎ | 10204/12313 [7:38:35<1:33:02, 2.65s/it] {'loss': 0.427, 'grad_norm': 6.844019960274571, 'learning_rate': 3.7494605094580305e-07, 'epoch': 0.83} 83%|████████▎ | 10204/12313 [7:38:35<1:33:02, 2.65s/it] 83%|████████▎ | 10205/12313 [7:38:37<1:34:55, 2.70s/it] {'loss': 0.797, 'grad_norm': 5.36417941344362, 'learning_rate': 3.7459972339391445e-07, 'epoch': 0.83} 83%|████████▎ | 10205/12313 [7:38:37<1:34:55, 
2.70s/it] 83%|████████▎ | 10206/12313 [7:38:40<1:33:32, 2.66s/it] {'loss': 0.3947, 'grad_norm': 5.328404842131845, 'learning_rate': 3.742535429084765e-07, 'epoch': 0.83} 83%|████████▎ | 10206/12313 [7:38:40<1:33:32, 2.66s/it] 83%|████████▎ | 10207/12313 [7:38:43<1:34:41, 2.70s/it] {'loss': 0.4567, 'grad_norm': 5.223058995984122, 'learning_rate': 3.739075095134437e-07, 'epoch': 0.83} 83%|████████▎ | 10207/12313 [7:38:43<1:34:41, 2.70s/it] 83%|████████▎ | 10208/12313 [7:38:45<1:33:11, 2.66s/it] {'loss': 0.4547, 'grad_norm': 4.706033163133879, 'learning_rate': 3.735616232327582e-07, 'epoch': 0.83} 83%|████████▎ | 10208/12313 [7:38:45<1:33:11, 2.66s/it] 83%|████████▎ | 10209/12313 [7:38:48<1:34:15, 2.69s/it] {'loss': 0.5567, 'grad_norm': 5.7396745415353845, 'learning_rate': 3.732158840903552e-07, 'epoch': 0.83} 83%|████████▎ | 10209/12313 [7:38:48<1:34:15, 2.69s/it] 83%|████████▎ | 10210/12313 [7:38:51<1:36:00, 2.74s/it] {'loss': 0.5436, 'grad_norm': 4.958530850319359, 'learning_rate': 3.728702921101571e-07, 'epoch': 0.83} 83%|████████▎ | 10210/12313 [7:38:51<1:36:00, 2.74s/it] 83%|████████▎ | 10211/12313 [7:38:54<1:35:53, 2.74s/it] {'loss': 0.5996, 'grad_norm': 4.861206028038226, 'learning_rate': 3.725248473160764e-07, 'epoch': 0.83} 83%|████████▎ | 10211/12313 [7:38:54<1:35:53, 2.74s/it] 83%|████████▎ | 10212/12313 [7:38:56<1:32:40, 2.65s/it] {'loss': 0.5646, 'grad_norm': 4.682688015412765, 'learning_rate': 3.721795497320174e-07, 'epoch': 0.83} 83%|████████▎ | 10212/12313 [7:38:56<1:32:40, 2.65s/it] 83%|████████▎ | 10213/12313 [7:38:59<1:32:12, 2.63s/it] {'loss': 0.5358, 'grad_norm': 5.262659812480152, 'learning_rate': 3.718343993818718e-07, 'epoch': 0.83} 83%|████████▎ | 10213/12313 [7:38:59<1:32:12, 2.63s/it] 83%|████████▎ | 10214/12313 [7:39:01<1:33:02, 2.66s/it] {'loss': 0.4423, 'grad_norm': 9.851823169517866, 'learning_rate': 3.7148939628952246e-07, 'epoch': 0.83} 83%|████████▎ | 10214/12313 [7:39:01<1:33:02, 2.66s/it] 83%|████████▎ | 10215/12313 
[7:39:04<1:32:24, 2.64s/it] {'loss': 0.3799, 'grad_norm': 5.776223337431074, 'learning_rate': 3.7114454047884247e-07, 'epoch': 0.83} 83%|████████▎ | 10215/12313 [7:39:04<1:32:24, 2.64s/it] 83%|████████▎ | 10216/12313 [7:39:06<1:30:01, 2.58s/it] {'loss': 0.4953, 'grad_norm': 8.26924554342392, 'learning_rate': 3.707998319736936e-07, 'epoch': 0.83} 83%|████████▎ | 10216/12313 [7:39:06<1:30:01, 2.58s/it] 83%|████████▎ | 10217/12313 [7:39:09<1:31:08, 2.61s/it] {'loss': 0.593, 'grad_norm': 5.982564098065872, 'learning_rate': 3.7045527079792753e-07, 'epoch': 0.83} 83%|████████▎ | 10217/12313 [7:39:09<1:31:08, 2.61s/it] 83%|████████▎ | 10218/12313 [7:39:12<1:32:27, 2.65s/it] {'loss': 0.4198, 'grad_norm': 5.7895130886280635, 'learning_rate': 3.7011085697538587e-07, 'epoch': 0.83} 83%|████████▎ | 10218/12313 [7:39:12<1:32:27, 2.65s/it] 83%|████████▎ | 10219/12313 [7:39:15<1:35:07, 2.73s/it] {'loss': 0.6089, 'grad_norm': 4.460178263466723, 'learning_rate': 3.6976659052990056e-07, 'epoch': 0.83} 83%|████████▎ | 10219/12313 [7:39:15<1:35:07, 2.73s/it] 83%|████████▎ | 10220/12313 [7:39:17<1:34:35, 2.71s/it] {'loss': 0.405, 'grad_norm': 5.700291972867315, 'learning_rate': 3.694224714852937e-07, 'epoch': 0.83} 83%|████████▎ | 10220/12313 [7:39:17<1:34:35, 2.71s/it] 83%|████████▎ | 10221/12313 [7:39:20<1:32:44, 2.66s/it] {'loss': 0.4625, 'grad_norm': 7.327954495107345, 'learning_rate': 3.6907849986537516e-07, 'epoch': 0.83} 83%|████████▎ | 10221/12313 [7:39:20<1:32:44, 2.66s/it] 83%|████████▎ | 10222/12313 [7:39:23<1:33:12, 2.67s/it] {'loss': 0.4958, 'grad_norm': 5.605599034700318, 'learning_rate': 3.687346756939475e-07, 'epoch': 0.83} 83%|████████▎ | 10222/12313 [7:39:23<1:33:12, 2.67s/it] 83%|████████▎ | 10223/12313 [7:39:25<1:33:15, 2.68s/it] {'loss': 0.3742, 'grad_norm': 4.754692475561762, 'learning_rate': 3.6839099899480033e-07, 'epoch': 0.83} 83%|████████▎ | 10223/12313 [7:39:25<1:33:15, 2.68s/it] 83%|████████▎ | 10224/12313 [7:39:28<1:35:02, 2.73s/it] {'loss': 0.6079, 
'grad_norm': 3.9081424392387327, 'learning_rate': 3.680474697917144e-07, 'epoch': 0.83} 83%|████████▎ | 10224/12313 [7:39:28<1:35:02, 2.73s/it] 83%|████████▎ | 10225/12313 [7:39:31<1:35:55, 2.76s/it] {'loss': 0.4596, 'grad_norm': 4.983983156364833, 'learning_rate': 3.677040881084609e-07, 'epoch': 0.83} 83%|████████▎ | 10225/12313 [7:39:31<1:35:55, 2.76s/it] 83%|████████▎ | 10226/12313 [7:39:34<1:35:36, 2.75s/it] {'loss': 0.368, 'grad_norm': 6.779953311673761, 'learning_rate': 3.6736085396879896e-07, 'epoch': 0.83} 83%|████████▎ | 10226/12313 [7:39:34<1:35:36, 2.75s/it] 83%|████████▎ | 10227/12313 [7:39:37<1:36:31, 2.78s/it] {'loss': 0.5043, 'grad_norm': 3.867084328558502, 'learning_rate': 3.6701776739647893e-07, 'epoch': 0.83} 83%|████████▎ | 10227/12313 [7:39:37<1:36:31, 2.78s/it] 83%|████████▎ | 10228/12313 [7:39:39<1:36:00, 2.76s/it] {'loss': 0.6709, 'grad_norm': 5.763370281929799, 'learning_rate': 3.666748284152413e-07, 'epoch': 0.83} 83%|████████▎ | 10228/12313 [7:39:39<1:36:00, 2.76s/it] 83%|████████▎ | 10229/12313 [7:39:42<1:35:31, 2.75s/it] {'loss': 0.4427, 'grad_norm': 5.039857359159814, 'learning_rate': 3.663320370488152e-07, 'epoch': 0.83} 83%|████████▎ | 10229/12313 [7:39:42<1:35:31, 2.75s/it] 83%|████████▎ | 10230/12313 [7:39:45<1:35:03, 2.74s/it] {'loss': 0.5353, 'grad_norm': 3.9366691516556678, 'learning_rate': 3.659893933209191e-07, 'epoch': 0.83} 83%|████████▎ | 10230/12313 [7:39:45<1:35:03, 2.74s/it] 83%|████████▎ | 10231/12313 [7:39:48<1:35:18, 2.75s/it] {'loss': 0.455, 'grad_norm': 16.823744196747114, 'learning_rate': 3.6564689725526377e-07, 'epoch': 0.83} 83%|████████▎ | 10231/12313 [7:39:48<1:35:18, 2.75s/it] 83%|████████▎ | 10232/12313 [7:39:51<1:39:30, 2.87s/it] {'loss': 0.6269, 'grad_norm': 3.5430059865653996, 'learning_rate': 3.6530454887554636e-07, 'epoch': 0.83} 83%|████████▎ | 10232/12313 [7:39:51<1:39:30, 2.87s/it] 83%|████████▎ | 10233/12313 [7:39:53<1:37:20, 2.81s/it] {'loss': 0.4448, 'grad_norm': 4.922946722027889, 'learning_rate': 
3.649623482054565e-07, 'epoch': 0.83} 83%|████████▎ | 10233/12313 [7:39:53<1:37:20, 2.81s/it] 83%|████████▎ | 10234/12313 [7:39:56<1:35:32, 2.76s/it] {'loss': 0.3769, 'grad_norm': 5.418331879921831, 'learning_rate': 3.6462029526867335e-07, 'epoch': 0.83} 83%|████████▎ | 10234/12313 [7:39:56<1:35:32, 2.76s/it] 83%|████████▎ | 10235/12313 [7:39:59<1:35:00, 2.74s/it] {'loss': 0.4543, 'grad_norm': 6.912181302002352, 'learning_rate': 3.642783900888644e-07, 'epoch': 0.83} 83%|████████▎ | 10235/12313 [7:39:59<1:35:00, 2.74s/it] 83%|████████▎ | 10236/12313 [7:40:01<1:34:54, 2.74s/it] {'loss': 0.4988, 'grad_norm': 7.405882641111188, 'learning_rate': 3.639366326896876e-07, 'epoch': 0.83} 83%|████████▎ | 10236/12313 [7:40:01<1:34:54, 2.74s/it] 83%|████████▎ | 10237/12313 [7:40:04<1:32:10, 2.66s/it] {'loss': 0.4483, 'grad_norm': 6.548258048190026, 'learning_rate': 3.635950230947902e-07, 'epoch': 0.83} 83%|████████▎ | 10237/12313 [7:40:04<1:32:10, 2.66s/it] 83%|████████▎ | 10238/12313 [7:40:07<1:32:40, 2.68s/it] {'loss': 0.4406, 'grad_norm': 7.4684554933443925, 'learning_rate': 3.632535613278107e-07, 'epoch': 0.83} 83%|████████▎ | 10238/12313 [7:40:07<1:32:40, 2.68s/it] 83%|████████▎ | 10239/12313 [7:40:09<1:34:08, 2.72s/it] {'loss': 0.4396, 'grad_norm': 3.8512394389587703, 'learning_rate': 3.629122474123767e-07, 'epoch': 0.83} 83%|████████▎ | 10239/12313 [7:40:09<1:34:08, 2.72s/it] 83%|████████▎ | 10240/12313 [7:40:12<1:35:48, 2.77s/it] {'loss': 0.37, 'grad_norm': 6.464656562967151, 'learning_rate': 3.6257108137210396e-07, 'epoch': 0.83} 83%|████████▎ | 10240/12313 [7:40:12<1:35:48, 2.77s/it] 83%|████████▎ | 10241/12313 [7:40:15<1:37:00, 2.81s/it] {'loss': 0.3829, 'grad_norm': 5.940372384619711, 'learning_rate': 3.622300632306011e-07, 'epoch': 0.83} 83%|████████▎ | 10241/12313 [7:40:15<1:37:00, 2.81s/it] 83%|████████▎ | 10242/12313 [7:40:18<1:35:21, 2.76s/it] {'loss': 0.4114, 'grad_norm': 7.830539757818605, 'learning_rate': 3.6188919301146375e-07, 'epoch': 0.83} 83%|████████▎ 
| 10242/12313 [7:40:18<1:35:21, 2.76s/it] 83%|████████▎ | 10243/12313 [7:40:21<1:34:41, 2.74s/it] {'loss': 0.5915, 'grad_norm': 4.839330335943199, 'learning_rate': 3.615484707382777e-07, 'epoch': 0.83} 83%|████████▎ | 10243/12313 [7:40:21<1:34:41, 2.74s/it] 83%|████████▎ | 10244/12313 [7:40:23<1:34:23, 2.74s/it] {'loss': 0.5409, 'grad_norm': 3.688765962182601, 'learning_rate': 3.6120789643462053e-07, 'epoch': 0.83} 83%|████████▎ | 10244/12313 [7:40:23<1:34:23, 2.74s/it] 83%|████████▎ | 10245/12313 [7:40:26<1:31:48, 2.66s/it] {'loss': 0.3567, 'grad_norm': 6.537020216802781, 'learning_rate': 3.608674701240572e-07, 'epoch': 0.83} 83%|████████▎ | 10245/12313 [7:40:26<1:31:48, 2.66s/it] 83%|████████▎ | 10246/12313 [7:40:28<1:30:51, 2.64s/it] {'loss': 0.5638, 'grad_norm': 4.589211449298252, 'learning_rate': 3.605271918301434e-07, 'epoch': 0.83} 83%|████████▎ | 10246/12313 [7:40:28<1:30:51, 2.64s/it] 83%|████████▎ | 10247/12313 [7:40:31<1:30:15, 2.62s/it] {'loss': 0.5068, 'grad_norm': 6.29504528521005, 'learning_rate': 3.601870615764258e-07, 'epoch': 0.83} 83%|████████▎ | 10247/12313 [7:40:31<1:30:15, 2.62s/it] 83%|████████▎ | 10248/12313 [7:40:34<1:31:09, 2.65s/it] {'loss': 0.4094, 'grad_norm': 4.262119801386439, 'learning_rate': 3.5984707938643864e-07, 'epoch': 0.83} 83%|████████▎ | 10248/12313 [7:40:34<1:31:09, 2.65s/it] 83%|████████▎ | 10249/12313 [7:40:36<1:30:36, 2.63s/it] {'loss': 0.4699, 'grad_norm': 4.841197784154483, 'learning_rate': 3.5950724528370615e-07, 'epoch': 0.83} 83%|████████▎ | 10249/12313 [7:40:36<1:30:36, 2.63s/it] 83%|████████▎ | 10250/12313 [7:40:39<1:32:08, 2.68s/it] {'loss': 0.4166, 'grad_norm': 5.42937376505632, 'learning_rate': 3.591675592917449e-07, 'epoch': 0.83} 83%|████████▎ | 10250/12313 [7:40:39<1:32:08, 2.68s/it] 83%|████████▎ | 10251/12313 [7:40:42<1:32:57, 2.70s/it] {'loss': 0.449, 'grad_norm': 6.296856237155061, 'learning_rate': 3.5882802143405755e-07, 'epoch': 0.83} 83%|████████▎ | 10251/12313 [7:40:42<1:32:57, 2.70s/it] 
83%|████████▎ | 10252/12313 [7:40:45<1:32:59, 2.71s/it] {'loss': 0.4694, 'grad_norm': 7.636038816244131, 'learning_rate': 3.584886317341396e-07, 'epoch': 0.83} 83%|████████▎ | 10252/12313 [7:40:45<1:32:59, 2.71s/it] 83%|████████▎ | 10253/12313 [7:40:47<1:32:39, 2.70s/it] {'loss': 0.5688, 'grad_norm': 21.61523367073273, 'learning_rate': 3.58149390215474e-07, 'epoch': 0.83} 83%|████████▎ | 10253/12313 [7:40:47<1:32:39, 2.70s/it] 83%|████████▎ | 10254/12313 [7:40:50<1:30:00, 2.62s/it] {'loss': 0.5595, 'grad_norm': 5.877406518099175, 'learning_rate': 3.5781029690153567e-07, 'epoch': 0.83} 83%|████████▎ | 10254/12313 [7:40:50<1:30:00, 2.62s/it] 83%|████████▎ | 10255/12313 [7:40:52<1:31:03, 2.65s/it] {'loss': 0.5255, 'grad_norm': 4.889479170900449, 'learning_rate': 3.574713518157874e-07, 'epoch': 0.83} 83%|████████▎ | 10255/12313 [7:40:52<1:31:03, 2.65s/it] 83%|████████▎ | 10256/12313 [7:40:55<1:31:24, 2.67s/it] {'loss': 0.3982, 'grad_norm': 6.633095888789909, 'learning_rate': 3.571325549816818e-07, 'epoch': 0.83} 83%|████████▎ | 10256/12313 [7:40:55<1:31:24, 2.67s/it] 83%|████████▎ | 10257/12313 [7:40:58<1:29:57, 2.63s/it] {'loss': 0.5486, 'grad_norm': 4.06737934876213, 'learning_rate': 3.56793906422663e-07, 'epoch': 0.83} 83%|████████▎ | 10257/12313 [7:40:58<1:29:57, 2.63s/it] 83%|████████▎ | 10258/12313 [7:41:00<1:29:39, 2.62s/it] {'loss': 0.4597, 'grad_norm': 4.0549895036371, 'learning_rate': 3.564554061621625e-07, 'epoch': 0.83} 83%|████████▎ | 10258/12313 [7:41:00<1:29:39, 2.62s/it] 83%|████████▎ | 10259/12313 [7:41:03<1:27:55, 2.57s/it] {'loss': 0.491, 'grad_norm': 8.500590717560147, 'learning_rate': 3.5611705422360335e-07, 'epoch': 0.83} 83%|████████▎ | 10259/12313 [7:41:03<1:27:55, 2.57s/it] 83%|████████▎ | 10260/12313 [7:41:05<1:26:58, 2.54s/it] {'loss': 0.3571, 'grad_norm': 5.841348510557204, 'learning_rate': 3.557788506303986e-07, 'epoch': 0.83} 83%|████████▎ | 10260/12313 [7:41:05<1:26:58, 2.54s/it] 83%|████████▎ | 10261/12313 [7:41:08<1:31:41, 2.68s/it] 
{'loss': 0.5252, 'grad_norm': 8.428998076590059, 'learning_rate': 3.5544079540594884e-07, 'epoch': 0.83} 83%|████████▎ | 10261/12313 [7:41:08<1:31:41, 2.68s/it] 83%|████████▎ | 10262/12313 [7:41:11<1:30:16, 2.64s/it] {'loss': 0.5061, 'grad_norm': 6.0161944380779255, 'learning_rate': 3.551028885736457e-07, 'epoch': 0.83} 83%|████████▎ | 10262/12313 [7:41:11<1:30:16, 2.64s/it] 83%|████████▎ | 10263/12313 [7:41:13<1:27:57, 2.57s/it] {'loss': 0.4221, 'grad_norm': 26.06934398233002, 'learning_rate': 3.5476513015687136e-07, 'epoch': 0.83} 83%|████████▎ | 10263/12313 [7:41:13<1:27:57, 2.57s/it] 83%|████████▎ | 10264/12313 [7:41:16<1:28:22, 2.59s/it] {'loss': 0.5557, 'grad_norm': 6.725770937675891, 'learning_rate': 3.5442752017899625e-07, 'epoch': 0.83} 83%|████████▎ | 10264/12313 [7:41:16<1:28:22, 2.59s/it] 83%|████████▎ | 10265/12313 [7:41:19<1:32:07, 2.70s/it] {'loss': 0.4044, 'grad_norm': 5.63216646803754, 'learning_rate': 3.5409005866338134e-07, 'epoch': 0.83} 83%|████████▎ | 10265/12313 [7:41:19<1:32:07, 2.70s/it] 83%|████████▎ | 10266/12313 [7:41:21<1:30:31, 2.65s/it] {'loss': 0.4748, 'grad_norm': 5.540976364008194, 'learning_rate': 3.537527456333778e-07, 'epoch': 0.83} 83%|████████▎ | 10266/12313 [7:41:21<1:30:31, 2.65s/it] 83%|████████▎ | 10267/12313 [7:41:24<1:28:55, 2.61s/it] {'loss': 0.5088, 'grad_norm': 4.118260128090111, 'learning_rate': 3.5341558111232547e-07, 'epoch': 0.83} 83%|████████▎ | 10267/12313 [7:41:24<1:28:55, 2.61s/it] 83%|████████▎ | 10268/12313 [7:41:27<1:30:18, 2.65s/it] {'loss': 0.408, 'grad_norm': 4.454821659286079, 'learning_rate': 3.5307856512355354e-07, 'epoch': 0.83} 83%|████████▎ | 10268/12313 [7:41:27<1:30:18, 2.65s/it] 83%|████████▎ | 10269/12313 [7:41:29<1:29:50, 2.64s/it] {'loss': 0.4369, 'grad_norm': 7.906091232376149, 'learning_rate': 3.527416976903833e-07, 'epoch': 0.83} 83%|████████▎ | 10269/12313 [7:41:29<1:29:50, 2.64s/it] 83%|████████▎ | 10270/12313 [7:41:32<1:30:48, 2.67s/it] {'loss': 0.5383, 'grad_norm': 4.946330213556263, 
'learning_rate': 3.5240497883612333e-07, 'epoch': 0.83} 83%|████████▎ | 10270/12313 [7:41:32<1:30:48, 2.67s/it] 83%|████████▎ | 10271/12313 [7:41:34<1:30:14, 2.65s/it] {'loss': 0.5159, 'grad_norm': 3.921389705656694, 'learning_rate': 3.5206840858407225e-07, 'epoch': 0.83} 83%|████████▎ | 10271/12313 [7:41:34<1:30:14, 2.65s/it] 83%|████████▎ | 10272/12313 [7:41:37<1:29:49, 2.64s/it] {'loss': 0.3524, 'grad_norm': 9.466775431263171, 'learning_rate': 3.517319869575195e-07, 'epoch': 0.83} 83%|████████▎ | 10272/12313 [7:41:37<1:29:49, 2.64s/it] 83%|████████▎ | 10273/12313 [7:41:40<1:30:28, 2.66s/it] {'loss': 0.5091, 'grad_norm': 7.781642113894127, 'learning_rate': 3.5139571397974416e-07, 'epoch': 0.83} 83%|████████▎ | 10273/12313 [7:41:40<1:30:28, 2.66s/it] 83%|████████▎ | 10274/12313 [7:41:42<1:30:32, 2.66s/it] {'loss': 0.3777, 'grad_norm': 5.031639260527779, 'learning_rate': 3.5105958967401404e-07, 'epoch': 0.83} 83%|████████▎ | 10274/12313 [7:41:42<1:30:32, 2.66s/it] 83%|████████▎ | 10275/12313 [7:41:45<1:31:21, 2.69s/it] {'loss': 0.5142, 'grad_norm': 4.936172284543063, 'learning_rate': 3.5072361406358696e-07, 'epoch': 0.83} 83%|████████▎ | 10275/12313 [7:41:45<1:31:21, 2.69s/it] 83%|████████▎ | 10276/12313 [7:41:48<1:32:50, 2.73s/it] {'loss': 0.5169, 'grad_norm': 5.62972700250256, 'learning_rate': 3.5038778717171123e-07, 'epoch': 0.83} 83%|████████▎ | 10276/12313 [7:41:48<1:32:50, 2.73s/it] 83%|████████▎ | 10277/12313 [7:41:51<1:34:13, 2.78s/it] {'loss': 0.5267, 'grad_norm': 4.876718013101768, 'learning_rate': 3.500521090216233e-07, 'epoch': 0.83} 83%|████████▎ | 10277/12313 [7:41:51<1:34:13, 2.78s/it] 83%|████████▎ | 10278/12313 [7:41:53<1:30:06, 2.66s/it] {'loss': 0.6079, 'grad_norm': 5.004111684696393, 'learning_rate': 3.497165796365512e-07, 'epoch': 0.83} 83%|████████▎ | 10278/12313 [7:41:53<1:30:06, 2.66s/it] 83%|████████▎ | 10279/12313 [7:41:56<1:30:17, 2.66s/it] {'loss': 0.3386, 'grad_norm': 5.227016363738208, 'learning_rate': 3.4938119903971195e-07, 'epoch': 
0.83} 83%|████████▎ | 10279/12313 [7:41:56<1:30:17, 2.66s/it] 83%|████████▎ | 10280/12313 [7:41:59<1:29:16, 2.63s/it] {'loss': 0.4524, 'grad_norm': 4.70841750512664, 'learning_rate': 3.49045967254312e-07, 'epoch': 0.83} 83%|████████▎ | 10280/12313 [7:41:59<1:29:16, 2.63s/it] 83%|████████▎ | 10281/12313 [7:42:01<1:30:05, 2.66s/it] {'loss': 0.4655, 'grad_norm': 4.398665081558391, 'learning_rate': 3.487108843035467e-07, 'epoch': 0.83} 83%|████████▎ | 10281/12313 [7:42:01<1:30:05, 2.66s/it] 84%|████████▎ | 10282/12313 [7:42:04<1:30:34, 2.68s/it] {'loss': 0.4796, 'grad_norm': 4.351497987088346, 'learning_rate': 3.4837595021060296e-07, 'epoch': 0.84} 84%|████████▎ | 10282/12313 [7:42:04<1:30:34, 2.68s/it] 84%|████████▎ | 10283/12313 [7:42:06<1:27:46, 2.59s/it] {'loss': 0.3933, 'grad_norm': 4.803405743264782, 'learning_rate': 3.480411649986565e-07, 'epoch': 0.84} 84%|████████▎ | 10283/12313 [7:42:06<1:27:46, 2.59s/it] 84%|████████▎ | 10284/12313 [7:42:09<1:28:44, 2.62s/it] {'loss': 0.407, 'grad_norm': 4.927789971086148, 'learning_rate': 3.477065286908715e-07, 'epoch': 0.84} 84%|████████▎ | 10284/12313 [7:42:09<1:28:44, 2.62s/it] 84%|████████▎ | 10285/12313 [7:42:12<1:30:06, 2.67s/it] {'loss': 0.5534, 'grad_norm': 9.580044334289088, 'learning_rate': 3.4737204131040397e-07, 'epoch': 0.84} 84%|████████▎ | 10285/12313 [7:42:12<1:30:06, 2.67s/it] 84%|████████▎ | 10286/12313 [7:42:14<1:29:52, 2.66s/it] {'loss': 0.4651, 'grad_norm': 7.407853324575902, 'learning_rate': 3.470377028803992e-07, 'epoch': 0.84} 84%|████████▎ | 10286/12313 [7:42:14<1:29:52, 2.66s/it] 84%|████████▎ | 10287/12313 [7:42:17<1:30:52, 2.69s/it] {'loss': 0.5232, 'grad_norm': 4.484117272427251, 'learning_rate': 3.46703513423991e-07, 'epoch': 0.84} 84%|████████▎ | 10287/12313 [7:42:17<1:30:52, 2.69s/it] 84%|████████▎ | 10288/12313 [7:42:20<1:30:47, 2.69s/it] {'loss': 0.4451, 'grad_norm': 4.330327720269298, 'learning_rate': 3.4636947296430274e-07, 'epoch': 0.84} 84%|████████▎ | 10288/12313 [7:42:20<1:30:47, 
2.69s/it] 84%|████████▎ | 10289/12313 [7:42:22<1:28:59, 2.64s/it] {'loss': 0.5906, 'grad_norm': 5.965597862522399, 'learning_rate': 3.460355815244498e-07, 'epoch': 0.84} 84%|████████▎ | 10289/12313 [7:42:22<1:28:59, 2.64s/it] 84%|████████▎ | 10290/12313 [7:42:25<1:28:57, 2.64s/it] {'loss': 0.3985, 'grad_norm': 8.22500497243567, 'learning_rate': 3.457018391275341e-07, 'epoch': 0.84} 84%|████████▎ | 10290/12313 [7:42:25<1:28:57, 2.64s/it] 84%|████████▎ | 10291/12313 [7:42:28<1:27:29, 2.60s/it] {'loss': 0.4722, 'grad_norm': 5.544151542569365, 'learning_rate': 3.4536824579665007e-07, 'epoch': 0.84} 84%|████████▎ | 10291/12313 [7:42:28<1:27:29, 2.60s/it] 84%|████████▎ | 10292/12313 [7:42:30<1:26:23, 2.56s/it] {'loss': 0.5343, 'grad_norm': 5.097926053416765, 'learning_rate': 3.4503480155488044e-07, 'epoch': 0.84} 84%|████████▎ | 10292/12313 [7:42:30<1:26:23, 2.56s/it] 84%|████████▎ | 10293/12313 [7:42:33<1:26:23, 2.57s/it] {'loss': 0.3995, 'grad_norm': 11.549757808229108, 'learning_rate': 3.447015064252976e-07, 'epoch': 0.84} 84%|████████▎ | 10293/12313 [7:42:33<1:26:23, 2.57s/it] 84%|████████▎ | 10294/12313 [7:42:35<1:28:44, 2.64s/it] {'loss': 0.4749, 'grad_norm': 4.072804987437322, 'learning_rate': 3.443683604309633e-07, 'epoch': 0.84} 84%|████████▎ | 10294/12313 [7:42:35<1:28:44, 2.64s/it] 84%|████████▎ | 10295/12313 [7:42:38<1:31:25, 2.72s/it] {'loss': 0.4529, 'grad_norm': 5.09382553179545, 'learning_rate': 3.4403536359493034e-07, 'epoch': 0.84} 84%|████████▎ | 10295/12313 [7:42:38<1:31:25, 2.72s/it] 84%|████████▎ | 10296/12313 [7:42:41<1:31:21, 2.72s/it] {'loss': 0.58, 'grad_norm': 4.720065781209744, 'learning_rate': 3.437025159402399e-07, 'epoch': 0.84} 84%|████████▎ | 10296/12313 [7:42:41<1:31:21, 2.72s/it] 84%|████████▎ | 10297/12313 [7:42:44<1:31:11, 2.71s/it] {'loss': 0.4097, 'grad_norm': 4.694987719383441, 'learning_rate': 3.43369817489923e-07, 'epoch': 0.84} 84%|████████▎ | 10297/12313 [7:42:44<1:31:11, 2.71s/it] 84%|████████▎ | 10298/12313 [7:42:46<1:30:17, 
2.69s/it] {'loss': 0.4231, 'grad_norm': 5.4566621726960145, 'learning_rate': 3.430372682670008e-07, 'epoch': 0.84} 84%|████████▎ | 10298/12313 [7:42:46<1:30:17, 2.69s/it] 84%|████████▎ | 10299/12313 [7:42:49<1:32:52, 2.77s/it] {'loss': 0.4286, 'grad_norm': 8.027749216651072, 'learning_rate': 3.4270486829448476e-07, 'epoch': 0.84} 84%|████████▎ | 10299/12313 [7:42:49<1:32:52, 2.77s/it] 84%|████████▎ | 10300/12313 [7:42:52<1:31:06, 2.72s/it] {'loss': 0.4205, 'grad_norm': 5.684585620616569, 'learning_rate': 3.423726175953737e-07, 'epoch': 0.84} 84%|████████▎ | 10300/12313 [7:42:52<1:31:06, 2.72s/it] 84%|████████▎ | 10301/12313 [7:42:54<1:28:39, 2.64s/it] {'loss': 0.4315, 'grad_norm': 3.9316108403026178, 'learning_rate': 3.4204051619265905e-07, 'epoch': 0.84} 84%|████████▎ | 10301/12313 [7:42:54<1:28:39, 2.64s/it] 84%|████████▎ | 10302/12313 [7:42:57<1:28:25, 2.64s/it] {'loss': 0.3852, 'grad_norm': 5.5134529395223915, 'learning_rate': 3.4170856410931986e-07, 'epoch': 0.84} 84%|████████▎ | 10302/12313 [7:42:57<1:28:25, 2.64s/it] 84%|████████▎ | 10303/12313 [7:43:00<1:29:26, 2.67s/it] {'loss': 0.544, 'grad_norm': 6.0566636394615285, 'learning_rate': 3.41376761368325e-07, 'epoch': 0.84} 84%|████████▎ | 10303/12313 [7:43:00<1:29:26, 2.67s/it] 84%|████████▎ | 10304/12313 [7:43:02<1:29:12, 2.66s/it] {'loss': 0.4424, 'grad_norm': 5.68024482168099, 'learning_rate': 3.4104510799263356e-07, 'epoch': 0.84} 84%|████████▎ | 10304/12313 [7:43:02<1:29:12, 2.66s/it] 84%|████████▎ | 10305/12313 [7:43:05<1:32:21, 2.76s/it] {'loss': 0.4358, 'grad_norm': 8.931124333462172, 'learning_rate': 3.407136040051953e-07, 'epoch': 0.84} 84%|████████▎ | 10305/12313 [7:43:05<1:32:21, 2.76s/it] 84%|████████▎ | 10306/12313 [7:43:08<1:32:49, 2.78s/it] {'loss': 0.6703, 'grad_norm': 6.202340802104501, 'learning_rate': 3.40382249428948e-07, 'epoch': 0.84} 84%|████████▎ | 10306/12313 [7:43:08<1:32:49, 2.78s/it] 84%|████████▎ | 10307/12313 [7:43:11<1:33:12, 2.79s/it] {'loss': 0.5796, 'grad_norm': 
5.36293568403821, 'learning_rate': 3.400510442868185e-07, 'epoch': 0.84} 84%|████████▎ | 10307/12313 [7:43:11<1:33:12, 2.79s/it] 84%|████████▎ | 10308/12313 [7:43:14<1:32:00, 2.75s/it] {'loss': 0.5902, 'grad_norm': 5.206542300226407, 'learning_rate': 3.3971998860172605e-07, 'epoch': 0.84} 84%|████████▎ | 10308/12313 [7:43:14<1:32:00, 2.75s/it] 84%|████████▎ | 10309/12313 [7:43:17<1:34:05, 2.82s/it] {'loss': 0.4523, 'grad_norm': 3.560373786193406, 'learning_rate': 3.393890823965768e-07, 'epoch': 0.84} 84%|████████▎ | 10309/12313 [7:43:17<1:34:05, 2.82s/it] 84%|████████▎ | 10310/12313 [7:43:19<1:32:40, 2.78s/it] {'loss': 0.5946, 'grad_norm': 5.239048746596322, 'learning_rate': 3.390583256942681e-07, 'epoch': 0.84} 84%|████████▎ | 10310/12313 [7:43:19<1:32:40, 2.78s/it] 84%|████████▎ | 10311/12313 [7:43:22<1:29:27, 2.68s/it] {'loss': 0.4644, 'grad_norm': 4.838192804018576, 'learning_rate': 3.3872771851768737e-07, 'epoch': 0.84} 84%|████████▎ | 10311/12313 [7:43:22<1:29:27, 2.68s/it] 84%|████████▎ | 10312/12313 [7:43:25<1:30:25, 2.71s/it] {'loss': 0.4878, 'grad_norm': 7.90289534921277, 'learning_rate': 3.383972608897099e-07, 'epoch': 0.84} 84%|████████▎ | 10312/12313 [7:43:25<1:30:25, 2.71s/it] 84%|████████▍ | 10313/12313 [7:43:27<1:26:53, 2.61s/it] {'loss': 0.3547, 'grad_norm': 3.8504253330468092, 'learning_rate': 3.3806695283320145e-07, 'epoch': 0.84} 84%|████████▍ | 10313/12313 [7:43:27<1:26:53, 2.61s/it] 84%|████████▍ | 10314/12313 [7:43:30<1:32:28, 2.78s/it] {'loss': 0.4281, 'grad_norm': 11.90866871558082, 'learning_rate': 3.377367943710183e-07, 'epoch': 0.84} 84%|████████▍ | 10314/12313 [7:43:30<1:32:28, 2.78s/it] 84%|████████▍ | 10315/12313 [7:43:33<1:34:11, 2.83s/it] {'loss': 0.4421, 'grad_norm': 4.797876474078643, 'learning_rate': 3.374067855260055e-07, 'epoch': 0.84} 84%|████████▍ | 10315/12313 [7:43:33<1:34:11, 2.83s/it] 84%|████████▍ | 10316/12313 [7:43:36<1:32:48, 2.79s/it] {'loss': 0.3877, 'grad_norm': 4.661745082017848, 'learning_rate': 
3.370769263209975e-07, 'epoch': 0.84} 84%|████████▍ | 10316/12313 [7:43:36<1:32:48, 2.79s/it] 84%|████████▍ | 10317/12313 [7:43:38<1:29:51, 2.70s/it] {'loss': 0.5798, 'grad_norm': 4.545258248925443, 'learning_rate': 3.3674721677881853e-07, 'epoch': 0.84} 84%|████████▍ | 10317/12313 [7:43:38<1:29:51, 2.70s/it] 84%|████████▍ | 10318/12313 [7:43:41<1:30:05, 2.71s/it] {'loss': 0.5197, 'grad_norm': 6.305655433495236, 'learning_rate': 3.364176569222843e-07, 'epoch': 0.84} 84%|████████▍ | 10318/12313 [7:43:41<1:30:05, 2.71s/it] 84%|████████▍ | 10319/12313 [7:43:44<1:28:26, 2.66s/it] {'loss': 0.4127, 'grad_norm': 5.2525205813121945, 'learning_rate': 3.360882467741969e-07, 'epoch': 0.84} 84%|████████▍ | 10319/12313 [7:43:44<1:28:26, 2.66s/it] 84%|████████▍ | 10320/12313 [7:43:46<1:29:41, 2.70s/it] {'loss': 0.6077, 'grad_norm': 4.370136047848057, 'learning_rate': 3.35758986357351e-07, 'epoch': 0.84} 84%|████████▍ | 10320/12313 [7:43:46<1:29:41, 2.70s/it] 84%|████████▍ | 10321/12313 [7:43:49<1:28:39, 2.67s/it] {'loss': 0.3249, 'grad_norm': 10.028481598001118, 'learning_rate': 3.354298756945293e-07, 'epoch': 0.84} 84%|████████▍ | 10321/12313 [7:43:49<1:28:39, 2.67s/it] 84%|████████▍ | 10322/12313 [7:43:52<1:30:37, 2.73s/it] {'loss': 0.5867, 'grad_norm': 4.972555184812446, 'learning_rate': 3.351009148085038e-07, 'epoch': 0.84} 84%|████████▍ | 10322/12313 [7:43:52<1:30:37, 2.73s/it] 84%|████████▍ | 10323/12313 [7:43:54<1:29:17, 2.69s/it] {'loss': 0.4575, 'grad_norm': 3.9492347620594224, 'learning_rate': 3.347721037220372e-07, 'epoch': 0.84} 84%|████████▍ | 10323/12313 [7:43:54<1:29:17, 2.69s/it] 84%|████████▍ | 10324/12313 [7:43:57<1:28:25, 2.67s/it] {'loss': 0.5544, 'grad_norm': 24.137740084826742, 'learning_rate': 3.344434424578824e-07, 'epoch': 0.84} 84%|████████▍ | 10324/12313 [7:43:57<1:28:25, 2.67s/it] 84%|████████▍ | 10325/12313 [7:44:00<1:28:22, 2.67s/it] {'loss': 0.4401, 'grad_norm': 7.000852410859473, 'learning_rate': 3.3411493103878036e-07, 'epoch': 0.84} 
84%|████████▍ | 10325/12313 [7:44:00<1:28:22, 2.67s/it] 84%|████████▍ | 10326/12313 [7:44:03<1:30:52, 2.74s/it] {'loss': 0.4975, 'grad_norm': 6.577988762799303, 'learning_rate': 3.3378656948746176e-07, 'epoch': 0.84} 84%|████████▍ | 10326/12313 [7:44:03<1:30:52, 2.74s/it] 84%|████████▍ | 10327/12313 [7:44:06<1:32:28, 2.79s/it] {'loss': 0.489, 'grad_norm': 5.455565928416991, 'learning_rate': 3.334583578266487e-07, 'epoch': 0.84} 84%|████████▍ | 10327/12313 [7:44:06<1:32:28, 2.79s/it] 84%|████████▍ | 10328/12313 [7:44:08<1:30:19, 2.73s/it] {'loss': 0.394, 'grad_norm': 4.806945898095566, 'learning_rate': 3.3313029607905087e-07, 'epoch': 0.84} 84%|████████▍ | 10328/12313 [7:44:08<1:30:19, 2.73s/it] 84%|████████▍ | 10329/12313 [7:44:11<1:29:28, 2.71s/it] {'loss': 0.5171, 'grad_norm': 5.601127834993915, 'learning_rate': 3.328023842673678e-07, 'epoch': 0.84} 84%|████████▍ | 10329/12313 [7:44:11<1:29:28, 2.71s/it] 84%|████████▍ | 10330/12313 [7:44:13<1:28:04, 2.66s/it] {'loss': 0.3461, 'grad_norm': 8.23233143842366, 'learning_rate': 3.324746224142902e-07, 'epoch': 0.84} 84%|████████▍ | 10330/12313 [7:44:13<1:28:04, 2.66s/it] 84%|████████▍ | 10331/12313 [7:44:16<1:27:26, 2.65s/it] {'loss': 0.4056, 'grad_norm': 5.534926193999312, 'learning_rate': 3.321470105424979e-07, 'epoch': 0.84} 84%|████████▍ | 10331/12313 [7:44:16<1:27:26, 2.65s/it] 84%|████████▍ | 10332/12313 [7:44:18<1:25:55, 2.60s/it] {'loss': 0.5831, 'grad_norm': 4.70029893588722, 'learning_rate': 3.3181954867465864e-07, 'epoch': 0.84} 84%|████████▍ | 10332/12313 [7:44:18<1:25:55, 2.60s/it] 84%|████████▍ | 10333/12313 [7:44:21<1:27:26, 2.65s/it] {'loss': 0.4913, 'grad_norm': 4.310642438062851, 'learning_rate': 3.314922368334322e-07, 'epoch': 0.84} 84%|████████▍ | 10333/12313 [7:44:21<1:27:26, 2.65s/it] 84%|████████▍ | 10334/12313 [7:44:24<1:27:12, 2.64s/it] {'loss': 0.5738, 'grad_norm': 3.3683032390188012, 'learning_rate': 3.3116507504146633e-07, 'epoch': 0.84} 84%|████████▍ | 10334/12313 [7:44:24<1:27:12, 
2.64s/it] 84%|████████▍ | 10335/12313 [7:44:26<1:27:06, 2.64s/it] {'loss': 0.3906, 'grad_norm': 4.726658134420679, 'learning_rate': 3.3083806332139837e-07, 'epoch': 0.84} 84%|████████▍ | 10335/12313 [7:44:26<1:27:06, 2.64s/it] 84%|████████▍ | 10336/12313 [7:44:29<1:25:49, 2.60s/it] {'loss': 0.3441, 'grad_norm': 4.793027997624646, 'learning_rate': 3.305112016958562e-07, 'epoch': 0.84} 84%|████████▍ | 10336/12313 [7:44:29<1:25:49, 2.60s/it] 84%|████████▍ | 10337/12313 [7:44:32<1:24:56, 2.58s/it] {'loss': 0.4705, 'grad_norm': 15.53691157648671, 'learning_rate': 3.3018449018745765e-07, 'epoch': 0.84} 84%|████████▍ | 10337/12313 [7:44:32<1:24:56, 2.58s/it] 84%|████████▍ | 10338/12313 [7:44:34<1:26:15, 2.62s/it] {'loss': 0.5973, 'grad_norm': 5.292837088334354, 'learning_rate': 3.298579288188081e-07, 'epoch': 0.84} 84%|████████▍ | 10338/12313 [7:44:34<1:26:15, 2.62s/it] 84%|████████▍ | 10339/12313 [7:44:37<1:25:07, 2.59s/it] {'loss': 0.4081, 'grad_norm': 7.13356115130351, 'learning_rate': 3.2953151761250526e-07, 'epoch': 0.84} 84%|████████▍ | 10339/12313 [7:44:37<1:25:07, 2.59s/it] 84%|████████▍ | 10340/12313 [7:44:39<1:23:44, 2.55s/it] {'loss': 0.3476, 'grad_norm': 4.657968605497615, 'learning_rate': 3.292052565911344e-07, 'epoch': 0.84} 84%|████████▍ | 10340/12313 [7:44:39<1:23:44, 2.55s/it] 84%|████████▍ | 10341/12313 [7:44:42<1:25:03, 2.59s/it] {'loss': 0.4242, 'grad_norm': 5.343433901408026, 'learning_rate': 3.288791457772708e-07, 'epoch': 0.84} 84%|████████▍ | 10341/12313 [7:44:42<1:25:03, 2.59s/it] 84%|████████▍ | 10342/12313 [7:44:44<1:25:05, 2.59s/it] {'loss': 0.3954, 'grad_norm': 7.943551058264865, 'learning_rate': 3.2855318519347924e-07, 'epoch': 0.84} 84%|████████▍ | 10342/12313 [7:44:44<1:25:05, 2.59s/it] 84%|████████▍ | 10343/12313 [7:44:47<1:23:38, 2.55s/it] {'loss': 0.6146, 'grad_norm': 4.253480461006163, 'learning_rate': 3.282273748623152e-07, 'epoch': 0.84} 84%|████████▍ | 10343/12313 [7:44:47<1:23:38, 2.55s/it] 84%|████████▍ | 10344/12313 
[7:44:49<1:21:36, 2.49s/it] {'loss': 0.4316, 'grad_norm': 3.649658017480646, 'learning_rate': 3.279017148063235e-07, 'epoch': 0.84} 84%|████████▍ | 10344/12313 [7:44:49<1:21:36, 2.49s/it] 84%|████████▍ | 10345/12313 [7:44:52<1:23:08, 2.53s/it] {'loss': 0.6303, 'grad_norm': 6.247382983100074, 'learning_rate': 3.275762050480369e-07, 'epoch': 0.84} 84%|████████▍ | 10345/12313 [7:44:52<1:23:08, 2.53s/it] 84%|████████▍ | 10346/12313 [7:44:54<1:23:24, 2.54s/it] {'loss': 0.3951, 'grad_norm': 5.081757914347488, 'learning_rate': 3.272508456099799e-07, 'epoch': 0.84} 84%|████████▍ | 10346/12313 [7:44:54<1:23:24, 2.54s/it] 84%|████████▍ | 10347/12313 [7:44:57<1:22:10, 2.51s/it] {'loss': 0.4885, 'grad_norm': 4.2849275181266515, 'learning_rate': 3.269256365146653e-07, 'epoch': 0.84} 84%|████████▍ | 10347/12313 [7:44:57<1:22:10, 2.51s/it] 84%|████████▍ | 10348/12313 [7:44:59<1:21:28, 2.49s/it] {'loss': 0.5593, 'grad_norm': 5.408392743532504, 'learning_rate': 3.2660057778459513e-07, 'epoch': 0.84} 84%|████████▍ | 10348/12313 [7:44:59<1:21:28, 2.49s/it] 84%|████████▍ | 10349/12313 [7:45:02<1:25:50, 2.62s/it] {'loss': 0.4464, 'grad_norm': 16.566283426785315, 'learning_rate': 3.262756694422628e-07, 'epoch': 0.84} 84%|████████▍ | 10349/12313 [7:45:02<1:25:50, 2.62s/it] 84%|████████▍ | 10350/12313 [7:45:05<1:28:26, 2.70s/it] {'loss': 0.4379, 'grad_norm': 9.235201670008626, 'learning_rate': 3.2595091151015e-07, 'epoch': 0.84} 84%|████████▍ | 10350/12313 [7:45:05<1:28:26, 2.70s/it] 84%|████████▍ | 10351/12313 [7:45:08<1:25:54, 2.63s/it] {'loss': 0.3997, 'grad_norm': 3.706535071305654, 'learning_rate': 3.2562630401072796e-07, 'epoch': 0.84} 84%|████████▍ | 10351/12313 [7:45:08<1:25:54, 2.63s/it] 84%|████████▍ | 10352/12313 [7:45:10<1:27:30, 2.68s/it] {'loss': 0.3929, 'grad_norm': 6.291131527025278, 'learning_rate': 3.2530184696645846e-07, 'epoch': 0.84} 84%|████████▍ | 10352/12313 [7:45:10<1:27:30, 2.68s/it] 84%|████████▍ | 10353/12313 [7:45:13<1:25:53, 2.63s/it] {'loss': 0.5271, 
'grad_norm': 6.059066502473035, 'learning_rate': 3.249775403997915e-07, 'epoch': 0.84} 84%|████████▍ | 10353/12313 [7:45:13<1:25:53, 2.63s/it] 84%|████████▍ | 10354/12313 [7:45:16<1:25:14, 2.61s/it] {'loss': 0.4112, 'grad_norm': 5.828503261223924, 'learning_rate': 3.24653384333167e-07, 'epoch': 0.84} 84%|████████▍ | 10354/12313 [7:45:16<1:25:14, 2.61s/it] 84%|████████▍ | 10355/12313 [7:45:18<1:25:25, 2.62s/it] {'loss': 0.5535, 'grad_norm': 5.607819748297354, 'learning_rate': 3.243293787890162e-07, 'epoch': 0.84} 84%|████████▍ | 10355/12313 [7:45:18<1:25:25, 2.62s/it] 84%|████████▍ | 10356/12313 [7:45:21<1:24:51, 2.60s/it] {'loss': 0.5297, 'grad_norm': 4.947445738957033, 'learning_rate': 3.2400552378975744e-07, 'epoch': 0.84} 84%|████████▍ | 10356/12313 [7:45:21<1:24:51, 2.60s/it] 84%|████████▍ | 10357/12313 [7:45:23<1:25:48, 2.63s/it] {'loss': 0.5609, 'grad_norm': 12.539916176915135, 'learning_rate': 3.236818193577998e-07, 'epoch': 0.84} 84%|████████▍ | 10357/12313 [7:45:23<1:25:48, 2.63s/it] 84%|████████▍ | 10358/12313 [7:45:26<1:27:29, 2.69s/it] {'loss': 0.4367, 'grad_norm': 7.269085157428185, 'learning_rate': 3.233582655155429e-07, 'epoch': 0.84} 84%|████████▍ | 10358/12313 [7:45:26<1:27:29, 2.69s/it] 84%|████████▍ | 10359/12313 [7:45:29<1:27:44, 2.69s/it] {'loss': 0.5678, 'grad_norm': 7.3313861075084725, 'learning_rate': 3.2303486228537436e-07, 'epoch': 0.84} 84%|████████▍ | 10359/12313 [7:45:29<1:27:44, 2.69s/it] 84%|████████▍ | 10360/12313 [7:45:31<1:26:01, 2.64s/it] {'loss': 0.4094, 'grad_norm': 6.798734073899363, 'learning_rate': 3.227116096896718e-07, 'epoch': 0.84} 84%|████████▍ | 10360/12313 [7:45:31<1:26:01, 2.64s/it] 84%|████████▍ | 10361/12313 [7:45:34<1:24:14, 2.59s/it] {'loss': 0.5598, 'grad_norm': 5.446235513662214, 'learning_rate': 3.223885077508024e-07, 'epoch': 0.84} 84%|████████▍ | 10361/12313 [7:45:34<1:24:14, 2.59s/it] 84%|████████▍ | 10362/12313 [7:45:37<1:24:42, 2.61s/it] {'loss': 0.5195, 'grad_norm': 5.464432211318365, 'learning_rate': 
3.220655564911232e-07, 'epoch': 0.84} 84%|████████▍ | 10362/12313 [7:45:37<1:24:42, 2.61s/it] 84%|████████▍ | 10363/12313 [7:45:39<1:25:04, 2.62s/it] {'loss': 0.4573, 'grad_norm': 3.4734065266272767, 'learning_rate': 3.217427559329814e-07, 'epoch': 0.84} 84%|████████▍ | 10363/12313 [7:45:39<1:25:04, 2.62s/it] 84%|████████▍ | 10364/12313 [7:45:42<1:25:47, 2.64s/it] {'loss': 0.4274, 'grad_norm': 4.612719279186468, 'learning_rate': 3.2142010609871236e-07, 'epoch': 0.84} 84%|████████▍ | 10364/12313 [7:45:42<1:25:47, 2.64s/it] 84%|████████▍ | 10365/12313 [7:45:44<1:25:05, 2.62s/it] {'loss': 0.4346, 'grad_norm': 5.548612820561558, 'learning_rate': 3.2109760701064227e-07, 'epoch': 0.84} 84%|████████▍ | 10365/12313 [7:45:44<1:25:05, 2.62s/it] 84%|████████▍ | 10366/12313 [7:45:47<1:24:56, 2.62s/it] {'loss': 0.4898, 'grad_norm': 7.436228501727278, 'learning_rate': 3.207752586910862e-07, 'epoch': 0.84} 84%|████████▍ | 10366/12313 [7:45:47<1:24:56, 2.62s/it] 84%|████████▍ | 10367/12313 [7:45:50<1:25:51, 2.65s/it] {'loss': 0.5607, 'grad_norm': 7.589666228758146, 'learning_rate': 3.2045306116234824e-07, 'epoch': 0.84} 84%|████████▍ | 10367/12313 [7:45:50<1:25:51, 2.65s/it] 84%|████████▍ | 10368/12313 [7:45:53<1:26:31, 2.67s/it] {'loss': 0.4397, 'grad_norm': 5.181080777503049, 'learning_rate': 3.2013101444672345e-07, 'epoch': 0.84} 84%|████████▍ | 10368/12313 [7:45:53<1:26:31, 2.67s/it] 84%|████████▍ | 10369/12313 [7:45:55<1:24:52, 2.62s/it] {'loss': 0.5543, 'grad_norm': 7.40645836332802, 'learning_rate': 3.198091185664964e-07, 'epoch': 0.84} 84%|████████▍ | 10369/12313 [7:45:55<1:24:52, 2.62s/it] 84%|████████▍ | 10370/12313 [7:45:58<1:24:40, 2.61s/it] {'loss': 0.394, 'grad_norm': 4.3886565655110035, 'learning_rate': 3.194873735439391e-07, 'epoch': 0.84} 84%|████████▍ | 10370/12313 [7:45:58<1:24:40, 2.61s/it] 84%|████████▍ | 10371/12313 [7:46:00<1:23:47, 2.59s/it] {'loss': 0.6436, 'grad_norm': 5.3852633366108655, 'learning_rate': 3.1916577940131585e-07, 'epoch': 0.84} 
84%|████████▍ | 10371/12313 [7:46:00<1:23:47, 2.59s/it] 84%|████████▍ | 10372/12313 [7:46:03<1:26:28, 2.67s/it] {'loss': 0.5485, 'grad_norm': 4.336066931433411, 'learning_rate': 3.188443361608787e-07, 'epoch': 0.84} 84%|████████▍ | 10372/12313 [7:46:03<1:26:28, 2.67s/it] 84%|████████▍ | 10373/12313 [7:46:06<1:25:40, 2.65s/it] {'loss': 0.5496, 'grad_norm': 7.090559367854015, 'learning_rate': 3.185230438448694e-07, 'epoch': 0.84} 84%|████████▍ | 10373/12313 [7:46:06<1:25:40, 2.65s/it] 84%|████████▍ | 10374/12313 [7:46:08<1:23:29, 2.58s/it] {'loss': 0.5178, 'grad_norm': 5.001322891880096, 'learning_rate': 3.182019024755209e-07, 'epoch': 0.84} 84%|████████▍ | 10374/12313 [7:46:08<1:23:29, 2.58s/it] 84%|████████▍ | 10375/12313 [7:46:11<1:22:10, 2.54s/it] {'loss': 0.5462, 'grad_norm': 8.355308016625518, 'learning_rate': 3.1788091207505285e-07, 'epoch': 0.84} 84%|████████▍ | 10375/12313 [7:46:11<1:22:10, 2.54s/it] 84%|████████▍ | 10376/12313 [7:46:13<1:22:16, 2.55s/it] {'loss': 0.3143, 'grad_norm': 5.941932747298044, 'learning_rate': 3.175600726656772e-07, 'epoch': 0.84} 84%|████████▍ | 10376/12313 [7:46:13<1:22:16, 2.55s/it] 84%|████████▍ | 10377/12313 [7:46:16<1:26:02, 2.67s/it] {'loss': 0.5841, 'grad_norm': 13.808992644334168, 'learning_rate': 3.172393842695948e-07, 'epoch': 0.84} 84%|████████▍ | 10377/12313 [7:46:16<1:26:02, 2.67s/it] 84%|████████▍ | 10378/12313 [7:46:18<1:24:02, 2.61s/it] {'loss': 0.6052, 'grad_norm': 4.737960132276757, 'learning_rate': 3.169188469089945e-07, 'epoch': 0.84} 84%|████████▍ | 10378/12313 [7:46:18<1:24:02, 2.61s/it] 84%|████████▍ | 10379/12313 [7:46:21<1:25:09, 2.64s/it] {'loss': 0.4312, 'grad_norm': 6.490398809975363, 'learning_rate': 3.165984606060565e-07, 'epoch': 0.84} 84%|████████▍ | 10379/12313 [7:46:21<1:25:09, 2.64s/it] 84%|████████▍ | 10380/12313 [7:46:24<1:25:24, 2.65s/it] {'loss': 0.381, 'grad_norm': 6.469336726754756, 'learning_rate': 3.1627822538294883e-07, 'epoch': 0.84} 84%|████████▍ | 10380/12313 [7:46:24<1:25:24, 
2.65s/it] 84%|████████▍ | 10381/12313 [7:46:26<1:24:49, 2.63s/it] {'loss': 0.488, 'grad_norm': 4.644733440027643, 'learning_rate': 3.159581412618309e-07, 'epoch': 0.84} 84%|████████▍ | 10381/12313 [7:46:26<1:24:49, 2.63s/it] 84%|████████▍ | 10382/12313 [7:46:29<1:24:13, 2.62s/it] {'loss': 0.4021, 'grad_norm': 5.02222613398405, 'learning_rate': 3.1563820826485127e-07, 'epoch': 0.84} 84%|████████▍ | 10382/12313 [7:46:29<1:24:13, 2.62s/it] 84%|████████▍ | 10383/12313 [7:46:32<1:24:43, 2.63s/it] {'loss': 0.4034, 'grad_norm': 5.6081490476021285, 'learning_rate': 3.153184264141465e-07, 'epoch': 0.84} 84%|████████▍ | 10383/12313 [7:46:32<1:24:43, 2.63s/it] 84%|████████▍ | 10384/12313 [7:46:35<1:30:52, 2.83s/it] {'loss': 0.4546, 'grad_norm': 12.62989950175683, 'learning_rate': 3.1499879573184486e-07, 'epoch': 0.84} 84%|████████▍ | 10384/12313 [7:46:35<1:30:52, 2.83s/it] 84%|████████▍ | 10385/12313 [7:46:38<1:29:58, 2.80s/it] {'loss': 0.3961, 'grad_norm': 4.275321566511599, 'learning_rate': 3.146793162400627e-07, 'epoch': 0.84} 84%|████████▍ | 10385/12313 [7:46:38<1:29:58, 2.80s/it] 84%|████████▍ | 10386/12313 [7:46:41<1:36:48, 3.01s/it] {'loss': 0.4436, 'grad_norm': 5.910594667196357, 'learning_rate': 3.143599879609055e-07, 'epoch': 0.84} 84%|████████▍ | 10386/12313 [7:46:41<1:36:48, 3.01s/it] 84%|████████▍ | 10387/12313 [7:46:44<1:36:31, 3.01s/it] {'loss': 0.5959, 'grad_norm': 4.218477387154232, 'learning_rate': 3.1404081091647027e-07, 'epoch': 0.84} 84%|████████▍ | 10387/12313 [7:46:44<1:36:31, 3.01s/it] 84%|████████▍ | 10388/12313 [7:46:47<1:36:38, 3.01s/it] {'loss': 0.6363, 'grad_norm': 7.34082449004889, 'learning_rate': 3.1372178512884154e-07, 'epoch': 0.84} 84%|████████▍ | 10388/12313 [7:46:47<1:36:38, 3.01s/it] 84%|████████▍ | 10389/12313 [7:46:50<1:33:37, 2.92s/it] {'loss': 0.5443, 'grad_norm': 5.370383548087424, 'learning_rate': 3.1340291062009446e-07, 'epoch': 0.84} 84%|████████▍ | 10389/12313 [7:46:50<1:33:37, 2.92s/it] 84%|████████▍ | 10390/12313 
[7:46:53<1:31:23, 2.85s/it] {'loss': 0.4581, 'grad_norm': 3.265960200811652, 'learning_rate': 3.130841874122942e-07, 'epoch': 0.84} 84%|████████▍ | 10390/12313 [7:46:53<1:31:23, 2.85s/it] 84%|████████▍ | 10391/12313 [7:46:55<1:30:53, 2.84s/it] {'loss': 0.5283, 'grad_norm': 8.155289410608034, 'learning_rate': 3.1276561552749415e-07, 'epoch': 0.84} 84%|████████▍ | 10391/12313 [7:46:55<1:30:53, 2.84s/it] 84%|████████▍ | 10392/12313 [7:46:58<1:29:35, 2.80s/it] {'loss': 0.4737, 'grad_norm': 4.910342387320026, 'learning_rate': 3.1244719498773693e-07, 'epoch': 0.84} 84%|████████▍ | 10392/12313 [7:46:58<1:29:35, 2.80s/it] 84%|████████▍ | 10393/12313 [7:47:01<1:28:46, 2.77s/it] {'loss': 0.5164, 'grad_norm': 7.838611981148996, 'learning_rate': 3.1212892581505697e-07, 'epoch': 0.84} 84%|████████▍ | 10393/12313 [7:47:01<1:28:46, 2.77s/it] 84%|████████▍ | 10394/12313 [7:47:04<1:29:21, 2.79s/it] {'loss': 0.6133, 'grad_norm': 3.137926778602015, 'learning_rate': 3.118108080314758e-07, 'epoch': 0.84} 84%|████████▍ | 10394/12313 [7:47:04<1:29:21, 2.79s/it] 84%|████████▍ | 10395/12313 [7:47:06<1:28:48, 2.78s/it] {'loss': 0.3985, 'grad_norm': 6.8077151459524, 'learning_rate': 3.1149284165900627e-07, 'epoch': 0.84} 84%|████████▍ | 10395/12313 [7:47:06<1:28:48, 2.78s/it] 84%|████████▍ | 10396/12313 [7:47:10<1:34:29, 2.96s/it] {'loss': 0.6132, 'grad_norm': 4.575748337907764, 'learning_rate': 3.111750267196492e-07, 'epoch': 0.84} 84%|████████▍ | 10396/12313 [7:47:10<1:34:29, 2.96s/it] 84%|████████▍ | 10397/12313 [7:47:13<1:31:42, 2.87s/it] {'loss': 0.584, 'grad_norm': 3.4885659741848833, 'learning_rate': 3.1085736323539647e-07, 'epoch': 0.84} 84%|████████▍ | 10397/12313 [7:47:13<1:31:42, 2.87s/it] 84%|████████▍ | 10398/12313 [7:47:15<1:31:16, 2.86s/it] {'loss': 0.6819, 'grad_norm': 5.967423076508702, 'learning_rate': 3.1053985122822844e-07, 'epoch': 0.84} 84%|████████▍ | 10398/12313 [7:47:15<1:31:16, 2.86s/it] 84%|████████▍ | 10399/12313 [7:47:18<1:32:17, 2.89s/it] {'loss': 0.5207, 
'grad_norm': 4.9263444027598124, 'learning_rate': 3.1022249072011455e-07, 'epoch': 0.84} 84%|████████▍ | 10399/12313 [7:47:18<1:32:17, 2.89s/it] 84%|████████▍ | 10400/12313 [7:47:21<1:30:43, 2.85s/it] {'loss': 0.4711, 'grad_norm': 7.840385643435142, 'learning_rate': 3.0990528173301557e-07, 'epoch': 0.84} 84%|████████▍ | 10400/12313 [7:47:21<1:30:43, 2.85s/it] 84%|████████▍ | 10401/12313 [7:47:24<1:29:08, 2.80s/it] {'loss': 0.4817, 'grad_norm': 6.013199743602192, 'learning_rate': 3.095882242888795e-07, 'epoch': 0.84} 84%|████████▍ | 10401/12313 [7:47:24<1:29:08, 2.80s/it] 84%|████████▍ | 10402/12313 [7:47:26<1:26:44, 2.72s/it] {'loss': 0.4581, 'grad_norm': 6.574555556543331, 'learning_rate': 3.09271318409646e-07, 'epoch': 0.84} 84%|████████▍ | 10402/12313 [7:47:26<1:26:44, 2.72s/it] 84%|████████▍ | 10403/12313 [7:47:29<1:26:13, 2.71s/it] {'loss': 0.549, 'grad_norm': 3.106518820205016, 'learning_rate': 3.089545641172434e-07, 'epoch': 0.84} 84%|████████▍ | 10403/12313 [7:47:29<1:26:13, 2.71s/it] 84%|████████▍ | 10404/12313 [7:47:32<1:25:20, 2.68s/it] {'loss': 0.3649, 'grad_norm': 4.4853372930685715, 'learning_rate': 3.086379614335891e-07, 'epoch': 0.84} 84%|████████▍ | 10404/12313 [7:47:32<1:25:20, 2.68s/it] 85%|████████▍ | 10405/12313 [7:47:34<1:23:37, 2.63s/it] {'loss': 0.5595, 'grad_norm': 6.692195853008346, 'learning_rate': 3.083215103805895e-07, 'epoch': 0.85} 85%|████████▍ | 10405/12313 [7:47:34<1:23:37, 2.63s/it] 85%|████████▍ | 10406/12313 [7:47:37<1:23:22, 2.62s/it] {'loss': 0.4905, 'grad_norm': 3.961593687618352, 'learning_rate': 3.080052109801429e-07, 'epoch': 0.85} 85%|████████▍ | 10406/12313 [7:47:37<1:23:22, 2.62s/it] 85%|████████▍ | 10407/12313 [7:47:39<1:23:17, 2.62s/it] {'loss': 0.304, 'grad_norm': 5.657024124857219, 'learning_rate': 3.0768906325413404e-07, 'epoch': 0.85} 85%|████████▍ | 10407/12313 [7:47:39<1:23:17, 2.62s/it] 85%|████████▍ | 10408/12313 [7:47:42<1:23:46, 2.64s/it] {'loss': 0.5874, 'grad_norm': 4.947735022977894, 'learning_rate': 
3.073730672244393e-07, 'epoch': 0.85} 85%|████████▍ | 10408/12313 [7:47:42<1:23:46, 2.64s/it] 85%|████████▍ | 10409/12313 [7:47:45<1:24:58, 2.68s/it] {'loss': 0.5255, 'grad_norm': 4.804603939591969, 'learning_rate': 3.0705722291292457e-07, 'epoch': 0.85} 85%|████████▍ | 10409/12313 [7:47:45<1:24:58, 2.68s/it] 85%|████████▍ | 10410/12313 [7:47:47<1:23:06, 2.62s/it] {'loss': 0.5203, 'grad_norm': 8.27849842792289, 'learning_rate': 3.067415303414442e-07, 'epoch': 0.85} 85%|████████▍ | 10410/12313 [7:47:47<1:23:06, 2.62s/it] 85%|████████▍ | 10411/12313 [7:47:50<1:23:12, 2.63s/it] {'loss': 0.4933, 'grad_norm': 10.32641648689632, 'learning_rate': 3.0642598953184164e-07, 'epoch': 0.85} 85%|████████▍ | 10411/12313 [7:47:50<1:23:12, 2.63s/it] 85%|████████▍ | 10412/12313 [7:47:53<1:25:32, 2.70s/it] {'loss': 0.5453, 'grad_norm': 3.764527737605284, 'learning_rate': 3.0611060050595166e-07, 'epoch': 0.85} 85%|████████▍ | 10412/12313 [7:47:53<1:25:32, 2.70s/it] 85%|████████▍ | 10413/12313 [7:47:55<1:23:12, 2.63s/it] {'loss': 0.5636, 'grad_norm': 5.190043324766475, 'learning_rate': 3.057953632855973e-07, 'epoch': 0.85} 85%|████████▍ | 10413/12313 [7:47:55<1:23:12, 2.63s/it] 85%|████████▍ | 10414/12313 [7:47:58<1:25:43, 2.71s/it] {'loss': 0.5088, 'grad_norm': 4.262101482407405, 'learning_rate': 3.0548027789259057e-07, 'epoch': 0.85} 85%|████████▍ | 10414/12313 [7:47:58<1:25:43, 2.71s/it] 85%|████████▍ | 10415/12313 [7:48:01<1:25:59, 2.72s/it] {'loss': 0.4839, 'grad_norm': 4.496146141780251, 'learning_rate': 3.05165344348734e-07, 'epoch': 0.85} 85%|████████▍ | 10415/12313 [7:48:01<1:25:59, 2.72s/it] 85%|████████▍ | 10416/12313 [7:48:03<1:23:26, 2.64s/it] {'loss': 0.6495, 'grad_norm': 5.9108085665989645, 'learning_rate': 3.0485056267582054e-07, 'epoch': 0.85} 85%|████████▍ | 10416/12313 [7:48:03<1:23:26, 2.64s/it] 85%|████████▍ | 10417/12313 [7:48:06<1:25:28, 2.70s/it] {'loss': 0.4643, 'grad_norm': 5.4963646098218355, 'learning_rate': 3.0453593289563015e-07, 'epoch': 0.85} 
85%|████████▍ | 10417/12313 [7:48:06<1:25:28, 2.70s/it] 85%|████████▍ | 10418/12313 [7:48:09<1:25:53, 2.72s/it] {'loss': 0.4108, 'grad_norm': 11.437063436049456, 'learning_rate': 3.0422145502993355e-07, 'epoch': 0.85} 85%|████████▍ | 10418/12313 [7:48:09<1:25:53, 2.72s/it] 85%|████████▍ | 10419/12313 [7:48:11<1:22:35, 2.62s/it] {'loss': 0.3901, 'grad_norm': 6.788003957398547, 'learning_rate': 3.0390712910049166e-07, 'epoch': 0.85} 85%|████████▍ | 10419/12313 [7:48:11<1:22:35, 2.62s/it] 85%|████████▍ | 10420/12313 [7:48:14<1:23:01, 2.63s/it] {'loss': 0.3811, 'grad_norm': 4.716129840076831, 'learning_rate': 3.035929551290534e-07, 'epoch': 0.85} 85%|████████▍ | 10420/12313 [7:48:14<1:23:01, 2.63s/it] 85%|████████▍ | 10421/12313 [7:48:17<1:23:35, 2.65s/it] {'loss': 0.3569, 'grad_norm': 4.429133558600282, 'learning_rate': 3.0327893313735814e-07, 'epoch': 0.85} 85%|████████▍ | 10421/12313 [7:48:17<1:23:35, 2.65s/it] 85%|████████▍ | 10422/12313 [7:48:19<1:21:32, 2.59s/it] {'loss': 0.4368, 'grad_norm': 5.2938219799131865, 'learning_rate': 3.0296506314713534e-07, 'epoch': 0.85} 85%|████████▍ | 10422/12313 [7:48:19<1:21:32, 2.59s/it] 85%|████████▍ | 10423/12313 [7:48:22<1:21:57, 2.60s/it] {'loss': 0.5749, 'grad_norm': 4.19248350579702, 'learning_rate': 3.0265134518010274e-07, 'epoch': 0.85} 85%|████████▍ | 10423/12313 [7:48:22<1:21:57, 2.60s/it] 85%|████████▍ | 10424/12313 [7:48:24<1:23:02, 2.64s/it] {'loss': 0.4146, 'grad_norm': 8.957100181001996, 'learning_rate': 3.0233777925796683e-07, 'epoch': 0.85} 85%|████████▍ | 10424/12313 [7:48:24<1:23:02, 2.64s/it] 85%|████████▍ | 10425/12313 [7:48:27<1:23:24, 2.65s/it] {'loss': 0.3864, 'grad_norm': 4.999603068660096, 'learning_rate': 3.020243654024266e-07, 'epoch': 0.85} 85%|████████▍ | 10425/12313 [7:48:27<1:23:24, 2.65s/it] 85%|████████▍ | 10426/12313 [7:48:30<1:23:34, 2.66s/it] {'loss': 0.4653, 'grad_norm': 18.067549039426147, 'learning_rate': 3.017111036351672e-07, 'epoch': 0.85} 85%|████████▍ | 10426/12313 [7:48:30<1:23:34, 
2.66s/it] 85%|████████▍ | 10427/12313 [7:48:33<1:27:44, 2.79s/it] {'loss': 0.4479, 'grad_norm': 6.599994188483718, 'learning_rate': 3.01397993977865e-07, 'epoch': 0.85} 85%|████████▍ | 10427/12313 [7:48:33<1:27:44, 2.79s/it] 85%|████████▍ | 10428/12313 [7:48:36<1:26:43, 2.76s/it] {'loss': 0.5203, 'grad_norm': 5.582107888533948, 'learning_rate': 3.010850364521853e-07, 'epoch': 0.85} 85%|████████▍ | 10428/12313 [7:48:36<1:26:43, 2.76s/it] 85%|████████▍ | 10429/12313 [7:48:39<1:29:19, 2.84s/it] {'loss': 0.463, 'grad_norm': 4.2550759235692555, 'learning_rate': 3.007722310797842e-07, 'epoch': 0.85} 85%|████████▍ | 10429/12313 [7:48:39<1:29:19, 2.84s/it] 85%|████████▍ | 10430/12313 [7:48:42<1:30:56, 2.90s/it] {'loss': 0.6158, 'grad_norm': 4.496695788396444, 'learning_rate': 3.004595778823055e-07, 'epoch': 0.85} 85%|████████▍ | 10430/12313 [7:48:42<1:30:56, 2.90s/it] 85%|████████▍ | 10431/12313 [7:48:44<1:29:41, 2.86s/it] {'loss': 0.504, 'grad_norm': 4.611503155703553, 'learning_rate': 3.0014707688138244e-07, 'epoch': 0.85} 85%|████████▍ | 10431/12313 [7:48:44<1:29:41, 2.86s/it] 85%|████████▍ | 10432/12313 [7:48:47<1:26:03, 2.75s/it] {'loss': 0.5022, 'grad_norm': 5.19185166511966, 'learning_rate': 2.9983472809863996e-07, 'epoch': 0.85} 85%|████████▍ | 10432/12313 [7:48:47<1:26:03, 2.75s/it] 85%|████████▍ | 10433/12313 [7:48:50<1:25:08, 2.72s/it] {'loss': 0.5097, 'grad_norm': 4.311886074274771, 'learning_rate': 2.995225315556891e-07, 'epoch': 0.85} 85%|████████▍ | 10433/12313 [7:48:50<1:25:08, 2.72s/it] 85%|████████▍ | 10434/12313 [7:48:53<1:30:00, 2.87s/it] {'loss': 0.4966, 'grad_norm': 5.401400848063594, 'learning_rate': 2.992104872741336e-07, 'epoch': 0.85} 85%|████████▍ | 10434/12313 [7:48:53<1:30:00, 2.87s/it] 85%|████████▍ | 10435/12313 [7:48:55<1:26:44, 2.77s/it] {'loss': 0.5447, 'grad_norm': 4.369156116773576, 'learning_rate': 2.9889859527556517e-07, 'epoch': 0.85} 85%|████████▍ | 10435/12313 [7:48:55<1:26:44, 2.77s/it] 85%|████████▍ | 10436/12313 [7:48:58<1:26:39, 
2.77s/it] {'loss': 0.5355, 'grad_norm': 4.166723822178224, 'learning_rate': 2.985868555815646e-07, 'epoch': 0.85} 85%|████████▍ | 10436/12313 [7:48:58<1:26:39, 2.77s/it] 85%|████████▍ | 10437/12313 [7:49:01<1:26:57, 2.78s/it] {'loss': 0.6095, 'grad_norm': 6.75124539957616, 'learning_rate': 2.9827526821370274e-07, 'epoch': 0.85} 85%|████████▍ | 10437/12313 [7:49:01<1:26:57, 2.78s/it] 85%|████████▍ | 10438/12313 [7:49:04<1:31:50, 2.94s/it] {'loss': 0.4393, 'grad_norm': 31.475529417596338, 'learning_rate': 2.9796383319353997e-07, 'epoch': 0.85} 85%|████████▍ | 10438/12313 [7:49:04<1:31:50, 2.94s/it] 85%|████████▍ | 10439/12313 [7:49:07<1:29:37, 2.87s/it] {'loss': 0.546, 'grad_norm': 5.167153691651564, 'learning_rate': 2.976525505426253e-07, 'epoch': 0.85} 85%|████████▍ | 10439/12313 [7:49:07<1:29:37, 2.87s/it] 85%|████████▍ | 10440/12313 [7:49:11<1:36:08, 3.08s/it] {'loss': 0.5173, 'grad_norm': 5.678318484729858, 'learning_rate': 2.9734142028249867e-07, 'epoch': 0.85} 85%|████████▍ | 10440/12313 [7:49:11<1:36:08, 3.08s/it] 85%|████████▍ | 10441/12313 [7:49:13<1:33:06, 2.98s/it] {'loss': 0.4497, 'grad_norm': 3.293870862181777, 'learning_rate': 2.970304424346887e-07, 'epoch': 0.85} 85%|████████▍ | 10441/12313 [7:49:13<1:33:06, 2.98s/it] 85%|████████▍ | 10442/12313 [7:49:16<1:29:16, 2.86s/it] {'loss': 0.4407, 'grad_norm': 7.221626978308798, 'learning_rate': 2.9671961702071314e-07, 'epoch': 0.85} 85%|████████▍ | 10442/12313 [7:49:16<1:29:16, 2.86s/it] 85%|████████▍ | 10443/12313 [7:49:19<1:27:57, 2.82s/it] {'loss': 0.5855, 'grad_norm': 17.363810182049495, 'learning_rate': 2.9640894406207875e-07, 'epoch': 0.85} 85%|████████▍ | 10443/12313 [7:49:19<1:27:57, 2.82s/it] 85%|████████▍ | 10444/12313 [7:49:21<1:27:08, 2.80s/it] {'loss': 0.5255, 'grad_norm': 74.21346545661238, 'learning_rate': 2.960984235802836e-07, 'epoch': 0.85} 85%|████████▍ | 10444/12313 [7:49:21<1:27:08, 2.80s/it] 85%|████████▍ | 10445/12313 [7:49:24<1:29:37, 2.88s/it] {'loss': 0.6829, 'grad_norm': 
5.175532691743031, 'learning_rate': 2.957880555968137e-07, 'epoch': 0.85} 85%|████████▍ | 10445/12313 [7:49:24<1:29:37, 2.88s/it] 85%|████████▍ | 10446/12313 [7:49:27<1:27:32, 2.81s/it] {'loss': 0.4693, 'grad_norm': 6.761383142028467, 'learning_rate': 2.95477840133144e-07, 'epoch': 0.85} 85%|████████▍ | 10446/12313 [7:49:27<1:27:32, 2.81s/it] 85%|████████▍ | 10447/12313 [7:49:30<1:27:24, 2.81s/it] {'loss': 0.4897, 'grad_norm': 3.230874015786307, 'learning_rate': 2.951677772107406e-07, 'epoch': 0.85} 85%|████████▍ | 10447/12313 [7:49:30<1:27:24, 2.81s/it] 85%|████████▍ | 10448/12313 [7:49:33<1:26:22, 2.78s/it] {'loss': 0.3499, 'grad_norm': 4.35547207648812, 'learning_rate': 2.9485786685105876e-07, 'epoch': 0.85} 85%|████████▍ | 10448/12313 [7:49:33<1:26:22, 2.78s/it] 85%|████████▍ | 10449/12313 [7:49:35<1:24:39, 2.73s/it] {'loss': 0.5062, 'grad_norm': 4.106258252858656, 'learning_rate': 2.945481090755417e-07, 'epoch': 0.85} 85%|████████▍ | 10449/12313 [7:49:35<1:24:39, 2.73s/it] 85%|████████▍ | 10450/12313 [7:49:38<1:22:57, 2.67s/it] {'loss': 0.5064, 'grad_norm': 7.391399136928735, 'learning_rate': 2.942385039056231e-07, 'epoch': 0.85} 85%|████████▍ | 10450/12313 [7:49:38<1:22:57, 2.67s/it] 85%|████████▍ | 10451/12313 [7:49:40<1:22:21, 2.65s/it] {'loss': 0.5534, 'grad_norm': 4.053993280844301, 'learning_rate': 2.939290513627266e-07, 'epoch': 0.85} 85%|████████▍ | 10451/12313 [7:49:40<1:22:21, 2.65s/it] 85%|████████▍ | 10452/12313 [7:49:43<1:23:27, 2.69s/it] {'loss': 0.5145, 'grad_norm': 7.209839450243032, 'learning_rate': 2.936197514682637e-07, 'epoch': 0.85} 85%|████████▍ | 10452/12313 [7:49:43<1:23:27, 2.69s/it] 85%|████████▍ | 10453/12313 [7:49:46<1:21:48, 2.64s/it] {'loss': 0.5653, 'grad_norm': 4.068133193825577, 'learning_rate': 2.933106042436368e-07, 'epoch': 0.85} 85%|████████▍ | 10453/12313 [7:49:46<1:21:48, 2.64s/it] 85%|████████▍ | 10454/12313 [7:49:48<1:23:17, 2.69s/it] {'loss': 0.4394, 'grad_norm': 7.446752715397537, 'learning_rate': 
2.930016097102378e-07, 'epoch': 0.85} 85%|████████▍ | 10454/12313 [7:49:48<1:23:17, 2.69s/it] 85%|████████▍ | 10455/12313 [7:49:51<1:21:10, 2.62s/it] {'loss': 0.4768, 'grad_norm': 8.151515842737028, 'learning_rate': 2.9269276788944726e-07, 'epoch': 0.85} 85%|████████▍ | 10455/12313 [7:49:51<1:21:10, 2.62s/it] 85%|████████▍ | 10456/12313 [7:49:53<1:20:46, 2.61s/it] {'loss': 0.5837, 'grad_norm': 4.210590324590446, 'learning_rate': 2.923840788026347e-07, 'epoch': 0.85} 85%|████████▍ | 10456/12313 [7:49:53<1:20:46, 2.61s/it] 85%|████████▍ | 10457/12313 [7:49:56<1:20:43, 2.61s/it] {'loss': 0.4728, 'grad_norm': 4.153598656728726, 'learning_rate': 2.9207554247116047e-07, 'epoch': 0.85} 85%|████████▍ | 10457/12313 [7:49:56<1:20:43, 2.61s/it] 85%|████████▍ | 10458/12313 [7:49:59<1:21:37, 2.64s/it] {'loss': 0.5434, 'grad_norm': 4.033479881839166, 'learning_rate': 2.917671589163737e-07, 'epoch': 0.85} 85%|████████▍ | 10458/12313 [7:49:59<1:21:37, 2.64s/it] 85%|████████▍ | 10459/12313 [7:50:02<1:22:31, 2.67s/it] {'loss': 0.3945, 'grad_norm': 4.9531557828779045, 'learning_rate': 2.9145892815961194e-07, 'epoch': 0.85} 85%|████████▍ | 10459/12313 [7:50:02<1:22:31, 2.67s/it] 85%|████████▍ | 10460/12313 [7:50:04<1:22:18, 2.67s/it] {'loss': 0.6322, 'grad_norm': 5.232476687979792, 'learning_rate': 2.911508502222041e-07, 'epoch': 0.85} 85%|████████▍ | 10460/12313 [7:50:04<1:22:18, 2.67s/it] 85%|████████▍ | 10461/12313 [7:50:07<1:22:28, 2.67s/it] {'loss': 0.4652, 'grad_norm': 2.92678651363929, 'learning_rate': 2.908429251254674e-07, 'epoch': 0.85} 85%|████████▍ | 10461/12313 [7:50:07<1:22:28, 2.67s/it] 85%|████████▍ | 10462/12313 [7:50:10<1:22:57, 2.69s/it] {'loss': 0.4732, 'grad_norm': 6.325678937933874, 'learning_rate': 2.90535152890708e-07, 'epoch': 0.85} 85%|████████▍ | 10462/12313 [7:50:10<1:22:57, 2.69s/it] 85%|████████▍ | 10463/12313 [7:50:12<1:23:05, 2.69s/it] {'loss': 0.6602, 'grad_norm': 5.685417644326699, 'learning_rate': 2.902275335392232e-07, 'epoch': 0.85} 85%|████████▍ | 
10463/12313 [7:50:12<1:23:05, 2.69s/it] 85%|████████▍ | 10464/12313 [7:50:15<1:23:24, 2.71s/it] {'loss': 0.3988, 'grad_norm': 4.163035284682951, 'learning_rate': 2.8992006709229803e-07, 'epoch': 0.85} 85%|████████▍ | 10464/12313 [7:50:15<1:23:24, 2.71s/it] 85%|████████▍ | 10465/12313 [7:50:18<1:22:23, 2.67s/it] {'loss': 0.4362, 'grad_norm': 4.57874278542473, 'learning_rate': 2.8961275357120704e-07, 'epoch': 0.85} 85%|████████▍ | 10465/12313 [7:50:18<1:22:23, 2.67s/it] 85%|████████▍ | 10466/12313 [7:50:20<1:22:30, 2.68s/it] {'loss': 0.4934, 'grad_norm': 5.572554269055086, 'learning_rate': 2.893055929972152e-07, 'epoch': 0.85} 85%|████████▍ | 10466/12313 [7:50:20<1:22:30, 2.68s/it] 85%|████████▌ | 10467/12313 [7:50:23<1:22:12, 2.67s/it] {'loss': 0.5137, 'grad_norm': 10.65165020002116, 'learning_rate': 2.8899858539157694e-07, 'epoch': 0.85} 85%|████████▌ | 10467/12313 [7:50:23<1:22:12, 2.67s/it] 85%|████████▌ | 10468/12313 [7:50:26<1:22:02, 2.67s/it] {'loss': 0.4341, 'grad_norm': 9.776998737199031, 'learning_rate': 2.886917307755349e-07, 'epoch': 0.85} 85%|████████▌ | 10468/12313 [7:50:26<1:22:02, 2.67s/it] 85%|████████▌ | 10469/12313 [7:50:28<1:21:18, 2.65s/it] {'loss': 0.4922, 'grad_norm': 6.586810382710384, 'learning_rate': 2.8838502917032136e-07, 'epoch': 0.85} 85%|████████▌ | 10469/12313 [7:50:28<1:21:18, 2.65s/it] 85%|████████▌ | 10470/12313 [7:50:31<1:23:03, 2.70s/it] {'loss': 0.3545, 'grad_norm': 5.154039748490755, 'learning_rate': 2.880784805971595e-07, 'epoch': 0.85} 85%|████████▌ | 10470/12313 [7:50:31<1:23:03, 2.70s/it] 85%|████████▌ | 10471/12313 [7:50:34<1:23:40, 2.73s/it] {'loss': 0.4876, 'grad_norm': 3.9568608557183516, 'learning_rate': 2.8777208507726056e-07, 'epoch': 0.85} 85%|████████▌ | 10471/12313 [7:50:34<1:23:40, 2.73s/it] 85%|████████▌ | 10472/12313 [7:50:36<1:20:50, 2.63s/it] {'loss': 0.4853, 'grad_norm': 4.288857280998075, 'learning_rate': 2.874658426318244e-07, 'epoch': 0.85} 85%|████████▌ | 10472/12313 [7:50:36<1:20:50, 2.63s/it] 
85%|████████▌ | 10473/12313 [7:50:39<1:19:03, 2.58s/it] {'loss': 0.4401, 'grad_norm': 4.368402834061044, 'learning_rate': 2.871597532820425e-07, 'epoch': 0.85} 85%|████████▌ | 10473/12313 [7:50:39<1:19:03, 2.58s/it] 85%|████████▌ | 10474/12313 [7:50:41<1:19:09, 2.58s/it] {'loss': 0.5619, 'grad_norm': 4.033924407915728, 'learning_rate': 2.86853817049095e-07, 'epoch': 0.85} 85%|████████▌ | 10474/12313 [7:50:41<1:19:09, 2.58s/it] 85%|████████▌ | 10475/12313 [7:50:44<1:17:50, 2.54s/it] {'loss': 0.4642, 'grad_norm': 5.6815820407123585, 'learning_rate': 2.865480339541496e-07, 'epoch': 0.85} 85%|████████▌ | 10475/12313 [7:50:44<1:17:50, 2.54s/it] 85%|████████▌ | 10476/12313 [7:50:46<1:18:27, 2.56s/it] {'loss': 0.4468, 'grad_norm': 3.948873040109591, 'learning_rate': 2.8624240401836647e-07, 'epoch': 0.85} 85%|████████▌ | 10476/12313 [7:50:46<1:18:27, 2.56s/it] 85%|████████▌ | 10477/12313 [7:50:49<1:19:25, 2.60s/it] {'loss': 0.532, 'grad_norm': 3.2353533973872084, 'learning_rate': 2.859369272628928e-07, 'epoch': 0.85} 85%|████████▌ | 10477/12313 [7:50:49<1:19:25, 2.60s/it] 85%|████████▌ | 10478/12313 [7:50:52<1:20:42, 2.64s/it] {'loss': 0.4157, 'grad_norm': 3.308823715003122, 'learning_rate': 2.856316037088655e-07, 'epoch': 0.85} 85%|████████▌ | 10478/12313 [7:50:52<1:20:42, 2.64s/it] 85%|████████▌ | 10479/12313 [7:50:54<1:21:02, 2.65s/it] {'loss': 0.4894, 'grad_norm': 13.811883074908797, 'learning_rate': 2.8532643337741195e-07, 'epoch': 0.85} 85%|████████▌ | 10479/12313 [7:50:54<1:21:02, 2.65s/it] 85%|████████▌ | 10480/12313 [7:50:57<1:19:22, 2.60s/it] {'loss': 0.4306, 'grad_norm': 3.2224568567146945, 'learning_rate': 2.8502141628964836e-07, 'epoch': 0.85} 85%|████████▌ | 10480/12313 [7:50:57<1:19:22, 2.60s/it] 85%|████████▌ | 10481/12313 [7:51:00<1:19:54, 2.62s/it] {'loss': 0.5575, 'grad_norm': 3.9763076127585313, 'learning_rate': 2.8471655246668007e-07, 'epoch': 0.85} 85%|████████▌ | 10481/12313 [7:51:00<1:19:54, 2.62s/it] 85%|████████▌ | 10482/12313 [7:51:02<1:20:12, 
2.63s/it] {'loss': 0.4055, 'grad_norm': 6.740981385096001, 'learning_rate': 2.844118419296024e-07, 'epoch': 0.85} 85%|████████▌ | 10482/12313 [7:51:02<1:20:12, 2.63s/it] 85%|████████▌ | 10483/12313 [7:51:05<1:19:27, 2.60s/it] {'loss': 0.3875, 'grad_norm': 20.280874148042784, 'learning_rate': 2.841072846994994e-07, 'epoch': 0.85} 85%|████████▌ | 10483/12313 [7:51:05<1:19:27, 2.60s/it] 85%|████████▌ | 10484/12313 [7:51:07<1:19:51, 2.62s/it] {'loss': 0.3133, 'grad_norm': 9.735322905143065, 'learning_rate': 2.8380288079744494e-07, 'epoch': 0.85} 85%|████████▌ | 10484/12313 [7:51:07<1:19:51, 2.62s/it] 85%|████████▌ | 10485/12313 [7:51:10<1:21:19, 2.67s/it] {'loss': 0.4564, 'grad_norm': 4.313636812248141, 'learning_rate': 2.8349863024450143e-07, 'epoch': 0.85} 85%|████████▌ | 10485/12313 [7:51:10<1:21:19, 2.67s/it] 85%|████████▌ | 10486/12313 [7:51:13<1:22:07, 2.70s/it] {'loss': 0.396, 'grad_norm': 6.764374322762755, 'learning_rate': 2.8319453306172225e-07, 'epoch': 0.85} 85%|████████▌ | 10486/12313 [7:51:13<1:22:07, 2.70s/it] 85%|████████▌ | 10487/12313 [7:51:16<1:21:31, 2.68s/it] {'loss': 0.5508, 'grad_norm': 5.159578175212937, 'learning_rate': 2.8289058927014944e-07, 'epoch': 0.85} 85%|████████▌ | 10487/12313 [7:51:16<1:21:31, 2.68s/it] 85%|████████▌ | 10488/12313 [7:51:18<1:20:20, 2.64s/it] {'loss': 0.4523, 'grad_norm': 9.297209132292254, 'learning_rate': 2.8258679889081346e-07, 'epoch': 0.85} 85%|████████▌ | 10488/12313 [7:51:18<1:20:20, 2.64s/it] 85%|████████▌ | 10489/12313 [7:51:21<1:21:01, 2.67s/it] {'loss': 0.4515, 'grad_norm': 3.958399617586093, 'learning_rate': 2.8228316194473607e-07, 'epoch': 0.85} 85%|████████▌ | 10489/12313 [7:51:21<1:21:01, 2.67s/it] 85%|████████▌ | 10490/12313 [7:51:24<1:21:19, 2.68s/it] {'loss': 0.5989, 'grad_norm': 6.059112536774565, 'learning_rate': 2.8197967845292687e-07, 'epoch': 0.85} 85%|████████▌ | 10490/12313 [7:51:24<1:21:19, 2.68s/it] 85%|████████▌ | 10491/12313 [7:51:26<1:21:22, 2.68s/it] {'loss': 0.3805, 'grad_norm': 
7.811507918796633, 'learning_rate': 2.8167634843638434e-07, 'epoch': 0.85} 85%|████████▌ | 10491/12313 [7:51:26<1:21:22, 2.68s/it] 85%|████████▌ | 10492/12313 [7:51:29<1:20:59, 2.67s/it] {'loss': 0.3511, 'grad_norm': 6.346071562908303, 'learning_rate': 2.8137317191609864e-07, 'epoch': 0.85} 85%|████████▌ | 10492/12313 [7:51:29<1:20:59, 2.67s/it] 85%|████████▌ | 10493/12313 [7:51:31<1:19:51, 2.63s/it] {'loss': 0.3602, 'grad_norm': 8.889395270642837, 'learning_rate': 2.810701489130477e-07, 'epoch': 0.85} 85%|████████▌ | 10493/12313 [7:51:31<1:19:51, 2.63s/it] 85%|████████▌ | 10494/12313 [7:51:34<1:19:02, 2.61s/it] {'loss': 0.3249, 'grad_norm': 5.77337789424289, 'learning_rate': 2.807672794481986e-07, 'epoch': 0.85} 85%|████████▌ | 10494/12313 [7:51:34<1:19:02, 2.61s/it] 85%|████████▌ | 10495/12313 [7:51:36<1:17:11, 2.55s/it] {'loss': 0.4549, 'grad_norm': 5.856871874864799, 'learning_rate': 2.804645635425091e-07, 'epoch': 0.85} 85%|████████▌ | 10495/12313 [7:51:36<1:17:11, 2.55s/it] 85%|████████▌ | 10496/12313 [7:51:39<1:18:51, 2.60s/it] {'loss': 0.4752, 'grad_norm': 3.9708905797147924, 'learning_rate': 2.801620012169251e-07, 'epoch': 0.85} 85%|████████▌ | 10496/12313 [7:51:39<1:18:51, 2.60s/it] 85%|████████▌ | 10497/12313 [7:51:42<1:18:27, 2.59s/it] {'loss': 0.5097, 'grad_norm': 5.58347667715412, 'learning_rate': 2.7985959249238165e-07, 'epoch': 0.85} 85%|████████▌ | 10497/12313 [7:51:42<1:18:27, 2.59s/it] 85%|████████▌ | 10498/12313 [7:51:45<1:24:50, 2.80s/it] {'loss': 0.3949, 'grad_norm': 5.356550864085213, 'learning_rate': 2.7955733738980443e-07, 'epoch': 0.85} 85%|████████▌ | 10498/12313 [7:51:45<1:24:50, 2.80s/it] 85%|████████▌ | 10499/12313 [7:51:48<1:23:26, 2.76s/it] {'loss': 0.4579, 'grad_norm': 5.227735405567097, 'learning_rate': 2.792552359301087e-07, 'epoch': 0.85} 85%|████████▌ | 10499/12313 [7:51:48<1:23:26, 2.76s/it] 85%|████████▌ | 10500/12313 [7:51:51<1:25:50, 2.84s/it] {'loss': 0.5133, 'grad_norm': 10.79645556327445, 'learning_rate': 
2.789532881341969e-07, 'epoch': 0.85} 85%|████████▌ | 10500/12313 [7:51:51<1:25:50, 2.84s/it] 85%|████████▌ | 10501/12313 [7:51:53<1:22:56, 2.75s/it] {'loss': 0.4597, 'grad_norm': 9.986241964497967, 'learning_rate': 2.786514940229634e-07, 'epoch': 0.85} 85%|████████▌ | 10501/12313 [7:51:53<1:22:56, 2.75s/it] 85%|████████▌ | 10502/12313 [7:51:56<1:21:19, 2.69s/it] {'loss': 0.312, 'grad_norm': 5.528139225360127, 'learning_rate': 2.7834985361728987e-07, 'epoch': 0.85} 85%|████████▌ | 10502/12313 [7:51:56<1:21:19, 2.69s/it] 85%|████████▌ | 10503/12313 [7:51:59<1:22:50, 2.75s/it] {'loss': 0.6169, 'grad_norm': 5.4346809701316685, 'learning_rate': 2.7804836693804905e-07, 'epoch': 0.85} 85%|████████▌ | 10503/12313 [7:51:59<1:22:50, 2.75s/it] 85%|████████▌ | 10504/12313 [7:52:01<1:19:30, 2.64s/it] {'loss': 0.5074, 'grad_norm': 4.421996352718119, 'learning_rate': 2.7774703400610086e-07, 'epoch': 0.85} 85%|████████▌ | 10504/12313 [7:52:01<1:19:30, 2.64s/it] 85%|████████▌ | 10505/12313 [7:52:04<1:20:24, 2.67s/it] {'loss': 0.2709, 'grad_norm': 12.764502226614324, 'learning_rate': 2.7744585484229674e-07, 'epoch': 0.85} 85%|████████▌ | 10505/12313 [7:52:04<1:20:24, 2.67s/it] 85%|████████▌ | 10506/12313 [7:52:07<1:21:21, 2.70s/it] {'loss': 0.4455, 'grad_norm': 5.85671022027329, 'learning_rate': 2.771448294674775e-07, 'epoch': 0.85} 85%|████████▌ | 10506/12313 [7:52:07<1:21:21, 2.70s/it] 85%|████████▌ | 10507/12313 [7:52:09<1:20:40, 2.68s/it] {'loss': 0.5653, 'grad_norm': 8.599669196683427, 'learning_rate': 2.768439579024712e-07, 'epoch': 0.85} 85%|████████▌ | 10507/12313 [7:52:09<1:20:40, 2.68s/it] 85%|████████▌ | 10508/12313 [7:52:13<1:26:48, 2.89s/it] {'loss': 0.4152, 'grad_norm': 6.732667207212168, 'learning_rate': 2.7654324016809757e-07, 'epoch': 0.85} 85%|████████▌ | 10508/12313 [7:52:13<1:26:48, 2.89s/it] 85%|████████▌ | 10509/12313 [7:52:15<1:23:51, 2.79s/it] {'loss': 0.4589, 'grad_norm': 4.779828616945185, 'learning_rate': 2.7624267628516445e-07, 'epoch': 0.85} 
85%|████████▌ | 10509/12313 [7:52:15<1:23:51, 2.79s/it] 85%|████████▌ | 10510/12313 [7:52:18<1:22:48, 2.76s/it] {'loss': 0.3779, 'grad_norm': 11.113748713116502, 'learning_rate': 2.759422662744682e-07, 'epoch': 0.85} 85%|████████▌ | 10510/12313 [7:52:18<1:22:48, 2.76s/it] 85%|████████▌ | 10511/12313 [7:52:21<1:22:46, 2.76s/it] {'loss': 0.4627, 'grad_norm': 4.481185872009129, 'learning_rate': 2.7564201015679664e-07, 'epoch': 0.85} 85%|████████▌ | 10511/12313 [7:52:21<1:22:46, 2.76s/it] 85%|████████▌ | 10512/12313 [7:52:23<1:23:49, 2.79s/it] {'loss': 0.5168, 'grad_norm': 4.723745824336893, 'learning_rate': 2.7534190795292626e-07, 'epoch': 0.85} 85%|████████▌ | 10512/12313 [7:52:23<1:23:49, 2.79s/it] 85%|████████▌ | 10513/12313 [7:52:26<1:22:02, 2.73s/it] {'loss': 0.5351, 'grad_norm': 4.5772924241630895, 'learning_rate': 2.750419596836215e-07, 'epoch': 0.85} 85%|████████▌ | 10513/12313 [7:52:26<1:22:02, 2.73s/it] 85%|████████▌ | 10514/12313 [7:52:29<1:20:17, 2.68s/it] {'loss': 0.358, 'grad_norm': 4.890888601625891, 'learning_rate': 2.7474216536963803e-07, 'epoch': 0.85} 85%|████████▌ | 10514/12313 [7:52:29<1:20:17, 2.68s/it] 85%|████████▌ | 10515/12313 [7:52:31<1:19:59, 2.67s/it] {'loss': 0.3687, 'grad_norm': 9.34358416833541, 'learning_rate': 2.744425250317201e-07, 'epoch': 0.85} 85%|████████▌ | 10515/12313 [7:52:31<1:19:59, 2.67s/it] 85%|████████▌ | 10516/12313 [7:52:34<1:22:06, 2.74s/it] {'loss': 0.5659, 'grad_norm': 4.739733132295761, 'learning_rate': 2.7414303869059994e-07, 'epoch': 0.85} 85%|████████▌ | 10516/12313 [7:52:34<1:22:06, 2.74s/it] 85%|████████▌ | 10517/12313 [7:52:37<1:22:53, 2.77s/it] {'loss': 0.4997, 'grad_norm': 4.5314061539867145, 'learning_rate': 2.7384370636700187e-07, 'epoch': 0.85} 85%|████████▌ | 10517/12313 [7:52:37<1:22:53, 2.77s/it] 85%|████████▌ | 10518/12313 [7:52:40<1:22:06, 2.74s/it] {'loss': 0.3698, 'grad_norm': 10.166709743253572, 'learning_rate': 2.735445280816373e-07, 'epoch': 0.85} 85%|████████▌ | 10518/12313 [7:52:40<1:22:06, 
2.74s/it] 85%|████████▌ | 10519/12313 [7:52:42<1:19:11, 2.65s/it] {'loss': 0.4164, 'grad_norm': 6.233844850352306, 'learning_rate': 2.7324550385520844e-07, 'epoch': 0.85} 85%|████████▌ | 10519/12313 [7:52:42<1:19:11, 2.65s/it] 85%|████████▌ | 10520/12313 [7:52:45<1:19:51, 2.67s/it] {'loss': 0.5892, 'grad_norm': 4.1121635178190665, 'learning_rate': 2.72946633708405e-07, 'epoch': 0.85} 85%|████████▌ | 10520/12313 [7:52:45<1:19:51, 2.67s/it] 85%|████████▌ | 10521/12313 [7:52:48<1:20:34, 2.70s/it] {'loss': 0.5313, 'grad_norm': 10.454864690447497, 'learning_rate': 2.726479176619087e-07, 'epoch': 0.85} 85%|████████▌ | 10521/12313 [7:52:48<1:20:34, 2.70s/it] 85%|████████▌ | 10522/12313 [7:52:51<1:26:25, 2.90s/it] {'loss': 0.6448, 'grad_norm': 3.298486331376736, 'learning_rate': 2.723493557363885e-07, 'epoch': 0.85} 85%|████████▌ | 10522/12313 [7:52:51<1:26:25, 2.90s/it] 85%|████████▌ | 10523/12313 [7:52:53<1:21:15, 2.72s/it] {'loss': 0.5786, 'grad_norm': 4.677659267745939, 'learning_rate': 2.720509479525027e-07, 'epoch': 0.85} 85%|████████▌ | 10523/12313 [7:52:53<1:21:15, 2.72s/it] 85%|████████▌ | 10524/12313 [7:52:56<1:20:04, 2.69s/it] {'loss': 0.4443, 'grad_norm': 4.621472290562395, 'learning_rate': 2.7175269433089984e-07, 'epoch': 0.85} 85%|████████▌ | 10524/12313 [7:52:56<1:20:04, 2.69s/it] 85%|████████▌ | 10525/12313 [7:52:58<1:18:56, 2.65s/it] {'loss': 0.4766, 'grad_norm': 4.075558864406144, 'learning_rate': 2.7145459489221845e-07, 'epoch': 0.85} 85%|████████▌ | 10525/12313 [7:52:58<1:18:56, 2.65s/it] 85%|████████▌ | 10526/12313 [7:53:01<1:16:30, 2.57s/it] {'loss': 0.5365, 'grad_norm': 3.9376155282925427, 'learning_rate': 2.7115664965708387e-07, 'epoch': 0.85} 85%|████████▌ | 10526/12313 [7:53:01<1:16:30, 2.57s/it] 85%|████████▌ | 10527/12313 [7:53:04<1:18:39, 2.64s/it] {'loss': 0.5849, 'grad_norm': 4.9988126907241135, 'learning_rate': 2.708588586461139e-07, 'epoch': 0.85} 85%|████████▌ | 10527/12313 [7:53:04<1:18:39, 2.64s/it] 86%|████████▌ | 10528/12313 
[7:53:06<1:18:42, 2.65s/it] {'loss': 0.5862, 'grad_norm': 4.260322647357954, 'learning_rate': 2.7056122187991306e-07, 'epoch': 0.86} 86%|████████▌ | 10528/12313 [7:53:06<1:18:42, 2.65s/it] 86%|████████▌ | 10529/12313 [7:53:09<1:16:42, 2.58s/it] {'loss': 0.6337, 'grad_norm': 4.653669694516675, 'learning_rate': 2.7026373937907636e-07, 'epoch': 0.86} 86%|████████▌ | 10529/12313 [7:53:09<1:16:42, 2.58s/it] 86%|████████▌ | 10530/12313 [7:53:12<1:18:32, 2.64s/it] {'loss': 0.3838, 'grad_norm': 16.425837953856785, 'learning_rate': 2.6996641116418863e-07, 'epoch': 0.86} 86%|████████▌ | 10530/12313 [7:53:12<1:18:32, 2.64s/it] 86%|████████▌ | 10531/12313 [7:53:14<1:17:51, 2.62s/it] {'loss': 0.3394, 'grad_norm': 8.7333194311729, 'learning_rate': 2.696692372558224e-07, 'epoch': 0.86} 86%|████████▌ | 10531/12313 [7:53:14<1:17:51, 2.62s/it] 86%|████████▌ | 10532/12313 [7:53:17<1:22:00, 2.76s/it] {'loss': 0.4557, 'grad_norm': 6.271008188155812, 'learning_rate': 2.6937221767454086e-07, 'epoch': 0.86} 86%|████████▌ | 10532/12313 [7:53:17<1:22:00, 2.76s/it] 86%|████████▌ | 10533/12313 [7:53:20<1:20:46, 2.72s/it] {'loss': 0.4263, 'grad_norm': 4.33027681898429, 'learning_rate': 2.690753524408973e-07, 'epoch': 0.86} 86%|████████▌ | 10533/12313 [7:53:20<1:20:46, 2.72s/it] 86%|████████▌ | 10534/12313 [7:53:22<1:18:16, 2.64s/it] {'loss': 0.4403, 'grad_norm': 8.869308357187672, 'learning_rate': 2.6877864157543204e-07, 'epoch': 0.86} 86%|████████▌ | 10534/12313 [7:53:22<1:18:16, 2.64s/it] 86%|████████▌ | 10535/12313 [7:53:25<1:18:08, 2.64s/it] {'loss': 0.5313, 'grad_norm': 2.978988832273135, 'learning_rate': 2.684820850986758e-07, 'epoch': 0.86} 86%|████████▌ | 10535/12313 [7:53:25<1:18:08, 2.64s/it] 86%|████████▌ | 10536/12313 [7:53:28<1:18:24, 2.65s/it] {'loss': 0.422, 'grad_norm': 8.360644599601576, 'learning_rate': 2.6818568303114967e-07, 'epoch': 0.86} 86%|████████▌ | 10536/12313 [7:53:28<1:18:24, 2.65s/it] 86%|████████▌ | 10537/12313 [7:53:30<1:17:51, 2.63s/it] {'loss': 0.3573, 
'grad_norm': 11.386127329062182, 'learning_rate': 2.67889435393362e-07, 'epoch': 0.86} 86%|████████▌ | 10537/12313 [7:53:30<1:17:51, 2.63s/it] 86%|████████▌ | 10538/12313 [7:53:33<1:19:07, 2.67s/it] {'loss': 0.6315, 'grad_norm': 5.2062587599165475, 'learning_rate': 2.6759334220581273e-07, 'epoch': 0.86} 86%|████████▌ | 10538/12313 [7:53:33<1:19:07, 2.67s/it] 86%|████████▌ | 10539/12313 [7:53:36<1:21:13, 2.75s/it] {'loss': 0.4583, 'grad_norm': 4.445713015076414, 'learning_rate': 2.6729740348898886e-07, 'epoch': 0.86} 86%|████████▌ | 10539/12313 [7:53:36<1:21:13, 2.75s/it] 86%|████████▌ | 10540/12313 [7:53:38<1:19:02, 2.67s/it] {'loss': 0.4175, 'grad_norm': 5.948969244528773, 'learning_rate': 2.670016192633687e-07, 'epoch': 0.86} 86%|████████▌ | 10540/12313 [7:53:38<1:19:02, 2.67s/it] 86%|████████▌ | 10541/12313 [7:53:41<1:18:42, 2.67s/it] {'loss': 0.4651, 'grad_norm': 7.049516828972635, 'learning_rate': 2.667059895494184e-07, 'epoch': 0.86} 86%|████████▌ | 10541/12313 [7:53:41<1:18:42, 2.67s/it] 86%|████████▌ | 10542/12313 [7:53:44<1:18:59, 2.68s/it] {'loss': 0.397, 'grad_norm': 12.341170101279898, 'learning_rate': 2.6641051436759353e-07, 'epoch': 0.86} 86%|████████▌ | 10542/12313 [7:53:44<1:18:59, 2.68s/it] 86%|████████▌ | 10543/12313 [7:53:46<1:19:53, 2.71s/it] {'loss': 0.5794, 'grad_norm': 4.024117199833696, 'learning_rate': 2.6611519373834076e-07, 'epoch': 0.86} 86%|████████▌ | 10543/12313 [7:53:46<1:19:53, 2.71s/it] 86%|████████▌ | 10544/12313 [7:53:49<1:19:36, 2.70s/it] {'loss': 0.4269, 'grad_norm': 9.173352219765476, 'learning_rate': 2.6582002768209326e-07, 'epoch': 0.86} 86%|████████▌ | 10544/12313 [7:53:49<1:19:36, 2.70s/it] 86%|████████▌ | 10545/12313 [7:53:52<1:19:29, 2.70s/it] {'loss': 0.5124, 'grad_norm': 6.424827613404485, 'learning_rate': 2.6552501621927544e-07, 'epoch': 0.86} 86%|████████▌ | 10545/12313 [7:53:52<1:19:29, 2.70s/it] 86%|████████▌ | 10546/12313 [7:53:54<1:18:37, 2.67s/it] {'loss': 0.4117, 'grad_norm': 4.601328438451911, 'learning_rate': 
2.6523015937030136e-07, 'epoch': 0.86} 86%|████████▌ | 10546/12313 [7:53:54<1:18:37, 2.67s/it] 86%|████████▌ | 10547/12313 [7:53:57<1:19:05, 2.69s/it] {'loss': 0.3709, 'grad_norm': 5.439497421746022, 'learning_rate': 2.649354571555729e-07, 'epoch': 0.86} 86%|████████▌ | 10547/12313 [7:53:57<1:19:05, 2.69s/it] 86%|████████▌ | 10548/12313 [7:54:00<1:18:08, 2.66s/it] {'loss': 0.4981, 'grad_norm': 5.6866827995608755, 'learning_rate': 2.6464090959548135e-07, 'epoch': 0.86} 86%|████████▌ | 10548/12313 [7:54:00<1:18:08, 2.66s/it] 86%|████████▌ | 10549/12313 [7:54:03<1:21:38, 2.78s/it] {'loss': 0.4026, 'grad_norm': 8.576641000068408, 'learning_rate': 2.6434651671040894e-07, 'epoch': 0.86} 86%|████████▌ | 10549/12313 [7:54:03<1:21:38, 2.78s/it] 86%|████████▌ | 10550/12313 [7:54:06<1:23:43, 2.85s/it] {'loss': 0.5568, 'grad_norm': 4.32299331991155, 'learning_rate': 2.6405227852072504e-07, 'epoch': 0.86} 86%|████████▌ | 10550/12313 [7:54:06<1:23:43, 2.85s/it] 86%|████████▌ | 10551/12313 [7:54:08<1:21:40, 2.78s/it] {'loss': 0.7008, 'grad_norm': 6.515668343832238, 'learning_rate': 2.637581950467896e-07, 'epoch': 0.86} 86%|████████▌ | 10551/12313 [7:54:08<1:21:40, 2.78s/it] 86%|████████▌ | 10552/12313 [7:54:11<1:22:07, 2.80s/it] {'loss': 0.5308, 'grad_norm': 4.115866997613363, 'learning_rate': 2.634642663089529e-07, 'epoch': 0.86} 86%|████████▌ | 10552/12313 [7:54:11<1:22:07, 2.80s/it] 86%|████████▌ | 10553/12313 [7:54:14<1:20:17, 2.74s/it] {'loss': 0.4151, 'grad_norm': 5.714701058196075, 'learning_rate': 2.6317049232755185e-07, 'epoch': 0.86} 86%|████████▌ | 10553/12313 [7:54:14<1:20:17, 2.74s/it] 86%|████████▌ | 10554/12313 [7:54:16<1:18:09, 2.67s/it] {'loss': 0.3542, 'grad_norm': 6.258820706442769, 'learning_rate': 2.628768731229142e-07, 'epoch': 0.86} 86%|████████▌ | 10554/12313 [7:54:16<1:18:09, 2.67s/it] 86%|████████▌ | 10555/12313 [7:54:19<1:18:55, 2.69s/it] {'loss': 0.3555, 'grad_norm': 7.447014078813493, 'learning_rate': 2.6258340871535753e-07, 'epoch': 0.86} 
86%|████████▌ | 10555/12313 [7:54:19<1:18:55, 2.69s/it] 86%|████████▌ | 10556/12313 [7:54:22<1:19:27, 2.71s/it] {'loss': 0.3792, 'grad_norm': 4.687623344122876, 'learning_rate': 2.6229009912518754e-07, 'epoch': 0.86} 86%|████████▌ | 10556/12313 [7:54:22<1:19:27, 2.71s/it] 86%|████████▌ | 10557/12313 [7:54:24<1:17:50, 2.66s/it] {'loss': 0.5262, 'grad_norm': 4.765486433369644, 'learning_rate': 2.619969443726994e-07, 'epoch': 0.86} 86%|████████▌ | 10557/12313 [7:54:24<1:17:50, 2.66s/it] 86%|████████▌ | 10558/12313 [7:54:27<1:17:07, 2.64s/it] {'loss': 0.3549, 'grad_norm': 4.429572901564648, 'learning_rate': 2.6170394447817824e-07, 'epoch': 0.86} 86%|████████▌ | 10558/12313 [7:54:27<1:17:07, 2.64s/it] 86%|████████▌ | 10559/12313 [7:54:30<1:18:10, 2.67s/it] {'loss': 0.6322, 'grad_norm': 5.054699713881342, 'learning_rate': 2.6141109946189874e-07, 'epoch': 0.86} 86%|████████▌ | 10559/12313 [7:54:30<1:18:10, 2.67s/it] 86%|████████▌ | 10560/12313 [7:54:32<1:16:27, 2.62s/it] {'loss': 0.4139, 'grad_norm': 5.352387291113756, 'learning_rate': 2.611184093441232e-07, 'epoch': 0.86} 86%|████████▌ | 10560/12313 [7:54:32<1:16:27, 2.62s/it] 86%|████████▌ | 10561/12313 [7:54:35<1:16:04, 2.61s/it] {'loss': 0.6806, 'grad_norm': 7.045189522256243, 'learning_rate': 2.608258741451045e-07, 'epoch': 0.86} 86%|████████▌ | 10561/12313 [7:54:35<1:16:04, 2.61s/it] 86%|████████▌ | 10562/12313 [7:54:37<1:15:54, 2.60s/it] {'loss': 0.5799, 'grad_norm': 4.4588250538525935, 'learning_rate': 2.605334938850851e-07, 'epoch': 0.86} 86%|████████▌ | 10562/12313 [7:54:37<1:15:54, 2.60s/it] 86%|████████▌ | 10563/12313 [7:54:40<1:15:00, 2.57s/it] {'loss': 0.4641, 'grad_norm': 4.741068158876179, 'learning_rate': 2.6024126858429503e-07, 'epoch': 0.86} 86%|████████▌ | 10563/12313 [7:54:40<1:15:00, 2.57s/it] 86%|████████▌ | 10564/12313 [7:54:43<1:16:37, 2.63s/it] {'loss': 0.4024, 'grad_norm': 6.387856164555808, 'learning_rate': 2.599491982629554e-07, 'epoch': 0.86} 86%|████████▌ | 10564/12313 [7:54:43<1:16:37, 
2.63s/it] 86%|████████▌ | 10565/12313 [7:54:45<1:17:29, 2.66s/it] {'loss': 0.5682, 'grad_norm': 11.111320718396847, 'learning_rate': 2.596572829412766e-07, 'epoch': 0.86} 86%|████████▌ | 10565/12313 [7:54:45<1:17:29, 2.66s/it] 86%|████████▌ | 10566/12313 [7:54:48<1:20:19, 2.76s/it] {'loss': 0.4614, 'grad_norm': 3.9432874147413366, 'learning_rate': 2.59365522639457e-07, 'epoch': 0.86} 86%|████████▌ | 10566/12313 [7:54:48<1:20:19, 2.76s/it] 86%|████████▌ | 10567/12313 [7:54:51<1:18:31, 2.70s/it] {'loss': 0.5058, 'grad_norm': 4.644099435339523, 'learning_rate': 2.590739173776841e-07, 'epoch': 0.86} 86%|████████▌ | 10567/12313 [7:54:51<1:18:31, 2.70s/it] 86%|████████▌ | 10568/12313 [7:54:54<1:20:18, 2.76s/it] {'loss': 0.3661, 'grad_norm': 6.2728581984877865, 'learning_rate': 2.5878246717613684e-07, 'epoch': 0.86} 86%|████████▌ | 10568/12313 [7:54:54<1:20:18, 2.76s/it] 86%|████████▌ | 10569/12313 [7:54:57<1:19:21, 2.73s/it] {'loss': 0.6458, 'grad_norm': 4.322705528890997, 'learning_rate': 2.5849117205498096e-07, 'epoch': 0.86} 86%|████████▌ | 10569/12313 [7:54:57<1:19:21, 2.73s/it] 86%|████████▌ | 10570/12313 [7:54:59<1:18:24, 2.70s/it] {'loss': 0.6983, 'grad_norm': 3.749986780079194, 'learning_rate': 2.582000320343728e-07, 'epoch': 0.86} 86%|████████▌ | 10570/12313 [7:54:59<1:18:24, 2.70s/it] 86%|████████▌ | 10571/12313 [7:55:02<1:18:58, 2.72s/it] {'loss': 0.4619, 'grad_norm': 7.654907749368039, 'learning_rate': 2.579090471344584e-07, 'epoch': 0.86} 86%|████████▌ | 10571/12313 [7:55:02<1:18:58, 2.72s/it] 86%|████████▌ | 10572/12313 [7:55:05<1:21:03, 2.79s/it] {'loss': 0.4562, 'grad_norm': 7.384217550856351, 'learning_rate': 2.576182173753719e-07, 'epoch': 0.86} 86%|████████▌ | 10572/12313 [7:55:05<1:21:03, 2.79s/it] 86%|████████▌ | 10573/12313 [7:55:08<1:20:13, 2.77s/it] {'loss': 0.5025, 'grad_norm': 6.675039979352513, 'learning_rate': 2.5732754277723703e-07, 'epoch': 0.86} 86%|████████▌ | 10573/12313 [7:55:08<1:20:13, 2.77s/it] 86%|████████▌ | 10574/12313 
[7:55:10<1:18:29, 2.71s/it] {'loss': 0.4889, 'grad_norm': 4.025631310651602, 'learning_rate': 2.5703702336016654e-07, 'epoch': 0.86} 86%|████████▌ | 10574/12313 [7:55:10<1:18:29, 2.71s/it] 86%|████████▌ | 10575/12313 [7:55:13<1:18:18, 2.70s/it] {'loss': 0.3889, 'grad_norm': 4.195667201107953, 'learning_rate': 2.567466591442638e-07, 'epoch': 0.86} 86%|████████▌ | 10575/12313 [7:55:13<1:18:18, 2.70s/it] 86%|████████▌ | 10576/12313 [7:55:15<1:16:40, 2.65s/it] {'loss': 0.4681, 'grad_norm': 5.444485183310965, 'learning_rate': 2.5645645014961947e-07, 'epoch': 0.86} 86%|████████▌ | 10576/12313 [7:55:15<1:16:40, 2.65s/it] 86%|████████▌ | 10577/12313 [7:55:18<1:16:06, 2.63s/it] {'loss': 0.558, 'grad_norm': 4.670632142056175, 'learning_rate': 2.561663963963151e-07, 'epoch': 0.86} 86%|████████▌ | 10577/12313 [7:55:18<1:16:06, 2.63s/it] 86%|████████▌ | 10578/12313 [7:55:21<1:16:22, 2.64s/it] {'loss': 0.5758, 'grad_norm': 5.0298060033672645, 'learning_rate': 2.558764979044212e-07, 'epoch': 0.86} 86%|████████▌ | 10578/12313 [7:55:21<1:16:22, 2.64s/it] 86%|████████▌ | 10579/12313 [7:55:24<1:19:35, 2.75s/it] {'loss': 0.4184, 'grad_norm': 3.3846837426199845, 'learning_rate': 2.555867546939969e-07, 'epoch': 0.86} 86%|████████▌ | 10579/12313 [7:55:24<1:19:35, 2.75s/it] 86%|████████▌ | 10580/12313 [7:55:26<1:19:50, 2.76s/it] {'loss': 0.3676, 'grad_norm': 3.2790619896944957, 'learning_rate': 2.5529716678509007e-07, 'epoch': 0.86} 86%|████████▌ | 10580/12313 [7:55:26<1:19:50, 2.76s/it] 86%|████████▌ | 10581/12313 [7:55:29<1:18:36, 2.72s/it] {'loss': 0.5089, 'grad_norm': 8.215528066018926, 'learning_rate': 2.5500773419774e-07, 'epoch': 0.86} 86%|████████▌ | 10581/12313 [7:55:29<1:18:36, 2.72s/it] 86%|████████▌ | 10582/12313 [7:55:32<1:20:48, 2.80s/it] {'loss': 0.5952, 'grad_norm': 7.985001045047634, 'learning_rate': 2.547184569519728e-07, 'epoch': 0.86} 86%|████████▌ | 10582/12313 [7:55:32<1:20:48, 2.80s/it] 86%|████████▌ | 10583/12313 [7:55:35<1:18:44, 2.73s/it] {'loss': 0.4201, 
'grad_norm': 9.533391213718112, 'learning_rate': 2.5442933506780536e-07, 'epoch': 0.86} 86%|████████▌ | 10583/12313 [7:55:35<1:18:44, 2.73s/it] 86%|████████▌ | 10584/12313 [7:55:37<1:15:55, 2.63s/it] {'loss': 0.5651, 'grad_norm': 3.854806058643972, 'learning_rate': 2.541403685652438e-07, 'epoch': 0.86} 86%|████████▌ | 10584/12313 [7:55:37<1:15:55, 2.63s/it] 86%|████████▌ | 10585/12313 [7:55:40<1:15:49, 2.63s/it] {'loss': 0.4844, 'grad_norm': 10.475802887731723, 'learning_rate': 2.53851557464283e-07, 'epoch': 0.86} 86%|████████▌ | 10585/12313 [7:55:40<1:15:49, 2.63s/it] 86%|████████▌ | 10586/12313 [7:55:42<1:15:37, 2.63s/it] {'loss': 0.4356, 'grad_norm': 6.07171775635679, 'learning_rate': 2.535629017849062e-07, 'epoch': 0.86} 86%|████████▌ | 10586/12313 [7:55:42<1:15:37, 2.63s/it] 86%|████████▌ | 10587/12313 [7:55:45<1:15:33, 2.63s/it] {'loss': 0.3976, 'grad_norm': 9.000523149669734, 'learning_rate': 2.532744015470878e-07, 'epoch': 0.86} 86%|████████▌ | 10587/12313 [7:55:45<1:15:33, 2.63s/it] 86%|████████▌ | 10588/12313 [7:55:48<1:15:04, 2.61s/it] {'loss': 0.4646, 'grad_norm': 3.9919734198421164, 'learning_rate': 2.529860567707904e-07, 'epoch': 0.86} 86%|████████▌ | 10588/12313 [7:55:48<1:15:04, 2.61s/it] 86%|████████▌ | 10589/12313 [7:55:50<1:14:55, 2.61s/it] {'loss': 0.5287, 'grad_norm': 5.113963141533617, 'learning_rate': 2.5269786747596504e-07, 'epoch': 0.86} 86%|████████▌ | 10589/12313 [7:55:50<1:14:55, 2.61s/it] 86%|████████▌ | 10590/12313 [7:55:53<1:15:53, 2.64s/it] {'loss': 0.3916, 'grad_norm': 6.0753816648158345, 'learning_rate': 2.5240983368255365e-07, 'epoch': 0.86} 86%|████████▌ | 10590/12313 [7:55:53<1:15:53, 2.64s/it] 86%|████████▌ | 10591/12313 [7:55:55<1:15:37, 2.63s/it] {'loss': 0.7061, 'grad_norm': 5.777553433923641, 'learning_rate': 2.52121955410487e-07, 'epoch': 0.86} 86%|████████▌ | 10591/12313 [7:55:55<1:15:37, 2.63s/it] 86%|████████▌ | 10592/12313 [7:55:58<1:16:26, 2.66s/it] {'loss': 0.4157, 'grad_norm': 6.304746787186769, 'learning_rate': 
2.518342326796844e-07, 'epoch': 0.86} 86%|████████▌ | 10592/12313 [7:55:58<1:16:26, 2.66s/it] 86%|████████▌ | 10593/12313 [7:56:01<1:16:24, 2.67s/it] {'loss': 0.4792, 'grad_norm': 8.184175586338531, 'learning_rate': 2.515466655100543e-07, 'epoch': 0.86} 86%|████████▌ | 10593/12313 [7:56:01<1:16:24, 2.67s/it] 86%|████████▌ | 10594/12313 [7:56:04<1:18:45, 2.75s/it] {'loss': 0.4817, 'grad_norm': 7.804180717198223, 'learning_rate': 2.5125925392149533e-07, 'epoch': 0.86} 86%|████████▌ | 10594/12313 [7:56:04<1:18:45, 2.75s/it] 86%|████████▌ | 10595/12313 [7:56:06<1:17:05, 2.69s/it] {'loss': 0.3667, 'grad_norm': 7.390682075003671, 'learning_rate': 2.5097199793389456e-07, 'epoch': 0.86} 86%|████████▌ | 10595/12313 [7:56:06<1:17:05, 2.69s/it] 86%|████████▌ | 10596/12313 [7:56:09<1:16:36, 2.68s/it] {'loss': 0.4751, 'grad_norm': 4.844008089075363, 'learning_rate': 2.506848975671283e-07, 'epoch': 0.86} 86%|████████▌ | 10596/12313 [7:56:09<1:16:36, 2.68s/it] 86%|████████▌ | 10597/12313 [7:56:12<1:17:19, 2.70s/it] {'loss': 0.47, 'grad_norm': 15.0313162783177, 'learning_rate': 2.5039795284106354e-07, 'epoch': 0.86} 86%|████████▌ | 10597/12313 [7:56:12<1:17:19, 2.70s/it] 86%|████████▌ | 10598/12313 [7:56:15<1:18:20, 2.74s/it] {'loss': 0.3705, 'grad_norm': 7.827029414244254, 'learning_rate': 2.5011116377555463e-07, 'epoch': 0.86} 86%|████████▌ | 10598/12313 [7:56:15<1:18:20, 2.74s/it] 86%|████████▌ | 10599/12313 [7:56:17<1:17:31, 2.71s/it] {'loss': 0.5141, 'grad_norm': 4.165034621143684, 'learning_rate': 2.4982453039044536e-07, 'epoch': 0.86} 86%|████████▌ | 10599/12313 [7:56:17<1:17:31, 2.71s/it] 86%|████████▌ | 10600/12313 [7:56:20<1:15:19, 2.64s/it] {'loss': 0.5018, 'grad_norm': 9.403450320448428, 'learning_rate': 2.495380527055699e-07, 'epoch': 0.86} 86%|████████▌ | 10600/12313 [7:56:20<1:15:19, 2.64s/it] 86%|████████▌ | 10601/12313 [7:56:22<1:15:04, 2.63s/it] {'loss': 0.5024, 'grad_norm': 3.669900811182216, 'learning_rate': 2.49251730740751e-07, 'epoch': 0.86} 86%|████████▌ | 
10601/12313 [7:56:22<1:15:04, 2.63s/it] 86%|████████▌ | 10602/12313 [7:56:25<1:15:33, 2.65s/it] {'loss': 0.3806, 'grad_norm': 8.666914052435182, 'learning_rate': 2.4896556451579985e-07, 'epoch': 0.86} 86%|████████▌ | 10602/12313 [7:56:25<1:15:33, 2.65s/it] 86%|████████▌ | 10603/12313 [7:56:28<1:16:44, 2.69s/it] {'loss': 0.436, 'grad_norm': 6.5382282595398955, 'learning_rate': 2.4867955405051826e-07, 'epoch': 0.86} 86%|████████▌ | 10603/12313 [7:56:28<1:16:44, 2.69s/it] 86%|████████▌ | 10604/12313 [7:56:30<1:15:59, 2.67s/it] {'loss': 0.4528, 'grad_norm': 6.652949951974725, 'learning_rate': 2.483936993646971e-07, 'epoch': 0.86} 86%|████████▌ | 10604/12313 [7:56:30<1:15:59, 2.67s/it] 86%|████████▌ | 10605/12313 [7:56:33<1:16:37, 2.69s/it] {'loss': 0.4787, 'grad_norm': 7.183747958159308, 'learning_rate': 2.48108000478115e-07, 'epoch': 0.86} 86%|████████▌ | 10605/12313 [7:56:33<1:16:37, 2.69s/it] 86%|████████▌ | 10606/12313 [7:56:36<1:15:54, 2.67s/it] {'loss': 0.5123, 'grad_norm': 10.226892509366555, 'learning_rate': 2.4782245741054175e-07, 'epoch': 0.86} 86%|████████▌ | 10606/12313 [7:56:36<1:15:54, 2.67s/it] 86%|████████▌ | 10607/12313 [7:56:39<1:17:35, 2.73s/it] {'loss': 0.3932, 'grad_norm': 6.451808047428951, 'learning_rate': 2.475370701817348e-07, 'epoch': 0.86} 86%|████████▌ | 10607/12313 [7:56:39<1:17:35, 2.73s/it] 86%|████████▌ | 10608/12313 [7:56:42<1:19:04, 2.78s/it] {'loss': 0.5491, 'grad_norm': 4.898759701665131, 'learning_rate': 2.4725183881144114e-07, 'epoch': 0.86} 86%|████████▌ | 10608/12313 [7:56:42<1:19:04, 2.78s/it] 86%|████████▌ | 10609/12313 [7:56:44<1:17:56, 2.74s/it] {'loss': 0.4651, 'grad_norm': 31.142791051653955, 'learning_rate': 2.4696676331939786e-07, 'epoch': 0.86} 86%|████████▌ | 10609/12313 [7:56:44<1:17:56, 2.74s/it] 86%|████████▌ | 10610/12313 [7:56:47<1:18:14, 2.76s/it] {'loss': 0.5049, 'grad_norm': 3.5669616787142697, 'learning_rate': 2.46681843725331e-07, 'epoch': 0.86} 86%|████████▌ | 10610/12313 [7:56:47<1:18:14, 2.76s/it] 
86%|████████▌ | 10611/12313 [7:56:50<1:20:09, 2.83s/it] {'loss': 0.4391, 'grad_norm': 5.205643471968962, 'learning_rate': 2.4639708004895515e-07, 'epoch': 0.86} 86%|████████▌ | 10611/12313 [7:56:50<1:20:09, 2.83s/it] 86%|████████▌ | 10612/12313 [7:56:53<1:21:22, 2.87s/it] {'loss': 0.5294, 'grad_norm': 5.483632769288808, 'learning_rate': 2.4611247230997366e-07, 'epoch': 0.86} 86%|████████▌ | 10612/12313 [7:56:53<1:21:22, 2.87s/it] 86%|████████▌ | 10613/12313 [7:56:56<1:20:00, 2.82s/it] {'loss': 0.4576, 'grad_norm': 7.848858229369638, 'learning_rate': 2.458280205280811e-07, 'epoch': 0.86} 86%|████████▌ | 10613/12313 [7:56:56<1:20:00, 2.82s/it] 86%|████████▌ | 10614/12313 [7:56:58<1:16:42, 2.71s/it] {'loss': 0.5552, 'grad_norm': 4.862179433393447, 'learning_rate': 2.455437247229595e-07, 'epoch': 0.86} 86%|████████▌ | 10614/12313 [7:56:58<1:16:42, 2.71s/it] 86%|████████▌ | 10615/12313 [7:57:01<1:14:47, 2.64s/it] {'loss': 0.573, 'grad_norm': 7.004075796989462, 'learning_rate': 2.4525958491428026e-07, 'epoch': 0.86} 86%|████████▌ | 10615/12313 [7:57:01<1:14:47, 2.64s/it] 86%|████████▌ | 10616/12313 [7:57:03<1:12:25, 2.56s/it] {'loss': 0.5041, 'grad_norm': 27.404691438629225, 'learning_rate': 2.4497560112170444e-07, 'epoch': 0.86} 86%|████████▌ | 10616/12313 [7:57:03<1:12:25, 2.56s/it] 86%|████████▌ | 10617/12313 [7:57:06<1:14:55, 2.65s/it] {'loss': 0.6183, 'grad_norm': 3.9710080611622955, 'learning_rate': 2.446917733648829e-07, 'epoch': 0.86} 86%|████████▌ | 10617/12313 [7:57:06<1:14:55, 2.65s/it] 86%|████████▌ | 10618/12313 [7:57:09<1:15:11, 2.66s/it] {'loss': 0.6269, 'grad_norm': 7.736011644339195, 'learning_rate': 2.444081016634545e-07, 'epoch': 0.86} 86%|████████▌ | 10618/12313 [7:57:09<1:15:11, 2.66s/it] 86%|████████▌ | 10619/12313 [7:57:11<1:16:10, 2.70s/it] {'loss': 0.6128, 'grad_norm': 3.296357608833365, 'learning_rate': 2.4412458603704806e-07, 'epoch': 0.86} 86%|████████▌ | 10619/12313 [7:57:11<1:16:10, 2.70s/it] 86%|████████▋ | 10620/12313 [7:57:14<1:15:54, 
2.69s/it] {'loss': 0.4254, 'grad_norm': 4.26650357660672, 'learning_rate': 2.438412265052814e-07, 'epoch': 0.86} 86%|████████▋ | 10620/12313 [7:57:14<1:15:54, 2.69s/it] 86%|████████▋ | 10621/12313 [7:57:16<1:13:49, 2.62s/it] {'loss': 0.4871, 'grad_norm': 7.332608647465775, 'learning_rate': 2.4355802308776073e-07, 'epoch': 0.86} 86%|████████▋ | 10621/12313 [7:57:16<1:13:49, 2.62s/it] 86%|████████▋ | 10622/12313 [7:57:19<1:15:50, 2.69s/it] {'loss': 0.5566, 'grad_norm': 4.755644735563533, 'learning_rate': 2.4327497580408285e-07, 'epoch': 0.86} 86%|████████▋ | 10622/12313 [7:57:19<1:15:50, 2.69s/it] 86%|████████▋ | 10623/12313 [7:57:22<1:16:00, 2.70s/it] {'loss': 0.4417, 'grad_norm': 3.8321180175370824, 'learning_rate': 2.4299208467383347e-07, 'epoch': 0.86} 86%|████████▋ | 10623/12313 [7:57:22<1:16:00, 2.70s/it] 86%|████████▋ | 10624/12313 [7:57:25<1:15:50, 2.69s/it] {'loss': 0.4393, 'grad_norm': 4.133812851532381, 'learning_rate': 2.427093497165864e-07, 'epoch': 0.86} 86%|████████▋ | 10624/12313 [7:57:25<1:15:50, 2.69s/it] 86%|████████▋ | 10625/12313 [7:57:27<1:15:34, 2.69s/it] {'loss': 0.5653, 'grad_norm': 4.292765424503046, 'learning_rate': 2.4242677095190623e-07, 'epoch': 0.86} 86%|████████▋ | 10625/12313 [7:57:27<1:15:34, 2.69s/it] 86%|████████▋ | 10626/12313 [7:57:30<1:13:20, 2.61s/it] {'loss': 0.3592, 'grad_norm': 5.735964641093858, 'learning_rate': 2.4214434839934545e-07, 'epoch': 0.86} 86%|████████▋ | 10626/12313 [7:57:30<1:13:20, 2.61s/it] 86%|████████▋ | 10627/12313 [7:57:33<1:15:35, 2.69s/it] {'loss': 0.5062, 'grad_norm': 8.285033867612887, 'learning_rate': 2.418620820784462e-07, 'epoch': 0.86} 86%|████████▋ | 10627/12313 [7:57:33<1:15:35, 2.69s/it] 86%|████████▋ | 10628/12313 [7:57:35<1:15:35, 2.69s/it] {'loss': 0.3959, 'grad_norm': 4.512095996980739, 'learning_rate': 2.4157997200873945e-07, 'epoch': 0.86} 86%|████████▋ | 10628/12313 [7:57:35<1:15:35, 2.69s/it] 86%|████████▋ | 10629/12313 [7:57:38<1:13:50, 2.63s/it] {'loss': 0.4257, 'grad_norm': 
5.106558948145591, 'learning_rate': 2.4129801820974604e-07, 'epoch': 0.86} 86%|████████▋ | 10629/12313 [7:57:38<1:13:50, 2.63s/it] 86%|████████▋ | 10630/12313 [7:57:41<1:15:12, 2.68s/it] {'loss': 0.4211, 'grad_norm': 8.542776714887058, 'learning_rate': 2.410162207009761e-07, 'epoch': 0.86} 86%|████████▋ | 10630/12313 [7:57:41<1:15:12, 2.68s/it] 86%|████████▋ | 10631/12313 [7:57:43<1:13:52, 2.63s/it] {'loss': 0.4189, 'grad_norm': 4.654592367436493, 'learning_rate': 2.4073457950192806e-07, 'epoch': 0.86} 86%|████████▋ | 10631/12313 [7:57:43<1:13:52, 2.63s/it] 86%|████████▋ | 10632/12313 [7:57:46<1:11:36, 2.56s/it] {'loss': 0.5268, 'grad_norm': 6.844838480590244, 'learning_rate': 2.404530946320904e-07, 'epoch': 0.86} 86%|████████▋ | 10632/12313 [7:57:46<1:11:36, 2.56s/it] 86%|████████▋ | 10633/12313 [7:57:49<1:15:15, 2.69s/it] {'loss': 0.5319, 'grad_norm': 5.39856003020653, 'learning_rate': 2.401717661109401e-07, 'epoch': 0.86} 86%|████████▋ | 10633/12313 [7:57:49<1:15:15, 2.69s/it] 86%|████████▋ | 10634/12313 [7:57:51<1:14:03, 2.65s/it] {'loss': 0.4763, 'grad_norm': 6.044733492723436, 'learning_rate': 2.398905939579432e-07, 'epoch': 0.86} 86%|████████▋ | 10634/12313 [7:57:51<1:14:03, 2.65s/it] 86%|████████▋ | 10635/12313 [7:57:54<1:13:40, 2.63s/it] {'loss': 0.4624, 'grad_norm': 3.216826620946339, 'learning_rate': 2.396095781925556e-07, 'epoch': 0.86} 86%|████████▋ | 10635/12313 [7:57:54<1:13:40, 2.63s/it] 86%|████████▋ | 10636/12313 [7:57:56<1:14:30, 2.67s/it] {'loss': 0.3858, 'grad_norm': 4.322010048301707, 'learning_rate': 2.3932871883422276e-07, 'epoch': 0.86} 86%|████████▋ | 10636/12313 [7:57:56<1:14:30, 2.67s/it] 86%|████████▋ | 10637/12313 [7:57:59<1:14:48, 2.68s/it] {'loss': 0.3858, 'grad_norm': 6.581640577707594, 'learning_rate': 2.3904801590237783e-07, 'epoch': 0.86} 86%|████████▋ | 10637/12313 [7:57:59<1:14:48, 2.68s/it] 86%|████████▋ | 10638/12313 [7:58:02<1:14:44, 2.68s/it] {'loss': 0.547, 'grad_norm': 5.430902885102804, 'learning_rate': 
2.3876746941644464e-07, 'epoch': 0.86} 86%|████████▋ | 10638/12313 [7:58:02<1:14:44, 2.68s/it] 86%|████████▋ | 10639/12313 [7:58:04<1:14:33, 2.67s/it] {'loss': 0.4549, 'grad_norm': 9.539650370477867, 'learning_rate': 2.384870793958349e-07, 'epoch': 0.86} 86%|████████▋ | 10639/12313 [7:58:04<1:14:33, 2.67s/it] 86%|████████▋ | 10640/12313 [7:58:07<1:12:52, 2.61s/it] {'loss': 0.5427, 'grad_norm': 6.254836784455941, 'learning_rate': 2.3820684585995012e-07, 'epoch': 0.86} 86%|████████▋ | 10640/12313 [7:58:07<1:12:52, 2.61s/it] 86%|████████▋ | 10641/12313 [7:58:10<1:12:26, 2.60s/it] {'loss': 0.394, 'grad_norm': 7.936489779390335, 'learning_rate': 2.379267688281814e-07, 'epoch': 0.86} 86%|████████▋ | 10641/12313 [7:58:10<1:12:26, 2.60s/it] 86%|████████▋ | 10642/12313 [7:58:12<1:11:06, 2.55s/it] {'loss': 0.5883, 'grad_norm': 4.9403538410304, 'learning_rate': 2.3764684831990874e-07, 'epoch': 0.86} 86%|████████▋ | 10642/12313 [7:58:12<1:11:06, 2.55s/it] 86%|████████▋ | 10643/12313 [7:58:15<1:11:27, 2.57s/it] {'loss': 0.4907, 'grad_norm': 5.718951894452898, 'learning_rate': 2.3736708435450033e-07, 'epoch': 0.86} 86%|████████▋ | 10643/12313 [7:58:15<1:11:27, 2.57s/it] 86%|████████▋ | 10644/12313 [7:58:17<1:12:36, 2.61s/it] {'loss': 0.4677, 'grad_norm': 4.730922288439977, 'learning_rate': 2.370874769513154e-07, 'epoch': 0.86} 86%|████████▋ | 10644/12313 [7:58:17<1:12:36, 2.61s/it] 86%|████████▋ | 10645/12313 [7:58:20<1:11:44, 2.58s/it] {'loss': 0.4901, 'grad_norm': 7.22516165081707, 'learning_rate': 2.3680802612970068e-07, 'epoch': 0.86} 86%|████████▋ | 10645/12313 [7:58:20<1:11:44, 2.58s/it] 86%|████████▋ | 10646/12313 [7:58:22<1:11:52, 2.59s/it] {'loss': 0.4482, 'grad_norm': 5.535751530022896, 'learning_rate': 2.365287319089929e-07, 'epoch': 0.86} 86%|████████▋ | 10646/12313 [7:58:22<1:11:52, 2.59s/it] 86%|████████▋ | 10647/12313 [7:58:25<1:14:24, 2.68s/it] {'loss': 0.3961, 'grad_norm': 4.330430490684841, 'learning_rate': 2.362495943085172e-07, 'epoch': 0.86} 86%|████████▋ | 
10647/12313 [7:58:25<1:14:24, 2.68s/it] 86%|████████▋ | 10648/12313 [7:58:28<1:16:54, 2.77s/it] {'loss': 0.4269, 'grad_norm': 5.262984242628663, 'learning_rate': 2.3597061334758864e-07, 'epoch': 0.86} 86%|████████▋ | 10648/12313 [7:58:28<1:16:54, 2.77s/it] 86%|████████▋ | 10649/12313 [7:58:31<1:16:26, 2.76s/it] {'loss': 0.4865, 'grad_norm': 3.5730441803712014, 'learning_rate': 2.3569178904551181e-07, 'epoch': 0.86} 86%|████████▋ | 10649/12313 [7:58:31<1:16:26, 2.76s/it] 86%|████████▋ | 10650/12313 [7:58:34<1:15:23, 2.72s/it] {'loss': 0.62, 'grad_norm': 4.35982646863664, 'learning_rate': 2.3541312142157934e-07, 'epoch': 0.86} 86%|████████▋ | 10650/12313 [7:58:34<1:15:23, 2.72s/it] 87%|████████▋ | 10651/12313 [7:58:36<1:14:42, 2.70s/it] {'loss': 0.309, 'grad_norm': 5.501060251786152, 'learning_rate': 2.3513461049507385e-07, 'epoch': 0.87} 87%|████████▋ | 10651/12313 [7:58:36<1:14:42, 2.70s/it] 87%|████████▋ | 10652/12313 [7:58:39<1:13:23, 2.65s/it] {'loss': 0.3867, 'grad_norm': 18.5509951879782, 'learning_rate': 2.3485625628526688e-07, 'epoch': 0.87} 87%|████████▋ | 10652/12313 [7:58:39<1:13:23, 2.65s/it] 87%|████████▋ | 10653/12313 [7:58:42<1:15:11, 2.72s/it] {'loss': 0.4658, 'grad_norm': 4.778384806825011, 'learning_rate': 2.3457805881141854e-07, 'epoch': 0.87} 87%|████████▋ | 10653/12313 [7:58:42<1:15:11, 2.72s/it] 87%|████████▋ | 10654/12313 [7:58:44<1:14:40, 2.70s/it] {'loss': 0.483, 'grad_norm': 7.32649293368322, 'learning_rate': 2.3430001809277873e-07, 'epoch': 0.87} 87%|████████▋ | 10654/12313 [7:58:44<1:14:40, 2.70s/it] 87%|████████▋ | 10655/12313 [7:58:47<1:14:14, 2.69s/it] {'loss': 0.435, 'grad_norm': 6.199897652234001, 'learning_rate': 2.340221341485871e-07, 'epoch': 0.87} 87%|████████▋ | 10655/12313 [7:58:47<1:14:14, 2.69s/it] 87%|████████▋ | 10656/12313 [7:58:50<1:13:22, 2.66s/it] {'loss': 0.5084, 'grad_norm': 4.754389895318495, 'learning_rate': 2.3374440699807072e-07, 'epoch': 0.87} 87%|████████▋ | 10656/12313 [7:58:50<1:13:22, 2.66s/it] 87%|████████▋ 
| 10657/12313 [7:58:52<1:12:21, 2.62s/it] {'loss': 0.5881, 'grad_norm': 4.649740673308111, 'learning_rate': 2.334668366604481e-07, 'epoch': 0.87} 87%|████████▋ | 10657/12313 [7:58:52<1:12:21, 2.62s/it] 87%|████████▋ | 10658/12313 [7:58:55<1:10:51, 2.57s/it] {'loss': 0.38, 'grad_norm': 6.638508068101479, 'learning_rate': 2.3318942315492477e-07, 'epoch': 0.87} 87%|████████▋ | 10658/12313 [7:58:55<1:10:51, 2.57s/it] 87%|████████▋ | 10659/12313 [7:58:57<1:11:51, 2.61s/it] {'loss': 0.4626, 'grad_norm': 3.952860937759617, 'learning_rate': 2.3291216650069587e-07, 'epoch': 0.87} 87%|████████▋ | 10659/12313 [7:58:57<1:11:51, 2.61s/it] 87%|████████▋ | 10660/12313 [7:59:00<1:11:03, 2.58s/it] {'loss': 0.4982, 'grad_norm': 6.6956489482132, 'learning_rate': 2.3263506671694747e-07, 'epoch': 0.87} 87%|████████▋ | 10660/12313 [7:59:00<1:11:03, 2.58s/it] 87%|████████▋ | 10661/12313 [7:59:02<1:11:58, 2.61s/it] {'loss': 0.4391, 'grad_norm': 3.478866841645943, 'learning_rate': 2.323581238228517e-07, 'epoch': 0.87} 87%|████████▋ | 10661/12313 [7:59:02<1:11:58, 2.61s/it] 87%|████████▋ | 10662/12313 [7:59:05<1:11:31, 2.60s/it] {'loss': 0.5142, 'grad_norm': 4.525751483245147, 'learning_rate': 2.3208133783757302e-07, 'epoch': 0.87} 87%|████████▋ | 10662/12313 [7:59:05<1:11:31, 2.60s/it] 87%|████████▋ | 10663/12313 [7:59:08<1:11:00, 2.58s/it] {'loss': 0.5449, 'grad_norm': 4.621330522104981, 'learning_rate': 2.3180470878026275e-07, 'epoch': 0.87} 87%|████████▋ | 10663/12313 [7:59:08<1:11:00, 2.58s/it] 87%|████████▋ | 10664/12313 [7:59:10<1:10:13, 2.56s/it] {'loss': 0.4468, 'grad_norm': 4.973654556392699, 'learning_rate': 2.3152823667006248e-07, 'epoch': 0.87} 87%|████████▋ | 10664/12313 [7:59:10<1:10:13, 2.56s/it] 87%|████████▋ | 10665/12313 [7:59:13<1:11:02, 2.59s/it] {'loss': 0.633, 'grad_norm': 5.396466731566874, 'learning_rate': 2.3125192152610277e-07, 'epoch': 0.87} 87%|████████▋ | 10665/12313 [7:59:13<1:11:02, 2.59s/it] 87%|████████▋ | 10666/12313 [7:59:15<1:11:43, 2.61s/it] {'loss': 
0.4153, 'grad_norm': 4.918829274592111, 'learning_rate': 2.3097576336750248e-07, 'epoch': 0.87} 87%|████████▋ | 10666/12313 [7:59:15<1:11:43, 2.61s/it] 87%|████████▋ | 10667/12313 [7:59:18<1:09:44, 2.54s/it] {'loss': 0.7968, 'grad_norm': 6.0092199166872, 'learning_rate': 2.3069976221337054e-07, 'epoch': 0.87} 87%|████████▋ | 10667/12313 [7:59:18<1:09:44, 2.54s/it] 87%|████████▋ | 10668/12313 [7:59:20<1:10:07, 2.56s/it] {'loss': 0.6541, 'grad_norm': 13.224667725694793, 'learning_rate': 2.304239180828055e-07, 'epoch': 0.87} 87%|████████▋ | 10668/12313 [7:59:20<1:10:07, 2.56s/it] 87%|████████▋ | 10669/12313 [7:59:23<1:11:20, 2.60s/it] {'loss': 0.4286, 'grad_norm': 4.332528390591684, 'learning_rate': 2.3014823099489326e-07, 'epoch': 0.87} 87%|████████▋ | 10669/12313 [7:59:23<1:11:20, 2.60s/it] 87%|████████▋ | 10670/12313 [7:59:26<1:12:07, 2.63s/it] {'loss': 0.457, 'grad_norm': 8.678328469216439, 'learning_rate': 2.2987270096871072e-07, 'epoch': 0.87} 87%|████████▋ | 10670/12313 [7:59:26<1:12:07, 2.63s/it] 87%|████████▋ | 10671/12313 [7:59:29<1:14:47, 2.73s/it] {'loss': 0.3446, 'grad_norm': 6.543536578014087, 'learning_rate': 2.2959732802332296e-07, 'epoch': 0.87} 87%|████████▋ | 10671/12313 [7:59:29<1:14:47, 2.73s/it] 87%|████████▋ | 10672/12313 [7:59:31<1:14:50, 2.74s/it] {'loss': 0.6514, 'grad_norm': 5.10238287168412, 'learning_rate': 2.2932211217778388e-07, 'epoch': 0.87} 87%|████████▋ | 10672/12313 [7:59:32<1:14:50, 2.74s/it] 87%|████████▋ | 10673/12313 [7:59:34<1:13:54, 2.70s/it] {'loss': 0.6377, 'grad_norm': 4.902163867519704, 'learning_rate': 2.2904705345113743e-07, 'epoch': 0.87} 87%|████████▋ | 10673/12313 [7:59:34<1:13:54, 2.70s/it] 87%|████████▋ | 10674/12313 [7:59:37<1:15:00, 2.75s/it] {'loss': 0.5384, 'grad_norm': 4.914208356693128, 'learning_rate': 2.287721518624156e-07, 'epoch': 0.87} 87%|████████▋ | 10674/12313 [7:59:37<1:15:00, 2.75s/it] 87%|████████▋ | 10675/12313 [7:59:40<1:15:30, 2.77s/it] {'loss': 0.3763, 'grad_norm': 4.695162916684552, 
'learning_rate': 2.2849740743064063e-07, 'epoch': 0.87} 87%|████████▋ | 10675/12313 [7:59:40<1:15:30, 2.77s/it] 87%|████████▋ | 10676/12313 [7:59:42<1:14:53, 2.75s/it] {'loss': 0.4784, 'grad_norm': 5.516447784018673, 'learning_rate': 2.282228201748238e-07, 'epoch': 0.87} 87%|████████▋ | 10676/12313 [7:59:42<1:14:53, 2.75s/it] 87%|████████▋ | 10677/12313 [7:59:45<1:14:06, 2.72s/it] {'loss': 0.3715, 'grad_norm': 4.784027348664785, 'learning_rate': 2.2794839011396453e-07, 'epoch': 0.87} 87%|████████▋ | 10677/12313 [7:59:45<1:14:06, 2.72s/it] 87%|████████▋ | 10678/12313 [7:59:48<1:14:01, 2.72s/it] {'loss': 0.66, 'grad_norm': 4.524507243360301, 'learning_rate': 2.2767411726705157e-07, 'epoch': 0.87} 87%|████████▋ | 10678/12313 [7:59:48<1:14:01, 2.72s/it] 87%|████████▋ | 10679/12313 [7:59:50<1:12:20, 2.66s/it] {'loss': 0.4065, 'grad_norm': 6.2607763304203194, 'learning_rate': 2.2740000165306393e-07, 'epoch': 0.87} 87%|████████▋ | 10679/12313 [7:59:50<1:12:20, 2.66s/it] 87%|████████▋ | 10680/12313 [7:59:53<1:14:18, 2.73s/it] {'loss': 0.5403, 'grad_norm': 6.045389943291753, 'learning_rate': 2.2712604329096833e-07, 'epoch': 0.87} 87%|████████▋ | 10680/12313 [7:59:53<1:14:18, 2.73s/it] 87%|████████▋ | 10681/12313 [7:59:56<1:12:54, 2.68s/it] {'loss': 0.6084, 'grad_norm': 4.945546166899713, 'learning_rate': 2.2685224219972185e-07, 'epoch': 0.87} 87%|████████▋ | 10681/12313 [7:59:56<1:12:54, 2.68s/it] 87%|████████▋ | 10682/12313 [7:59:59<1:15:16, 2.77s/it] {'loss': 0.4117, 'grad_norm': 7.020025507043163, 'learning_rate': 2.2657859839826934e-07, 'epoch': 0.87} 87%|████████▋ | 10682/12313 [7:59:59<1:15:16, 2.77s/it] 87%|████████▋ | 10683/12313 [8:00:01<1:12:41, 2.68s/it] {'loss': 0.3219, 'grad_norm': 3.378493580413376, 'learning_rate': 2.2630511190554621e-07, 'epoch': 0.87} 87%|████████▋ | 10683/12313 [8:00:01<1:12:41, 2.68s/it] 87%|████████▋ | 10684/12313 [8:00:04<1:12:53, 2.69s/it] {'loss': 0.3889, 'grad_norm': 14.896543415086516, 'learning_rate': 2.260317827404762e-07, 
'epoch': 0.87} 87%|████████▋ | 10684/12313 [8:00:04<1:12:53, 2.69s/it] 87%|████████▋ | 10685/12313 [8:00:07<1:12:11, 2.66s/it] {'loss': 0.6512, 'grad_norm': 4.531796982257481, 'learning_rate': 2.2575861092197143e-07, 'epoch': 0.87} 87%|████████▋ | 10685/12313 [8:00:07<1:12:11, 2.66s/it] 87%|████████▋ | 10686/12313 [8:00:09<1:10:39, 2.61s/it] {'loss': 0.5622, 'grad_norm': 7.006910779412939, 'learning_rate': 2.254855964689351e-07, 'epoch': 0.87} 87%|████████▋ | 10686/12313 [8:00:09<1:10:39, 2.61s/it] 87%|████████▋ | 10687/12313 [8:00:12<1:11:59, 2.66s/it] {'loss': 0.5571, 'grad_norm': 5.778562835427389, 'learning_rate': 2.2521273940025705e-07, 'epoch': 0.87} 87%|████████▋ | 10687/12313 [8:00:12<1:11:59, 2.66s/it] 87%|████████▋ | 10688/12313 [8:00:15<1:14:25, 2.75s/it] {'loss': 0.481, 'grad_norm': 5.550998523968033, 'learning_rate': 2.2494003973481864e-07, 'epoch': 0.87} 87%|████████▋ | 10688/12313 [8:00:15<1:14:25, 2.75s/it] 87%|████████▋ | 10689/12313 [8:00:18<1:15:22, 2.78s/it] {'loss': 0.4879, 'grad_norm': 5.8427464651275836, 'learning_rate': 2.2466749749148919e-07, 'epoch': 0.87} 87%|████████▋ | 10689/12313 [8:00:18<1:15:22, 2.78s/it] 87%|████████▋ | 10690/12313 [8:00:20<1:13:55, 2.73s/it] {'loss': 0.3003, 'grad_norm': 4.136202285236531, 'learning_rate': 2.2439511268912666e-07, 'epoch': 0.87} 87%|████████▋ | 10690/12313 [8:00:20<1:13:55, 2.73s/it] 87%|████████▋ | 10691/12313 [8:00:23<1:11:20, 2.64s/it] {'loss': 0.556, 'grad_norm': 7.410829723701502, 'learning_rate': 2.2412288534657878e-07, 'epoch': 0.87} 87%|████████▋ | 10691/12313 [8:00:23<1:11:20, 2.64s/it] 87%|████████▋ | 10692/12313 [8:00:25<1:11:41, 2.65s/it] {'loss': 0.5302, 'grad_norm': 5.668425487445851, 'learning_rate': 2.2385081548268268e-07, 'epoch': 0.87} 87%|████████▋ | 10692/12313 [8:00:25<1:11:41, 2.65s/it] 87%|████████▋ | 10693/12313 [8:00:28<1:09:33, 2.58s/it] {'loss': 0.4406, 'grad_norm': 5.0755641679869505, 'learning_rate': 2.2357890311626328e-07, 'epoch': 0.87} 87%|████████▋ | 10693/12313 
[8:00:28<1:09:33, 2.58s/it] 87%|████████▋ | 10694/12313 [8:00:31<1:11:26, 2.65s/it] {'loss': 0.3976, 'grad_norm': 4.6801666968241795, 'learning_rate': 2.2330714826613586e-07, 'epoch': 0.87} 87%|████████▋ | 10694/12313 [8:00:31<1:11:26, 2.65s/it] 87%|████████▋ | 10695/12313 [8:00:33<1:11:27, 2.65s/it] {'loss': 0.4059, 'grad_norm': 5.777100294603406, 'learning_rate': 2.2303555095110507e-07, 'epoch': 0.87} 87%|████████▋ | 10695/12313 [8:00:33<1:11:27, 2.65s/it] 87%|████████▋ | 10696/12313 [8:00:36<1:10:37, 2.62s/it] {'loss': 0.4719, 'grad_norm': 5.314872055548993, 'learning_rate': 2.2276411118996366e-07, 'epoch': 0.87} 87%|████████▋ | 10696/12313 [8:00:36<1:10:37, 2.62s/it] 87%|████████▋ | 10697/12313 [8:00:38<1:09:19, 2.57s/it] {'loss': 0.4898, 'grad_norm': 4.410426094256012, 'learning_rate': 2.22492829001493e-07, 'epoch': 0.87} 87%|████████▋ | 10697/12313 [8:00:38<1:09:19, 2.57s/it] 87%|████████▋ | 10698/12313 [8:00:41<1:11:22, 2.65s/it] {'loss': 0.4248, 'grad_norm': 4.7733912494094755, 'learning_rate': 2.2222170440446557e-07, 'epoch': 0.87} 87%|████████▋ | 10698/12313 [8:00:41<1:11:22, 2.65s/it] 87%|████████▋ | 10699/12313 [8:00:44<1:12:46, 2.71s/it] {'loss': 0.4269, 'grad_norm': 6.898790830573995, 'learning_rate': 2.219507374176408e-07, 'epoch': 0.87} 87%|████████▋ | 10699/12313 [8:00:44<1:12:46, 2.71s/it] 87%|████████▋ | 10700/12313 [8:00:47<1:18:29, 2.92s/it] {'loss': 0.5232, 'grad_norm': 5.2231893468594635, 'learning_rate': 2.2167992805976896e-07, 'epoch': 0.87} 87%|████████▋ | 10700/12313 [8:00:47<1:18:29, 2.92s/it] 87%|████████▋ | 10701/12313 [8:00:50<1:19:24, 2.96s/it] {'loss': 0.3991, 'grad_norm': 7.520502969621975, 'learning_rate': 2.2140927634958788e-07, 'epoch': 0.87} 87%|████████▋ | 10701/12313 [8:00:50<1:19:24, 2.96s/it] 87%|████████▋ | 10702/12313 [8:00:53<1:18:56, 2.94s/it] {'loss': 0.5095, 'grad_norm': 4.652004187169794, 'learning_rate': 2.2113878230582615e-07, 'epoch': 0.87} 87%|████████▋ | 10702/12313 [8:00:53<1:18:56, 2.94s/it] 87%|████████▋ | 
10703/12313 [8:00:56<1:16:49, 2.86s/it] {'loss': 0.5209, 'grad_norm': 3.840996235904483, 'learning_rate': 2.2086844594719993e-07, 'epoch': 0.87} 87%|████████▋ | 10703/12313 [8:00:56<1:16:49, 2.86s/it] 87%|████████▋ | 10704/12313 [8:00:59<1:17:15, 2.88s/it] {'loss': 0.4342, 'grad_norm': 5.428257976832266, 'learning_rate': 2.205982672924145e-07, 'epoch': 0.87} 87%|████████▋ | 10704/12313 [8:00:59<1:17:15, 2.88s/it] 87%|████████▋ | 10705/12313 [8:01:02<1:15:44, 2.83s/it] {'loss': 0.5407, 'grad_norm': 12.646126359342446, 'learning_rate': 2.203282463601661e-07, 'epoch': 0.87} 87%|████████▋ | 10705/12313 [8:01:02<1:15:44, 2.83s/it] 87%|████████▋ | 10706/12313 [8:01:04<1:13:57, 2.76s/it] {'loss': 0.3985, 'grad_norm': 8.151893101320535, 'learning_rate': 2.2005838316913746e-07, 'epoch': 0.87} 87%|████████▋ | 10706/12313 [8:01:04<1:13:57, 2.76s/it] 87%|████████▋ | 10707/12313 [8:01:07<1:10:46, 2.64s/it] {'loss': 0.6897, 'grad_norm': 3.6426553923017337, 'learning_rate': 2.1978867773800205e-07, 'epoch': 0.87} 87%|████████▋ | 10707/12313 [8:01:07<1:10:46, 2.64s/it] 87%|████████▋ | 10708/12313 [8:01:09<1:09:22, 2.59s/it] {'loss': 0.318, 'grad_norm': 5.392872165186957, 'learning_rate': 2.1951913008542297e-07, 'epoch': 0.87} 87%|████████▋ | 10708/12313 [8:01:09<1:09:22, 2.59s/it] 87%|████████▋ | 10709/12313 [8:01:12<1:09:19, 2.59s/it] {'loss': 0.396, 'grad_norm': 6.652109485767294, 'learning_rate': 2.1924974023005086e-07, 'epoch': 0.87} 87%|████████▋ | 10709/12313 [8:01:12<1:09:19, 2.59s/it] 87%|████████▋ | 10710/12313 [8:01:14<1:09:43, 2.61s/it] {'loss': 0.3915, 'grad_norm': 6.329696488288848, 'learning_rate': 2.189805081905255e-07, 'epoch': 0.87} 87%|████████▋ | 10710/12313 [8:01:14<1:09:43, 2.61s/it] 87%|████████▋ | 10711/12313 [8:01:17<1:10:26, 2.64s/it] {'loss': 0.5392, 'grad_norm': 3.238598078513282, 'learning_rate': 2.1871143398547735e-07, 'epoch': 0.87} 87%|████████▋ | 10711/12313 [8:01:17<1:10:26, 2.64s/it] 87%|████████▋ | 10712/12313 [8:01:20<1:10:45, 2.65s/it] {'loss': 
0.535, 'grad_norm': 6.202895386014103, 'learning_rate': 2.184425176335239e-07, 'epoch': 0.87} 87%|████████▋ | 10712/12313 [8:01:20<1:10:45, 2.65s/it] 87%|████████▋ | 10713/12313 [8:01:22<1:08:51, 2.58s/it] {'loss': 0.3804, 'grad_norm': 7.1313443357236075, 'learning_rate': 2.1817375915327342e-07, 'epoch': 0.87} 87%|████████▋ | 10713/12313 [8:01:22<1:08:51, 2.58s/it] 87%|████████▋ | 10714/12313 [8:01:25<1:07:23, 2.53s/it] {'loss': 0.485, 'grad_norm': 4.686543627354921, 'learning_rate': 2.1790515856332268e-07, 'epoch': 0.87} 87%|████████▋ | 10714/12313 [8:01:25<1:07:23, 2.53s/it] 87%|████████▋ | 10715/12313 [8:01:27<1:07:35, 2.54s/it] {'loss': 0.4805, 'grad_norm': 4.348632743659829, 'learning_rate': 2.1763671588225705e-07, 'epoch': 0.87} 87%|████████▋ | 10715/12313 [8:01:27<1:07:35, 2.54s/it] 87%|████████▋ | 10716/12313 [8:01:30<1:07:00, 2.52s/it] {'loss': 0.4082, 'grad_norm': 4.845358239649171, 'learning_rate': 2.173684311286517e-07, 'epoch': 0.87} 87%|████████▋ | 10716/12313 [8:01:30<1:07:00, 2.52s/it] 87%|████████▋ | 10717/12313 [8:01:32<1:07:01, 2.52s/it] {'loss': 0.5136, 'grad_norm': 5.562307404701293, 'learning_rate': 2.1710030432106982e-07, 'epoch': 0.87} 87%|████████▋ | 10717/12313 [8:01:32<1:07:01, 2.52s/it] 87%|████████▋ | 10718/12313 [8:01:35<1:08:45, 2.59s/it] {'loss': 0.4594, 'grad_norm': 15.50221360580663, 'learning_rate': 2.1683233547806494e-07, 'epoch': 0.87} 87%|████████▋ | 10718/12313 [8:01:35<1:08:45, 2.59s/it] 87%|████████▋ | 10719/12313 [8:01:38<1:09:46, 2.63s/it] {'loss': 0.5134, 'grad_norm': 8.184235968979692, 'learning_rate': 2.1656452461817883e-07, 'epoch': 0.87} 87%|████████▋ | 10719/12313 [8:01:38<1:09:46, 2.63s/it] 87%|████████▋ | 10720/12313 [8:01:40<1:09:19, 2.61s/it] {'loss': 0.6206, 'grad_norm': 5.086338805312825, 'learning_rate': 2.162968717599423e-07, 'epoch': 0.87} 87%|████████▋ | 10720/12313 [8:01:40<1:09:19, 2.61s/it] 87%|████████▋ | 10721/12313 [8:01:43<1:12:02, 2.72s/it] {'loss': 0.3373, 'grad_norm': 6.866907411125338, 
'learning_rate': 2.1602937692187685e-07, 'epoch': 0.87} 87%|████████▋ | 10721/12313 [8:01:43<1:12:02, 2.72s/it] 87%|████████▋ | 10722/12313 [8:01:46<1:12:14, 2.72s/it] {'loss': 0.4828, 'grad_norm': 6.472579080212052, 'learning_rate': 2.1576204012249053e-07, 'epoch': 0.87} 87%|████████▋ | 10722/12313 [8:01:46<1:12:14, 2.72s/it] 87%|████████▋ | 10723/12313 [8:01:48<1:12:02, 2.72s/it] {'loss': 0.537, 'grad_norm': 6.9868588515529915, 'learning_rate': 2.1549486138028125e-07, 'epoch': 0.87} 87%|████████▋ | 10723/12313 [8:01:49<1:12:02, 2.72s/it] 87%|████████▋ | 10724/12313 [8:01:51<1:13:27, 2.77s/it] {'loss': 0.4446, 'grad_norm': 4.648588890106703, 'learning_rate': 2.152278407137376e-07, 'epoch': 0.87} 87%|████████▋ | 10724/12313 [8:01:51<1:13:27, 2.77s/it] 87%|████████▋ | 10725/12313 [8:01:54<1:13:09, 2.76s/it] {'loss': 0.4319, 'grad_norm': 5.2378357215552915, 'learning_rate': 2.1496097814133503e-07, 'epoch': 0.87} 87%|████████▋ | 10725/12313 [8:01:54<1:13:09, 2.76s/it] 87%|████████▋ | 10726/12313 [8:01:57<1:11:46, 2.71s/it] {'loss': 0.3548, 'grad_norm': 4.349980838222139, 'learning_rate': 2.146942736815391e-07, 'epoch': 0.87} 87%|████████▋ | 10726/12313 [8:01:57<1:11:46, 2.71s/it] 87%|████████▋ | 10727/12313 [8:01:59<1:10:23, 2.66s/it] {'loss': 0.574, 'grad_norm': 3.1974318652567026, 'learning_rate': 2.1442772735280532e-07, 'epoch': 0.87} 87%|████████▋ | 10727/12313 [8:01:59<1:10:23, 2.66s/it] 87%|████████▋ | 10728/12313 [8:02:02<1:10:13, 2.66s/it] {'loss': 0.5418, 'grad_norm': 6.948883443858628, 'learning_rate': 2.1416133917357668e-07, 'epoch': 0.87} 87%|████████▋ | 10728/12313 [8:02:02<1:10:13, 2.66s/it] 87%|████████▋ | 10729/12313 [8:02:05<1:10:38, 2.68s/it] {'loss': 0.5892, 'grad_norm': 7.741555115120562, 'learning_rate': 2.1389510916228513e-07, 'epoch': 0.87} 87%|████████▋ | 10729/12313 [8:02:05<1:10:38, 2.68s/it] 87%|████████▋ | 10730/12313 [8:02:07<1:08:57, 2.61s/it] {'loss': 0.6456, 'grad_norm': 4.815397769745156, 'learning_rate': 2.136290373373534e-07, 
'epoch': 0.87} 87%|████████▋ | 10730/12313 [8:02:07<1:08:57, 2.61s/it] 87%|████████▋ | 10731/12313 [8:02:10<1:10:40, 2.68s/it] {'loss': 0.5199, 'grad_norm': 4.733327642778111, 'learning_rate': 2.1336312371719182e-07, 'epoch': 0.87} 87%|████████▋ | 10731/12313 [8:02:10<1:10:40, 2.68s/it] 87%|████████▋ | 10732/12313 [8:02:13<1:10:44, 2.68s/it] {'loss': 0.489, 'grad_norm': 6.863949439219225, 'learning_rate': 2.130973683201998e-07, 'epoch': 0.87} 87%|████████▋ | 10732/12313 [8:02:13<1:10:44, 2.68s/it] 87%|████████▋ | 10733/12313 [8:02:15<1:08:24, 2.60s/it] {'loss': 0.4344, 'grad_norm': 6.500756310540661, 'learning_rate': 2.128317711647665e-07, 'epoch': 0.87} 87%|████████▋ | 10733/12313 [8:02:15<1:08:24, 2.60s/it] 87%|████████▋ | 10734/12313 [8:02:18<1:09:27, 2.64s/it] {'loss': 0.5088, 'grad_norm': 7.69706976013433, 'learning_rate': 2.125663322692706e-07, 'epoch': 0.87} 87%|████████▋ | 10734/12313 [8:02:18<1:09:27, 2.64s/it] 87%|████████▋ | 10735/12313 [8:02:20<1:09:04, 2.63s/it] {'loss': 0.6559, 'grad_norm': 6.331349228694531, 'learning_rate': 2.1230105165207848e-07, 'epoch': 0.87} 87%|████████▋ | 10735/12313 [8:02:20<1:09:04, 2.63s/it] 87%|████████▋ | 10736/12313 [8:02:23<1:11:04, 2.70s/it] {'loss': 0.6072, 'grad_norm': 5.041214163229692, 'learning_rate': 2.120359293315455e-07, 'epoch': 0.87} 87%|████████▋ | 10736/12313 [8:02:23<1:11:04, 2.70s/it] 87%|████████▋ | 10737/12313 [8:02:26<1:09:21, 2.64s/it] {'loss': 0.4588, 'grad_norm': 5.582839372557744, 'learning_rate': 2.1177096532601777e-07, 'epoch': 0.87} 87%|████████▋ | 10737/12313 [8:02:26<1:09:21, 2.64s/it] 87%|████████▋ | 10738/12313 [8:02:28<1:09:35, 2.65s/it] {'loss': 0.4815, 'grad_norm': 7.814578266112515, 'learning_rate': 2.115061596538287e-07, 'epoch': 0.87} 87%|████████▋ | 10738/12313 [8:02:28<1:09:35, 2.65s/it] 87%|████████▋ | 10739/12313 [8:02:31<1:10:16, 2.68s/it] {'loss': 0.5404, 'grad_norm': 5.583906428631362, 'learning_rate': 2.112415123333014e-07, 'epoch': 0.87} 87%|████████▋ | 10739/12313 
[8:02:31<1:10:16, 2.68s/it] 87%|████████▋ | 10740/12313 [8:02:34<1:09:52, 2.67s/it] {'loss': 0.4854, 'grad_norm': 7.942670374026383, 'learning_rate': 2.1097702338274907e-07, 'epoch': 0.87} 87%|████████▋ | 10740/12313 [8:02:34<1:09:52, 2.67s/it] 87%|████████▋ | 10741/12313 [8:02:36<1:08:58, 2.63s/it] {'loss': 0.6385, 'grad_norm': 4.655699223151669, 'learning_rate': 2.1071269282047196e-07, 'epoch': 0.87} 87%|████████▋ | 10741/12313 [8:02:36<1:08:58, 2.63s/it] 87%|████████▋ | 10742/12313 [8:02:39<1:07:33, 2.58s/it] {'loss': 0.5986, 'grad_norm': 5.347652457276179, 'learning_rate': 2.1044852066476052e-07, 'epoch': 0.87} 87%|████████▋ | 10742/12313 [8:02:39<1:07:33, 2.58s/it] 87%|████████▋ | 10743/12313 [8:02:42<1:09:42, 2.66s/it] {'loss': 0.4435, 'grad_norm': 6.78105342737064, 'learning_rate': 2.1018450693389452e-07, 'epoch': 0.87} 87%|████████▋ | 10743/12313 [8:02:42<1:09:42, 2.66s/it] 87%|████████▋ | 10744/12313 [8:02:44<1:09:37, 2.66s/it] {'loss': 0.5274, 'grad_norm': 4.350453860498677, 'learning_rate': 2.099206516461419e-07, 'epoch': 0.87} 87%|████████▋ | 10744/12313 [8:02:44<1:09:37, 2.66s/it] 87%|████████▋ | 10745/12313 [8:02:47<1:09:23, 2.66s/it] {'loss': 0.551, 'grad_norm': 4.768332070430383, 'learning_rate': 2.096569548197594e-07, 'epoch': 0.87} 87%|████████▋ | 10745/12313 [8:02:47<1:09:23, 2.66s/it] 87%|████████▋ | 10746/12313 [8:02:49<1:07:53, 2.60s/it] {'loss': 0.4384, 'grad_norm': 6.078372849255497, 'learning_rate': 2.0939341647299437e-07, 'epoch': 0.87} 87%|████████▋ | 10746/12313 [8:02:49<1:07:53, 2.60s/it] 87%|████████▋ | 10747/12313 [8:02:52<1:08:48, 2.64s/it] {'loss': 0.6075, 'grad_norm': 11.608495384238454, 'learning_rate': 2.0913003662408254e-07, 'epoch': 0.87} 87%|████████▋ | 10747/12313 [8:02:52<1:08:48, 2.64s/it] 87%|████████▋ | 10748/12313 [8:02:55<1:09:01, 2.65s/it] {'loss': 0.4888, 'grad_norm': 8.027806223001297, 'learning_rate': 2.0886681529124765e-07, 'epoch': 0.87} 87%|████████▋ | 10748/12313 [8:02:55<1:09:01, 2.65s/it] 87%|████████▋ | 
10749/12313 [8:02:57<1:08:51, 2.64s/it] {'loss': 0.5281, 'grad_norm': 4.082351604317076, 'learning_rate': 2.086037524927037e-07, 'epoch': 0.87} 87%|████████▋ | 10749/12313 [8:02:57<1:08:51, 2.64s/it] 87%|████████▋ | 10750/12313 [8:03:00<1:10:25, 2.70s/it] {'loss': 0.5854, 'grad_norm': 5.82956001643256, 'learning_rate': 2.0834084824665314e-07, 'epoch': 0.87} 87%|████████▋ | 10750/12313 [8:03:00<1:10:25, 2.70s/it] 87%|████████▋ | 10751/12313 [8:03:03<1:09:57, 2.69s/it] {'loss': 0.4345, 'grad_norm': 9.953252455414237, 'learning_rate': 2.0807810257128692e-07, 'epoch': 0.87} 87%|████████▋ | 10751/12313 [8:03:03<1:09:57, 2.69s/it] 87%|████████▋ | 10752/12313 [8:03:06<1:09:32, 2.67s/it] {'loss': 0.5252, 'grad_norm': 5.897867057523133, 'learning_rate': 2.0781551548478607e-07, 'epoch': 0.87} 87%|████████▋ | 10752/12313 [8:03:06<1:09:32, 2.67s/it] 87%|████████▋ | 10753/12313 [8:03:08<1:08:09, 2.62s/it] {'loss': 0.3069, 'grad_norm': 5.203890577661351, 'learning_rate': 2.0755308700532077e-07, 'epoch': 0.87} 87%|████████▋ | 10753/12313 [8:03:08<1:08:09, 2.62s/it] 87%|████████▋ | 10754/12313 [8:03:11<1:09:07, 2.66s/it] {'loss': 0.3058, 'grad_norm': 9.388507828923037, 'learning_rate': 2.0729081715104958e-07, 'epoch': 0.87} 87%|████████▋ | 10754/12313 [8:03:11<1:09:07, 2.66s/it] 87%|████████▋ | 10755/12313 [8:03:13<1:08:28, 2.64s/it] {'loss': 0.4658, 'grad_norm': 4.665994646948376, 'learning_rate': 2.070287059401191e-07, 'epoch': 0.87} 87%|████████▋ | 10755/12313 [8:03:13<1:08:28, 2.64s/it] 87%|████████▋ | 10756/12313 [8:03:16<1:09:15, 2.67s/it] {'loss': 0.4733, 'grad_norm': 4.509183704578975, 'learning_rate': 2.0676675339066726e-07, 'epoch': 0.87} 87%|████████▋ | 10756/12313 [8:03:16<1:09:15, 2.67s/it] 87%|████████▋ | 10757/12313 [8:03:19<1:09:22, 2.67s/it] {'loss': 0.388, 'grad_norm': 4.982775473415814, 'learning_rate': 2.0650495952081935e-07, 'epoch': 0.87} 87%|████████▋ | 10757/12313 [8:03:19<1:09:22, 2.67s/it] 87%|████████▋ | 10758/12313 [8:03:22<1:09:08, 2.67s/it] {'loss': 
0.4357, 'grad_norm': 3.915834558563621, 'learning_rate': 2.062433243486897e-07, 'epoch': 0.87} 87%|████████▋ | 10758/12313 [8:03:22<1:09:08, 2.67s/it] 87%|████████▋ | 10759/12313 [8:03:24<1:08:46, 2.66s/it] {'loss': 0.4962, 'grad_norm': 5.779273705619936, 'learning_rate': 2.059818478923825e-07, 'epoch': 0.87} 87%|████████▋ | 10759/12313 [8:03:24<1:08:46, 2.66s/it] 87%|████████▋ | 10760/12313 [8:03:27<1:09:00, 2.67s/it] {'loss': 0.4915, 'grad_norm': 7.124975275931914, 'learning_rate': 2.0572053016999079e-07, 'epoch': 0.87} 87%|████████▋ | 10760/12313 [8:03:27<1:09:00, 2.67s/it] 87%|████████▋ | 10761/12313 [8:03:30<1:09:21, 2.68s/it] {'loss': 0.5523, 'grad_norm': 4.688739212653256, 'learning_rate': 2.0545937119959557e-07, 'epoch': 0.87} 87%|████████▋ | 10761/12313 [8:03:30<1:09:21, 2.68s/it] 87%|████████▋ | 10762/12313 [8:03:32<1:07:30, 2.61s/it] {'loss': 0.4864, 'grad_norm': 6.512142297147514, 'learning_rate': 2.0519837099926888e-07, 'epoch': 0.87} 87%|████████▋ | 10762/12313 [8:03:32<1:07:30, 2.61s/it] 87%|████████▋ | 10763/12313 [8:03:35<1:09:03, 2.67s/it] {'loss': 0.4545, 'grad_norm': 3.968204310846973, 'learning_rate': 2.0493752958706982e-07, 'epoch': 0.87} 87%|████████▋ | 10763/12313 [8:03:35<1:09:03, 2.67s/it] 87%|████████▋ | 10764/12313 [8:03:38<1:09:14, 2.68s/it] {'loss': 0.3359, 'grad_norm': 5.589051586917989, 'learning_rate': 2.0467684698104674e-07, 'epoch': 0.87} 87%|████████▋ | 10764/12313 [8:03:38<1:09:14, 2.68s/it] 87%|████████▋ | 10765/12313 [8:03:40<1:10:02, 2.72s/it] {'loss': 0.3767, 'grad_norm': 9.371561135244747, 'learning_rate': 2.0441632319923798e-07, 'epoch': 0.87} 87%|████████▋ | 10765/12313 [8:03:40<1:10:02, 2.72s/it] 87%|████████▋ | 10766/12313 [8:03:43<1:09:46, 2.71s/it] {'loss': 0.5425, 'grad_norm': 6.212854338465917, 'learning_rate': 2.0415595825967084e-07, 'epoch': 0.87} 87%|████████▋ | 10766/12313 [8:03:43<1:09:46, 2.71s/it] 87%|████████▋ | 10767/12313 [8:03:46<1:10:08, 2.72s/it] {'loss': 0.6139, 'grad_norm': 4.1144373279605135, 
'learning_rate': 2.0389575218036057e-07, 'epoch': 0.87} 87%|████████▋ | 10767/12313 [8:03:46<1:10:08, 2.72s/it] 87%|████████▋ | 10768/12313 [8:03:48<1:09:31, 2.70s/it] {'loss': 0.4479, 'grad_norm': 4.023569708161643, 'learning_rate': 2.0363570497931252e-07, 'epoch': 0.87} 87%|████████▋ | 10768/12313 [8:03:48<1:09:31, 2.70s/it] 87%|████████▋ | 10769/12313 [8:03:51<1:10:15, 2.73s/it] {'loss': 0.5775, 'grad_norm': 3.8449587292031686, 'learning_rate': 2.0337581667452034e-07, 'epoch': 0.87} 87%|████████▋ | 10769/12313 [8:03:51<1:10:15, 2.73s/it] 87%|████████▋ | 10770/12313 [8:03:54<1:08:05, 2.65s/it] {'loss': 0.5565, 'grad_norm': 6.240516560084869, 'learning_rate': 2.0311608728396658e-07, 'epoch': 0.87} 87%|████████▋ | 10770/12313 [8:03:54<1:08:05, 2.65s/it] 87%|████████▋ | 10771/12313 [8:03:56<1:08:28, 2.66s/it] {'loss': 0.4689, 'grad_norm': 7.754381694503205, 'learning_rate': 2.0285651682562357e-07, 'epoch': 0.87} 87%|████████▋ | 10771/12313 [8:03:56<1:08:28, 2.66s/it] 87%|████████▋ | 10772/12313 [8:03:59<1:11:19, 2.78s/it] {'loss': 0.4805, 'grad_norm': 5.711700208395236, 'learning_rate': 2.0259710531745247e-07, 'epoch': 0.87} 87%|████████▋ | 10772/12313 [8:03:59<1:11:19, 2.78s/it] 87%|████████▋ | 10773/12313 [8:04:02<1:10:15, 2.74s/it] {'loss': 0.4845, 'grad_norm': 5.1893458798213326, 'learning_rate': 2.023378527774028e-07, 'epoch': 0.87} 87%|████████▋ | 10773/12313 [8:04:02<1:10:15, 2.74s/it] 88%|████████▊ | 10774/12313 [8:04:05<1:10:40, 2.76s/it] {'loss': 0.4978, 'grad_norm': 7.133320338283911, 'learning_rate': 2.020787592234133e-07, 'epoch': 0.88} 88%|████████▊ | 10774/12313 [8:04:05<1:10:40, 2.76s/it] 88%|████████▊ | 10775/12313 [8:04:08<1:10:04, 2.73s/it] {'loss': 0.4116, 'grad_norm': 22.519170871083364, 'learning_rate': 2.0181982467341238e-07, 'epoch': 0.88} 88%|████████▊ | 10775/12313 [8:04:08<1:10:04, 2.73s/it] 88%|████████▊ | 10776/12313 [8:04:10<1:11:01, 2.77s/it] {'loss': 0.4641, 'grad_norm': 9.287767664431676, 'learning_rate': 2.0156104914531656e-07, 
'epoch': 0.88} 88%|████████▊ | 10776/12313 [8:04:10<1:11:01, 2.77s/it] 88%|████████▊ | 10777/12313 [8:04:13<1:09:34, 2.72s/it] {'loss': 0.481, 'grad_norm': 4.893583526059918, 'learning_rate': 2.0130243265703148e-07, 'epoch': 0.88} 88%|████████▊ | 10777/12313 [8:04:13<1:09:34, 2.72s/it] 88%|████████▊ | 10778/12313 [8:04:16<1:08:28, 2.68s/it] {'loss': 0.3727, 'grad_norm': 5.320206234300975, 'learning_rate': 2.010439752264523e-07, 'epoch': 0.88} 88%|████████▊ | 10778/12313 [8:04:16<1:08:28, 2.68s/it] 88%|████████▊ | 10779/12313 [8:04:18<1:08:57, 2.70s/it] {'loss': 0.447, 'grad_norm': 4.677489372940032, 'learning_rate': 2.0078567687146333e-07, 'epoch': 0.88} 88%|████████▊ | 10779/12313 [8:04:18<1:08:57, 2.70s/it] 88%|████████▊ | 10780/12313 [8:04:21<1:07:52, 2.66s/it] {'loss': 0.4569, 'grad_norm': 4.0249196656831465, 'learning_rate': 2.0052753760993693e-07, 'epoch': 0.88} 88%|████████▊ | 10780/12313 [8:04:21<1:07:52, 2.66s/it] 88%|████████▊ | 10781/12313 [8:04:24<1:09:38, 2.73s/it] {'loss': 0.3912, 'grad_norm': 6.529303245559503, 'learning_rate': 2.002695574597352e-07, 'epoch': 0.88} 88%|████████▊ | 10781/12313 [8:04:24<1:09:38, 2.73s/it] 88%|████████▊ | 10782/12313 [8:04:26<1:09:14, 2.71s/it] {'loss': 0.5883, 'grad_norm': 5.418332568962745, 'learning_rate': 2.0001173643870915e-07, 'epoch': 0.88} 88%|████████▊ | 10782/12313 [8:04:26<1:09:14, 2.71s/it] 88%|████████▊ | 10783/12313 [8:04:29<1:07:43, 2.66s/it] {'loss': 0.4482, 'grad_norm': 5.33583924657257, 'learning_rate': 1.9975407456469808e-07, 'epoch': 0.88} 88%|████████▊ | 10783/12313 [8:04:29<1:07:43, 2.66s/it] 88%|████████▊ | 10784/12313 [8:04:32<1:07:46, 2.66s/it] {'loss': 0.4846, 'grad_norm': 5.518754450917286, 'learning_rate': 1.9949657185553113e-07, 'epoch': 0.88} 88%|████████▊ | 10784/12313 [8:04:32<1:07:46, 2.66s/it] 88%|████████▊ | 10785/12313 [8:04:34<1:06:58, 2.63s/it] {'loss': 0.4897, 'grad_norm': 4.466728527582831, 'learning_rate': 1.992392283290265e-07, 'epoch': 0.88} 88%|████████▊ | 10785/12313 
[8:04:34<1:06:58, 2.63s/it] 88%|████████▊ | 10786/12313 [8:04:37<1:08:41, 2.70s/it] {'loss': 0.4688, 'grad_norm': 6.062215529147515, 'learning_rate': 1.9898204400299021e-07, 'epoch': 0.88} 88%|████████▊ | 10786/12313 [8:04:37<1:08:41, 2.70s/it] 88%|████████▊ | 10787/12313 [8:04:40<1:08:06, 2.68s/it] {'loss': 0.4591, 'grad_norm': 6.196645398811499, 'learning_rate': 1.9872501889521916e-07, 'epoch': 0.88} 88%|████████▊ | 10787/12313 [8:04:40<1:08:06, 2.68s/it] 88%|████████▊ | 10788/12313 [8:04:43<1:12:42, 2.86s/it] {'loss': 0.3928, 'grad_norm': 5.38690541555504, 'learning_rate': 1.984681530234972e-07, 'epoch': 0.88} 88%|████████▊ | 10788/12313 [8:04:43<1:12:42, 2.86s/it] 88%|████████▊ | 10789/12313 [8:04:46<1:12:34, 2.86s/it] {'loss': 0.4919, 'grad_norm': 9.821582116837323, 'learning_rate': 1.9821144640559842e-07, 'epoch': 0.88} 88%|████████▊ | 10789/12313 [8:04:46<1:12:34, 2.86s/it] 88%|████████▊ | 10790/12313 [8:04:48<1:09:49, 2.75s/it] {'loss': 0.459, 'grad_norm': 7.245646513525235, 'learning_rate': 1.9795489905928527e-07, 'epoch': 0.88} 88%|████████▊ | 10790/12313 [8:04:48<1:09:49, 2.75s/it] 88%|████████▊ | 10791/12313 [8:04:51<1:09:15, 2.73s/it] {'loss': 0.4777, 'grad_norm': 7.6962316221213385, 'learning_rate': 1.976985110023094e-07, 'epoch': 0.88} 88%|████████▊ | 10791/12313 [8:04:51<1:09:15, 2.73s/it] 88%|████████▊ | 10792/12313 [8:04:54<1:12:11, 2.85s/it] {'loss': 0.4522, 'grad_norm': 4.354060604688436, 'learning_rate': 1.9744228225241248e-07, 'epoch': 0.88} 88%|████████▊ | 10792/12313 [8:04:54<1:12:11, 2.85s/it] 88%|████████▊ | 10793/12313 [8:04:57<1:11:40, 2.83s/it] {'loss': 0.4305, 'grad_norm': 4.660763544694212, 'learning_rate': 1.9718621282732302e-07, 'epoch': 0.88} 88%|████████▊ | 10793/12313 [8:04:57<1:11:40, 2.83s/it] 88%|████████▊ | 10794/12313 [8:04:59<1:08:35, 2.71s/it] {'loss': 0.4301, 'grad_norm': 6.796070237851726, 'learning_rate': 1.9693030274476054e-07, 'epoch': 0.88} 88%|████████▊ | 10794/12313 [8:04:59<1:08:35, 2.71s/it] 88%|████████▊ | 
10795/12313 [8:05:02<1:07:43, 2.68s/it] {'loss': 0.4308, 'grad_norm': 6.8550929057120635, 'learning_rate': 1.9667455202243223e-07, 'epoch': 0.88} 88%|████████▊ | 10795/12313 [8:05:02<1:07:43, 2.68s/it] 88%|████████▊ | 10796/12313 [8:05:05<1:07:38, 2.68s/it] {'loss': 0.4715, 'grad_norm': 10.526313953549316, 'learning_rate': 1.9641896067803452e-07, 'epoch': 0.88} 88%|████████▊ | 10796/12313 [8:05:05<1:07:38, 2.68s/it] 88%|████████▊ | 10797/12313 [8:05:08<1:10:39, 2.80s/it] {'loss': 0.529, 'grad_norm': 4.6047250304925615, 'learning_rate': 1.9616352872925293e-07, 'epoch': 0.88} 88%|████████▊ | 10797/12313 [8:05:08<1:10:39, 2.80s/it] 88%|████████▊ | 10798/12313 [8:05:10<1:10:36, 2.80s/it] {'loss': 0.5294, 'grad_norm': 6.880938367850307, 'learning_rate': 1.959082561937628e-07, 'epoch': 0.88} 88%|████████▊ | 10798/12313 [8:05:11<1:10:36, 2.80s/it] 88%|████████▊ | 10799/12313 [8:05:13<1:07:56, 2.69s/it] {'loss': 0.4086, 'grad_norm': 3.8531909306263272, 'learning_rate': 1.9565314308922666e-07, 'epoch': 0.88} 88%|████████▊ | 10799/12313 [8:05:13<1:07:56, 2.69s/it] 88%|████████▊ | 10800/12313 [8:05:16<1:11:32, 2.84s/it] {'loss': 0.4557, 'grad_norm': 5.451866169993255, 'learning_rate': 1.9539818943329792e-07, 'epoch': 0.88} 88%|████████▊ | 10800/12313 [8:05:16<1:11:32, 2.84s/it] 88%|████████▊ | 10801/12313 [8:05:19<1:10:21, 2.79s/it] {'loss': 0.4339, 'grad_norm': 6.999546249788755, 'learning_rate': 1.9514339524361742e-07, 'epoch': 0.88} 88%|████████▊ | 10801/12313 [8:05:19<1:10:21, 2.79s/it] 88%|████████▊ | 10802/12313 [8:05:22<1:13:07, 2.90s/it] {'loss': 0.4453, 'grad_norm': 5.662909641489243, 'learning_rate': 1.9488876053781552e-07, 'epoch': 0.88} 88%|████████▊ | 10802/12313 [8:05:22<1:13:07, 2.90s/it] 88%|████████▊ | 10803/12313 [8:05:25<1:10:39, 2.81s/it] {'loss': 0.6062, 'grad_norm': 4.898755716797307, 'learning_rate': 1.9463428533351202e-07, 'epoch': 0.88} 88%|████████▊ | 10803/12313 [8:05:25<1:10:39, 2.81s/it] 88%|████████▊ | 10804/12313 [8:05:27<1:09:50, 2.78s/it] 
{'loss': 0.4585, 'grad_norm': 4.183435356460552, 'learning_rate': 1.943799696483145e-07, 'epoch': 0.88} 88%|████████▊ | 10804/12313 [8:05:27<1:09:50, 2.78s/it] 88%|████████▊ | 10805/12313 [8:05:30<1:11:27, 2.84s/it] {'loss': 0.4212, 'grad_norm': 4.396794358678346, 'learning_rate': 1.9412581349982113e-07, 'epoch': 0.88} 88%|████████▊ | 10805/12313 [8:05:30<1:11:27, 2.84s/it] 88%|████████▊ | 10806/12313 [8:05:33<1:09:51, 2.78s/it] {'loss': 0.5645, 'grad_norm': 4.627929173347972, 'learning_rate': 1.938718169056175e-07, 'epoch': 0.88} 88%|████████▊ | 10806/12313 [8:05:33<1:09:51, 2.78s/it] 88%|████████▊ | 10807/12313 [8:05:36<1:09:24, 2.77s/it] {'loss': 0.6663, 'grad_norm': 4.99764782498259, 'learning_rate': 1.9361797988327961e-07, 'epoch': 0.88} 88%|████████▊ | 10807/12313 [8:05:36<1:09:24, 2.77s/it] 88%|████████▊ | 10808/12313 [8:05:38<1:06:45, 2.66s/it] {'loss': 0.5455, 'grad_norm': 3.9890801417363937, 'learning_rate': 1.933643024503712e-07, 'epoch': 0.88} 88%|████████▊ | 10808/12313 [8:05:38<1:06:45, 2.66s/it] 88%|████████▊ | 10809/12313 [8:05:41<1:06:18, 2.65s/it] {'loss': 0.633, 'grad_norm': 5.065903978646239, 'learning_rate': 1.9311078462444484e-07, 'epoch': 0.88} 88%|████████▊ | 10809/12313 [8:05:41<1:06:18, 2.65s/it] 88%|████████▊ | 10810/12313 [8:05:44<1:10:17, 2.81s/it] {'loss': 0.3581, 'grad_norm': 7.184247053486219, 'learning_rate': 1.928574264230429e-07, 'epoch': 0.88} 88%|████████▊ | 10810/12313 [8:05:44<1:10:17, 2.81s/it] 88%|████████▊ | 10811/12313 [8:05:46<1:08:31, 2.74s/it] {'loss': 0.3958, 'grad_norm': 7.648845912445987, 'learning_rate': 1.9260422786369747e-07, 'epoch': 0.88} 88%|████████▊ | 10811/12313 [8:05:46<1:08:31, 2.74s/it] 88%|████████▊ | 10812/12313 [8:05:50<1:11:40, 2.86s/it] {'loss': 0.3277, 'grad_norm': 2.778368867161788, 'learning_rate': 1.9235118896392706e-07, 'epoch': 0.88} 88%|████████▊ | 10812/12313 [8:05:50<1:11:40, 2.86s/it] 88%|████████▊ | 10813/12313 [8:05:52<1:09:39, 2.79s/it] {'loss': 0.4247, 'grad_norm': 4.8097717739494685, 
'learning_rate': 1.9209830974124183e-07, 'epoch': 0.88} 88%|████████▊ | 10813/12313 [8:05:52<1:09:39, 2.79s/it] 88%|████████▊ | 10814/12313 [8:05:55<1:06:56, 2.68s/it] {'loss': 0.4587, 'grad_norm': 10.137967528923353, 'learning_rate': 1.9184559021313914e-07, 'epoch': 0.88} 88%|████████▊ | 10814/12313 [8:05:55<1:06:56, 2.68s/it] 88%|████████▊ | 10815/12313 [8:05:57<1:07:15, 2.69s/it] {'loss': 0.4826, 'grad_norm': 4.796708634400599, 'learning_rate': 1.9159303039710558e-07, 'epoch': 0.88} 88%|████████▊ | 10815/12313 [8:05:57<1:07:15, 2.69s/it] 88%|████████▊ | 10816/12313 [8:06:00<1:07:09, 2.69s/it] {'loss': 0.516, 'grad_norm': 9.314388410278989, 'learning_rate': 1.9134063031061744e-07, 'epoch': 0.88} 88%|████████▊ | 10816/12313 [8:06:00<1:07:09, 2.69s/it] 88%|████████▊ | 10817/12313 [8:06:03<1:08:28, 2.75s/it] {'loss': 0.5164, 'grad_norm': 7.232628959274363, 'learning_rate': 1.910883899711391e-07, 'epoch': 0.88} 88%|████████▊ | 10817/12313 [8:06:03<1:08:28, 2.75s/it] 88%|████████▊ | 10818/12313 [8:06:06<1:08:25, 2.75s/it] {'loss': 0.5444, 'grad_norm': 3.9022829908795322, 'learning_rate': 1.9083630939612407e-07, 'epoch': 0.88} 88%|████████▊ | 10818/12313 [8:06:06<1:08:25, 2.75s/it] 88%|████████▊ | 10819/12313 [8:06:08<1:05:45, 2.64s/it] {'loss': 0.3569, 'grad_norm': 3.8942957114269627, 'learning_rate': 1.9058438860301621e-07, 'epoch': 0.88} 88%|████████▊ | 10819/12313 [8:06:08<1:05:45, 2.64s/it] 88%|████████▊ | 10820/12313 [8:06:11<1:05:49, 2.65s/it] {'loss': 0.4069, 'grad_norm': 3.166404623353523, 'learning_rate': 1.9033262760924598e-07, 'epoch': 0.88} 88%|████████▊ | 10820/12313 [8:06:11<1:05:49, 2.65s/it] 88%|████████▊ | 10821/12313 [8:06:13<1:04:42, 2.60s/it] {'loss': 0.3724, 'grad_norm': 5.035265769137359, 'learning_rate': 1.900810264322339e-07, 'epoch': 0.88} 88%|████████▊ | 10821/12313 [8:06:13<1:04:42, 2.60s/it] 88%|████████▊ | 10822/12313 [8:06:16<1:07:04, 2.70s/it] {'loss': 0.4983, 'grad_norm': 6.137216581765903, 'learning_rate': 1.8982958508938998e-07, 
'epoch': 0.88} 88%|████████▊ | 10822/12313 [8:06:16<1:07:04, 2.70s/it] 88%|████████▊ | 10823/12313 [8:06:19<1:07:12, 2.71s/it] {'loss': 0.5643, 'grad_norm': 6.449621491427784, 'learning_rate': 1.895783035981119e-07, 'epoch': 0.88} 88%|████████▊ | 10823/12313 [8:06:19<1:07:12, 2.71s/it] 88%|████████▊ | 10824/12313 [8:06:21<1:06:26, 2.68s/it] {'loss': 0.4371, 'grad_norm': 7.017513815097841, 'learning_rate': 1.8932718197578802e-07, 'epoch': 0.88} 88%|████████▊ | 10824/12313 [8:06:21<1:06:26, 2.68s/it] 88%|████████▊ | 10825/12313 [8:06:24<1:08:02, 2.74s/it] {'loss': 0.7439, 'grad_norm': 4.915420085875843, 'learning_rate': 1.890762202397936e-07, 'epoch': 0.88} 88%|████████▊ | 10825/12313 [8:06:24<1:08:02, 2.74s/it] 88%|████████▊ | 10826/12313 [8:06:27<1:08:51, 2.78s/it] {'loss': 0.5297, 'grad_norm': 5.057095171528424, 'learning_rate': 1.8882541840749475e-07, 'epoch': 0.88} 88%|████████▊ | 10826/12313 [8:06:27<1:08:51, 2.78s/it] 88%|████████▊ | 10827/12313 [8:06:30<1:07:51, 2.74s/it] {'loss': 0.5945, 'grad_norm': 4.992472857392407, 'learning_rate': 1.8857477649624533e-07, 'epoch': 0.88} 88%|████████▊ | 10827/12313 [8:06:30<1:07:51, 2.74s/it] 88%|████████▊ | 10828/12313 [8:06:32<1:06:20, 2.68s/it] {'loss': 0.5583, 'grad_norm': 8.570114515507747, 'learning_rate': 1.883242945233879e-07, 'epoch': 0.88} 88%|████████▊ | 10828/12313 [8:06:32<1:06:20, 2.68s/it] 88%|████████▊ | 10829/12313 [8:06:35<1:05:12, 2.64s/it] {'loss': 0.3574, 'grad_norm': 4.881453778874329, 'learning_rate': 1.8807397250625497e-07, 'epoch': 0.88} 88%|████████▊ | 10829/12313 [8:06:35<1:05:12, 2.64s/it] 88%|████████▊ | 10830/12313 [8:06:37<1:03:56, 2.59s/it] {'loss': 0.2625, 'grad_norm': 9.774498600782247, 'learning_rate': 1.878238104621677e-07, 'epoch': 0.88} 88%|████████▊ | 10830/12313 [8:06:37<1:03:56, 2.59s/it] 88%|████████▊ | 10831/12313 [8:06:40<1:04:16, 2.60s/it] {'loss': 0.4918, 'grad_norm': 5.93121202645404, 'learning_rate': 1.8757380840843526e-07, 'epoch': 0.88} 88%|████████▊ | 10831/12313 
[8:06:40<1:04:16, 2.60s/it] 88%|████████▊ | 10832/12313 [8:06:43<1:05:03, 2.64s/it] {'loss': 0.6349, 'grad_norm': 3.681222565715722, 'learning_rate': 1.8732396636235744e-07, 'epoch': 0.88} 88%|████████▊ | 10832/12313 [8:06:43<1:05:03, 2.64s/it] 88%|████████▊ | 10833/12313 [8:06:46<1:08:56, 2.79s/it] {'loss': 0.576, 'grad_norm': 6.108142266361864, 'learning_rate': 1.8707428434122155e-07, 'epoch': 0.88} 88%|████████▊ | 10833/12313 [8:06:46<1:08:56, 2.79s/it] 88%|████████▊ | 10834/12313 [8:06:48<1:06:33, 2.70s/it] {'loss': 0.4904, 'grad_norm': 6.054085118997518, 'learning_rate': 1.8682476236230372e-07, 'epoch': 0.88} 88%|████████▊ | 10834/12313 [8:06:48<1:06:33, 2.70s/it] 88%|████████▊ | 10835/12313 [8:06:51<1:05:08, 2.64s/it] {'loss': 0.3619, 'grad_norm': 7.843292571737394, 'learning_rate': 1.8657540044287047e-07, 'epoch': 0.88} 88%|████████▊ | 10835/12313 [8:06:51<1:05:08, 2.64s/it] 88%|████████▊ | 10836/12313 [8:06:54<1:05:48, 2.67s/it] {'loss': 0.3475, 'grad_norm': 3.6574632924680044, 'learning_rate': 1.8632619860017547e-07, 'epoch': 0.88} 88%|████████▊ | 10836/12313 [8:06:54<1:05:48, 2.67s/it] 88%|████████▊ | 10837/12313 [8:06:56<1:05:57, 2.68s/it] {'loss': 0.463, 'grad_norm': 5.74599899447566, 'learning_rate': 1.8607715685146244e-07, 'epoch': 0.88} 88%|████████▊ | 10837/12313 [8:06:56<1:05:57, 2.68s/it] 88%|████████▊ | 10838/12313 [8:06:59<1:04:49, 2.64s/it] {'loss': 0.5069, 'grad_norm': 4.130892724817378, 'learning_rate': 1.8582827521396453e-07, 'epoch': 0.88} 88%|████████▊ | 10838/12313 [8:06:59<1:04:49, 2.64s/it] 88%|████████▊ | 10839/12313 [8:07:02<1:05:25, 2.66s/it] {'loss': 0.4378, 'grad_norm': 6.536393003461804, 'learning_rate': 1.855795537049021e-07, 'epoch': 0.88} 88%|████████▊ | 10839/12313 [8:07:02<1:05:25, 2.66s/it] 88%|████████▊ | 10840/12313 [8:07:05<1:07:01, 2.73s/it] {'loss': 0.481, 'grad_norm': 5.520938271169849, 'learning_rate': 1.853309923414856e-07, 'epoch': 0.88} 88%|████████▊ | 10840/12313 [8:07:05<1:07:01, 2.73s/it] 88%|████████▊ | 
10841/12313 [8:07:07<1:05:19, 2.66s/it] {'loss': 0.5875, 'grad_norm': 13.77624681570704, 'learning_rate': 1.8508259114091432e-07, 'epoch': 0.88} 88%|████████▊ | 10841/12313 [8:07:07<1:05:19, 2.66s/it] 88%|████████▊ | 10842/12313 [8:07:10<1:04:51, 2.65s/it] {'loss': 0.3521, 'grad_norm': 4.241071474476681, 'learning_rate': 1.8483435012037587e-07, 'epoch': 0.88} 88%|████████▊ | 10842/12313 [8:07:10<1:04:51, 2.65s/it] 88%|████████▊ | 10843/12313 [8:07:12<1:05:20, 2.67s/it] {'loss': 0.4218, 'grad_norm': 8.981810629766965, 'learning_rate': 1.8458626929704821e-07, 'epoch': 0.88} 88%|████████▊ | 10843/12313 [8:07:12<1:05:20, 2.67s/it] 88%|████████▊ | 10844/12313 [8:07:15<1:07:21, 2.75s/it] {'loss': 0.6531, 'grad_norm': 4.1379268820071236, 'learning_rate': 1.843383486880959e-07, 'epoch': 0.88} 88%|████████▊ | 10844/12313 [8:07:15<1:07:21, 2.75s/it] 88%|████████▊ | 10845/12313 [8:07:18<1:05:25, 2.67s/it] {'loss': 0.6161, 'grad_norm': 4.30762221426034, 'learning_rate': 1.840905883106747e-07, 'epoch': 0.88} 88%|████████▊ | 10845/12313 [8:07:18<1:05:25, 2.67s/it] 88%|████████▊ | 10846/12313 [8:07:20<1:04:14, 2.63s/it] {'loss': 0.4597, 'grad_norm': 6.207428929843423, 'learning_rate': 1.8384298818192814e-07, 'epoch': 0.88} 88%|████████▊ | 10846/12313 [8:07:20<1:04:14, 2.63s/it] 88%|████████▊ | 10847/12313 [8:07:23<1:04:51, 2.65s/it] {'loss': 0.4932, 'grad_norm': 6.194628295310705, 'learning_rate': 1.835955483189883e-07, 'epoch': 0.88} 88%|████████▊ | 10847/12313 [8:07:23<1:04:51, 2.65s/it] 88%|████████▊ | 10848/12313 [8:07:26<1:07:35, 2.77s/it] {'loss': 0.4016, 'grad_norm': 4.398170364958908, 'learning_rate': 1.833482687389776e-07, 'epoch': 0.88} 88%|████████▊ | 10848/12313 [8:07:26<1:07:35, 2.77s/it] 88%|████████▊ | 10849/12313 [8:07:29<1:06:06, 2.71s/it] {'loss': 0.4765, 'grad_norm': 6.570347236455752, 'learning_rate': 1.831011494590054e-07, 'epoch': 0.88} 88%|████████▊ | 10849/12313 [8:07:29<1:06:06, 2.71s/it] 88%|████████▊ | 10850/12313 [8:07:31<1:05:54, 2.70s/it] {'loss': 
0.4721, 'grad_norm': 5.039000129540002, 'learning_rate': 1.828541904961717e-07, 'epoch': 0.88} 88%|████████▊ | 10850/12313 [8:07:31<1:05:54, 2.70s/it] 88%|████████▊ | 10851/12313 [8:07:34<1:07:18, 2.76s/it] {'loss': 0.7508, 'grad_norm': 6.470581102201411, 'learning_rate': 1.8260739186756527e-07, 'epoch': 0.88} 88%|████████▊ | 10851/12313 [8:07:34<1:07:18, 2.76s/it] 88%|████████▊ | 10852/12313 [8:07:37<1:07:39, 2.78s/it] {'loss': 0.4522, 'grad_norm': 4.439815787899128, 'learning_rate': 1.8236075359026246e-07, 'epoch': 0.88} 88%|████████▊ | 10852/12313 [8:07:37<1:07:39, 2.78s/it] 88%|████████▊ | 10853/12313 [8:07:40<1:06:02, 2.71s/it] {'loss': 0.4571, 'grad_norm': 6.875926190964903, 'learning_rate': 1.8211427568132932e-07, 'epoch': 0.88} 88%|████████▊ | 10853/12313 [8:07:40<1:06:02, 2.71s/it] 88%|████████▊ | 10854/12313 [8:07:42<1:04:38, 2.66s/it] {'loss': 0.4715, 'grad_norm': 5.042487513250472, 'learning_rate': 1.8186795815782143e-07, 'epoch': 0.88} 88%|████████▊ | 10854/12313 [8:07:42<1:04:38, 2.66s/it] 88%|████████▊ | 10855/12313 [8:07:45<1:03:04, 2.60s/it] {'loss': 0.357, 'grad_norm': 6.319914060990893, 'learning_rate': 1.8162180103678177e-07, 'epoch': 0.88} 88%|████████▊ | 10855/12313 [8:07:45<1:03:04, 2.60s/it] 88%|████████▊ | 10856/12313 [8:07:47<1:01:59, 2.55s/it] {'loss': 0.6732, 'grad_norm': 5.98665836166037, 'learning_rate': 1.813758043352437e-07, 'epoch': 0.88} 88%|████████▊ | 10856/12313 [8:07:47<1:01:59, 2.55s/it] 88%|████████▊ | 10857/12313 [8:07:49<1:00:42, 2.50s/it] {'loss': 0.4064, 'grad_norm': 4.90779966317718, 'learning_rate': 1.8112996807022943e-07, 'epoch': 0.88} 88%|████████▊ | 10857/12313 [8:07:49<1:00:42, 2.50s/it] 88%|████████▊ | 10858/12313 [8:07:52<1:00:53, 2.51s/it] {'loss': 0.5962, 'grad_norm': 4.954078590442837, 'learning_rate': 1.8088429225874865e-07, 'epoch': 0.88} 88%|████████▊ | 10858/12313 [8:07:52<1:00:53, 2.51s/it] 88%|████████▊ | 10859/12313 [8:07:54<1:00:53, 2.51s/it] {'loss': 0.4485, 'grad_norm': 7.208628926964807, 
'learning_rate': 1.8063877691780114e-07, 'epoch': 0.88} 88%|████████▊ | 10859/12313 [8:07:54<1:00:53, 2.51s/it] 88%|████████▊ | 10860/12313 [8:07:57<1:02:19, 2.57s/it] {'loss': 0.6773, 'grad_norm': 5.399228378212703, 'learning_rate': 1.8039342206437494e-07, 'epoch': 0.88} 88%|████████▊ | 10860/12313 [8:07:57<1:02:19, 2.57s/it] 88%|████████▊ | 10861/12313 [8:08:00<1:05:29, 2.71s/it] {'loss': 0.5487, 'grad_norm': 6.733238765318234, 'learning_rate': 1.8014822771544787e-07, 'epoch': 0.88} 88%|████████▊ | 10861/12313 [8:08:00<1:05:29, 2.71s/it] 88%|████████▊ | 10862/12313 [8:08:03<1:05:14, 2.70s/it] {'loss': 0.379, 'grad_norm': 6.204508713784901, 'learning_rate': 1.7990319388798527e-07, 'epoch': 0.88} 88%|████████▊ | 10862/12313 [8:08:03<1:05:14, 2.70s/it] 88%|████████▊ | 10863/12313 [8:08:05<1:04:47, 2.68s/it] {'loss': 0.4973, 'grad_norm': 4.9418711968437545, 'learning_rate': 1.79658320598943e-07, 'epoch': 0.88} 88%|████████▊ | 10863/12313 [8:08:05<1:04:47, 2.68s/it] 88%|████████▊ | 10864/12313 [8:08:08<1:05:04, 2.69s/it] {'loss': 0.5288, 'grad_norm': 7.6609348179930805, 'learning_rate': 1.79413607865265e-07, 'epoch': 0.88} 88%|████████▊ | 10864/12313 [8:08:08<1:05:04, 2.69s/it] 88%|████████▊ | 10865/12313 [8:08:11<1:04:07, 2.66s/it] {'loss': 0.4702, 'grad_norm': 5.302913931640557, 'learning_rate': 1.7916905570388387e-07, 'epoch': 0.88} 88%|████████▊ | 10865/12313 [8:08:11<1:04:07, 2.66s/it] 88%|████████▊ | 10866/12313 [8:08:13<1:04:24, 2.67s/it] {'loss': 0.3797, 'grad_norm': 5.228383412229427, 'learning_rate': 1.7892466413172076e-07, 'epoch': 0.88} 88%|████████▊ | 10866/12313 [8:08:13<1:04:24, 2.67s/it] 88%|████████▊ | 10867/12313 [8:08:16<1:03:29, 2.63s/it] {'loss': 0.3931, 'grad_norm': 4.948041110181531, 'learning_rate': 1.7868043316568718e-07, 'epoch': 0.88} 88%|████████▊ | 10867/12313 [8:08:16<1:03:29, 2.63s/it] 88%|████████▊ | 10868/12313 [8:08:19<1:03:33, 2.64s/it] {'loss': 0.4122, 'grad_norm': 8.069840917352794, 'learning_rate': 1.784363628226818e-07, 'epoch': 
0.88} 88%|████████▊ | 10868/12313 [8:08:19<1:03:33, 2.64s/it] 88%|████████▊ | 10869/12313 [8:08:21<1:03:40, 2.65s/it] {'loss': 0.4682, 'grad_norm': 5.771520601557648, 'learning_rate': 1.781924531195933e-07, 'epoch': 0.88} 88%|████████▊ | 10869/12313 [8:08:21<1:03:40, 2.65s/it] 88%|████████▊ | 10870/12313 [8:08:24<1:03:06, 2.62s/it] {'loss': 0.3881, 'grad_norm': 5.964777339901852, 'learning_rate': 1.7794870407329968e-07, 'epoch': 0.88} 88%|████████▊ | 10870/12313 [8:08:24<1:03:06, 2.62s/it] 88%|████████▊ | 10871/12313 [8:08:27<1:02:55, 2.62s/it] {'loss': 0.4282, 'grad_norm': 17.63419248026244, 'learning_rate': 1.7770511570066622e-07, 'epoch': 0.88} 88%|████████▊ | 10871/12313 [8:08:27<1:02:55, 2.62s/it] 88%|████████▊ | 10872/12313 [8:08:30<1:05:31, 2.73s/it] {'loss': 0.4074, 'grad_norm': 4.937868582682188, 'learning_rate': 1.7746168801854786e-07, 'epoch': 0.88} 88%|████████▊ | 10872/12313 [8:08:30<1:05:31, 2.73s/it] 88%|████████▊ | 10873/12313 [8:08:32<1:06:49, 2.78s/it] {'loss': 0.3702, 'grad_norm': 7.45687522217002, 'learning_rate': 1.772184210437894e-07, 'epoch': 0.88} 88%|████████▊ | 10873/12313 [8:08:32<1:06:49, 2.78s/it] 88%|████████▊ | 10874/12313 [8:08:35<1:05:03, 2.71s/it] {'loss': 0.4297, 'grad_norm': 4.456345546727624, 'learning_rate': 1.7697531479322294e-07, 'epoch': 0.88} 88%|████████▊ | 10874/12313 [8:08:35<1:05:03, 2.71s/it] 88%|████████▊ | 10875/12313 [8:08:38<1:07:39, 2.82s/it] {'loss': 0.4171, 'grad_norm': 4.869647315752035, 'learning_rate': 1.7673236928366976e-07, 'epoch': 0.88} 88%|████████▊ | 10875/12313 [8:08:38<1:07:39, 2.82s/it] 88%|████████▊ | 10876/12313 [8:08:41<1:06:26, 2.77s/it] {'loss': 0.4707, 'grad_norm': 4.695411262956829, 'learning_rate': 1.7648958453194086e-07, 'epoch': 0.88} 88%|████████▊ | 10876/12313 [8:08:41<1:06:26, 2.77s/it] 88%|████████▊ | 10877/12313 [8:08:43<1:06:09, 2.76s/it] {'loss': 0.4886, 'grad_norm': 7.882022541168915, 'learning_rate': 1.7624696055483643e-07, 'epoch': 0.88} 88%|████████▊ | 10877/12313 
[8:08:43<1:06:09, 2.76s/it] 88%|████████▊ | 10878/12313 [8:08:46<1:04:22, 2.69s/it] {'loss': 0.4957, 'grad_norm': 7.071815658627316, 'learning_rate': 1.7600449736914384e-07, 'epoch': 0.88} 88%|████████▊ | 10878/12313 [8:08:46<1:04:22, 2.69s/it] 88%|████████▊ | 10879/12313 [8:08:49<1:06:07, 2.77s/it] {'loss': 0.395, 'grad_norm': 5.21615020327953, 'learning_rate': 1.7576219499163995e-07, 'epoch': 0.88} 88%|████████▊ | 10879/12313 [8:08:49<1:06:07, 2.77s/it] 88%|████████▊ | 10880/12313 [8:08:52<1:05:36, 2.75s/it] {'loss': 0.6283, 'grad_norm': 3.7677226715193015, 'learning_rate': 1.7552005343909162e-07, 'epoch': 0.88} 88%|████████▊ | 10880/12313 [8:08:52<1:05:36, 2.75s/it] 88%|████████▊ | 10881/12313 [8:08:54<1:05:33, 2.75s/it] {'loss': 0.4107, 'grad_norm': 6.221136511822739, 'learning_rate': 1.7527807272825326e-07, 'epoch': 0.88} 88%|████████▊ | 10881/12313 [8:08:54<1:05:33, 2.75s/it] 88%|████████▊ | 10882/12313 [8:08:57<1:04:21, 2.70s/it] {'loss': 0.4108, 'grad_norm': 4.483843051687907, 'learning_rate': 1.7503625287586896e-07, 'epoch': 0.88} 88%|████████▊ | 10882/12313 [8:08:57<1:04:21, 2.70s/it] 88%|████████▊ | 10883/12313 [8:09:00<1:05:33, 2.75s/it] {'loss': 0.463, 'grad_norm': 5.371900604040972, 'learning_rate': 1.7479459389867141e-07, 'epoch': 0.88} 88%|████████▊ | 10883/12313 [8:09:00<1:05:33, 2.75s/it] 88%|████████▊ | 10884/12313 [8:09:02<1:03:55, 2.68s/it] {'loss': 0.5657, 'grad_norm': 4.729136507795324, 'learning_rate': 1.7455309581338204e-07, 'epoch': 0.88} 88%|████████▊ | 10884/12313 [8:09:02<1:03:55, 2.68s/it] 88%|████████▊ | 10885/12313 [8:09:05<1:04:22, 2.70s/it] {'loss': 0.462, 'grad_norm': 5.952224993059543, 'learning_rate': 1.7431175863671102e-07, 'epoch': 0.88} 88%|████████▊ | 10885/12313 [8:09:05<1:04:22, 2.70s/it] 88%|████████▊ | 10886/12313 [8:09:08<1:04:30, 2.71s/it] {'loss': 0.5023, 'grad_norm': 5.332245081683401, 'learning_rate': 1.740705823853578e-07, 'epoch': 0.88} 88%|████████▊ | 10886/12313 [8:09:08<1:04:30, 2.71s/it] 88%|████████▊ | 
10887/12313 [8:09:10<1:04:00, 2.69s/it] {'loss': 0.6023, 'grad_norm': 4.301376869737246, 'learning_rate': 1.7382956707601068e-07, 'epoch': 0.88} 88%|████████▊ | 10887/12313 [8:09:10<1:04:00, 2.69s/it] 88%|████████▊ | 10888/12313 [8:09:13<1:05:27, 2.76s/it] {'loss': 0.383, 'grad_norm': 7.399999113034231, 'learning_rate': 1.7358871272534604e-07, 'epoch': 0.88} 88%|████████▊ | 10888/12313 [8:09:13<1:05:27, 2.76s/it] 88%|████████▊ | 10889/12313 [8:09:16<1:03:14, 2.66s/it] {'loss': 0.3966, 'grad_norm': 6.01935069429763, 'learning_rate': 1.7334801935003003e-07, 'epoch': 0.88} 88%|████████▊ | 10889/12313 [8:09:16<1:03:14, 2.66s/it] 88%|████████▊ | 10890/12313 [8:09:18<1:01:22, 2.59s/it] {'loss': 0.4748, 'grad_norm': 12.267510974208244, 'learning_rate': 1.7310748696671791e-07, 'epoch': 0.88} 88%|████████▊ | 10890/12313 [8:09:18<1:01:22, 2.59s/it] 88%|████████▊ | 10891/12313 [8:09:21<1:02:39, 2.64s/it] {'loss': 0.557, 'grad_norm': 4.471683226123735, 'learning_rate': 1.728671155920525e-07, 'epoch': 0.88} 88%|████████▊ | 10891/12313 [8:09:21<1:02:39, 2.64s/it] 88%|████████▊ | 10892/12313 [8:09:24<1:04:10, 2.71s/it] {'loss': 0.3825, 'grad_norm': 4.5797052007927626, 'learning_rate': 1.7262690524266658e-07, 'epoch': 0.88} 88%|████████▊ | 10892/12313 [8:09:24<1:04:10, 2.71s/it] 88%|████████▊ | 10893/12313 [8:09:26<1:01:56, 2.62s/it] {'loss': 0.4803, 'grad_norm': 5.819985449696891, 'learning_rate': 1.7238685593518157e-07, 'epoch': 0.88} 88%|████████▊ | 10893/12313 [8:09:26<1:01:56, 2.62s/it] 88%|████████▊ | 10894/12313 [8:09:29<1:01:19, 2.59s/it] {'loss': 0.4021, 'grad_norm': 12.610582423388232, 'learning_rate': 1.7214696768620699e-07, 'epoch': 0.88} 88%|████████▊ | 10894/12313 [8:09:29<1:01:19, 2.59s/it] 88%|████████▊ | 10895/12313 [8:09:32<1:02:41, 2.65s/it] {'loss': 0.4256, 'grad_norm': 5.295182105134932, 'learning_rate': 1.719072405123423e-07, 'epoch': 0.88} 88%|████████▊ | 10895/12313 [8:09:32<1:02:41, 2.65s/it] 88%|████████▊ | 10896/12313 [8:09:34<1:01:13, 2.59s/it] {'loss': 
0.4549, 'grad_norm': 4.518250999036192, 'learning_rate': 1.7166767443017567e-07, 'epoch': 0.88} 88%|████████▊ | 10896/12313 [8:09:34<1:01:13, 2.59s/it] 88%|████████▊ | 10897/12313 [8:09:37<1:01:40, 2.61s/it] {'loss': 0.7135, 'grad_norm': 3.165277818453232, 'learning_rate': 1.7142826945628353e-07, 'epoch': 0.88} 88%|████████▊ | 10897/12313 [8:09:37<1:01:40, 2.61s/it] 89%|████████▊ | 10898/12313 [8:09:39<1:01:59, 2.63s/it] {'loss': 0.5167, 'grad_norm': 14.285562335378861, 'learning_rate': 1.7118902560723072e-07, 'epoch': 0.89} 89%|████████▊ | 10898/12313 [8:09:39<1:01:59, 2.63s/it] 89%|████████▊ | 10899/12313 [8:09:42<1:02:40, 2.66s/it] {'loss': 0.4811, 'grad_norm': 5.520164322684935, 'learning_rate': 1.7094994289957285e-07, 'epoch': 0.89} 89%|████████▊ | 10899/12313 [8:09:42<1:02:40, 2.66s/it] 89%|████████▊ | 10900/12313 [8:09:45<1:01:17, 2.60s/it] {'loss': 0.4762, 'grad_norm': 6.172181533567202, 'learning_rate': 1.7071102134985224e-07, 'epoch': 0.89} 89%|████████▊ | 10900/12313 [8:09:45<1:01:17, 2.60s/it] 89%|████████▊ | 10901/12313 [8:09:48<1:05:03, 2.76s/it] {'loss': 0.3914, 'grad_norm': 4.846940632513523, 'learning_rate': 1.7047226097460123e-07, 'epoch': 0.89} 89%|████████▊ | 10901/12313 [8:09:48<1:05:03, 2.76s/it] 89%|████████▊ | 10902/12313 [8:09:51<1:08:03, 2.89s/it] {'loss': 0.4715, 'grad_norm': 3.9531646427451927, 'learning_rate': 1.7023366179034135e-07, 'epoch': 0.89} 89%|████████▊ | 10902/12313 [8:09:51<1:08:03, 2.89s/it] 89%|████████▊ | 10903/12313 [8:09:54<1:06:22, 2.82s/it] {'loss': 0.3049, 'grad_norm': 7.419094636905311, 'learning_rate': 1.6999522381358187e-07, 'epoch': 0.89} 89%|████████▊ | 10903/12313 [8:09:54<1:06:22, 2.82s/it] 89%|████████▊ | 10904/12313 [8:09:56<1:05:29, 2.79s/it] {'loss': 0.4374, 'grad_norm': 8.695140393852423, 'learning_rate': 1.6975694706082125e-07, 'epoch': 0.89} 89%|████████▊ | 10904/12313 [8:09:56<1:05:29, 2.79s/it] 89%|████████▊ | 10905/12313 [8:09:59<1:04:26, 2.75s/it] {'loss': 0.3473, 'grad_norm': 4.688891686843388, 
'learning_rate': 1.6951883154854771e-07, 'epoch': 0.89} 89%|████████▊ | 10905/12313 [8:09:59<1:04:26, 2.75s/it] 89%|████████▊ | 10906/12313 [8:10:02<1:03:42, 2.72s/it] {'loss': 0.4753, 'grad_norm': 9.242824312960995, 'learning_rate': 1.6928087729323695e-07, 'epoch': 0.89} 89%|████████▊ | 10906/12313 [8:10:02<1:03:42, 2.72s/it] 89%|████████▊ | 10907/12313 [8:10:04<1:03:33, 2.71s/it] {'loss': 0.4638, 'grad_norm': 21.34205228899008, 'learning_rate': 1.6904308431135414e-07, 'epoch': 0.89} 89%|████████▊ | 10907/12313 [8:10:04<1:03:33, 2.71s/it] 89%|████████▊ | 10908/12313 [8:10:07<1:02:15, 2.66s/it] {'loss': 0.4495, 'grad_norm': 7.234575319695479, 'learning_rate': 1.6880545261935333e-07, 'epoch': 0.89} 89%|████████▊ | 10908/12313 [8:10:07<1:02:15, 2.66s/it] 89%|████████▊ | 10909/12313 [8:10:10<1:02:38, 2.68s/it] {'loss': 0.4903, 'grad_norm': 4.584768372230779, 'learning_rate': 1.6856798223367777e-07, 'epoch': 0.89} 89%|████████▊ | 10909/12313 [8:10:10<1:02:38, 2.68s/it] 89%|████████▊ | 10910/12313 [8:10:12<1:02:43, 2.68s/it] {'loss': 0.5838, 'grad_norm': 3.8405465855706966, 'learning_rate': 1.6833067317075875e-07, 'epoch': 0.89} 89%|████████▊ | 10910/12313 [8:10:12<1:02:43, 2.68s/it] 89%|████████▊ | 10911/12313 [8:10:15<1:03:08, 2.70s/it] {'loss': 0.3647, 'grad_norm': 11.336953000753638, 'learning_rate': 1.680935254470173e-07, 'epoch': 0.89} 89%|████████▊ | 10911/12313 [8:10:15<1:03:08, 2.70s/it] 89%|████████▊ | 10912/12313 [8:10:18<1:04:23, 2.76s/it] {'loss': 0.4593, 'grad_norm': 3.4138677524491783, 'learning_rate': 1.6785653907886251e-07, 'epoch': 0.89} 89%|████████▊ | 10912/12313 [8:10:18<1:04:23, 2.76s/it] 89%|████████▊ | 10913/12313 [8:10:21<1:03:56, 2.74s/it] {'loss': 0.4414, 'grad_norm': 7.281119584083047, 'learning_rate': 1.6761971408269184e-07, 'epoch': 0.89} 89%|████████▊ | 10913/12313 [8:10:21<1:03:56, 2.74s/it] 89%|████████▊ | 10914/12313 [8:10:23<1:04:01, 2.75s/it] {'loss': 0.5625, 'grad_norm': 8.096587754790656, 'learning_rate': 1.673830504748933e-07, 
'epoch': 0.89} 89%|████████▊ | 10914/12313 [8:10:23<1:04:01, 2.75s/it] 89%|████████▊ | 10915/12313 [8:10:26<1:04:01, 2.75s/it] {'loss': 0.6731, 'grad_norm': 4.3613469827373725, 'learning_rate': 1.6714654827184263e-07, 'epoch': 0.89} 89%|████████▊ | 10915/12313 [8:10:26<1:04:01, 2.75s/it] 89%|████████▊ | 10916/12313 [8:10:29<1:02:52, 2.70s/it] {'loss': 0.4585, 'grad_norm': 36.26226393417871, 'learning_rate': 1.6691020748990455e-07, 'epoch': 0.89} 89%|████████▊ | 10916/12313 [8:10:29<1:02:52, 2.70s/it] 89%|████████▊ | 10917/12313 [8:10:31<1:00:26, 2.60s/it] {'loss': 0.5141, 'grad_norm': 4.750194227102958, 'learning_rate': 1.6667402814543209e-07, 'epoch': 0.89} 89%|████████▊ | 10917/12313 [8:10:31<1:00:26, 2.60s/it] 89%|████████▊ | 10918/12313 [8:10:33<58:55, 2.53s/it] {'loss': 0.6047, 'grad_norm': 7.5959663569614495, 'learning_rate': 1.66438010254768e-07, 'epoch': 0.89} 89%|████████▊ | 10918/12313 [8:10:33<58:55, 2.53s/it] 89%|████████▊ | 10919/12313 [8:10:36<59:06, 2.54s/it] {'loss': 0.5965, 'grad_norm': 6.985478995191811, 'learning_rate': 1.662021538342437e-07, 'epoch': 0.89} 89%|████████▊ | 10919/12313 [8:10:36<59:06, 2.54s/it] 89%|████████▊ | 10920/12313 [8:10:39<1:02:38, 2.70s/it] {'loss': 0.4358, 'grad_norm': 4.967674400774454, 'learning_rate': 1.6596645890017832e-07, 'epoch': 0.89} 89%|████████▊ | 10920/12313 [8:10:39<1:02:38, 2.70s/it] 89%|████████▊ | 10921/12313 [8:10:41<1:00:45, 2.62s/it] {'loss': 0.4619, 'grad_norm': 4.562428672417342, 'learning_rate': 1.6573092546888132e-07, 'epoch': 0.89} 89%|████████▊ | 10921/12313 [8:10:41<1:00:45, 2.62s/it] 89%|████████▊ | 10922/12313 [8:10:44<1:02:32, 2.70s/it] {'loss': 0.452, 'grad_norm': 4.147461101360203, 'learning_rate': 1.6549555355665076e-07, 'epoch': 0.89} 89%|████████▊ | 10922/12313 [8:10:44<1:02:32, 2.70s/it] 89%|████████▊ | 10923/12313 [8:10:47<1:02:20, 2.69s/it] {'loss': 0.5393, 'grad_norm': 3.695271801611688, 'learning_rate': 1.6526034317977225e-07, 'epoch': 0.89} 89%|████████▊ | 10923/12313 
[8:10:47<1:02:20, 2.69s/it] 89%|████████▊ | 10924/12313 [8:10:50<1:01:16, 2.65s/it] {'loss': 0.4885, 'grad_norm': 4.837405408033375, 'learning_rate': 1.650252943545222e-07, 'epoch': 0.89} 89%|████████▊ | 10924/12313 [8:10:50<1:01:16, 2.65s/it] 89%|████████▊ | 10925/12313 [8:10:52<1:01:34, 2.66s/it] {'loss': 0.5001, 'grad_norm': 4.704508929048984, 'learning_rate': 1.647904070971637e-07, 'epoch': 0.89} 89%|████████▊ | 10925/12313 [8:10:52<1:01:34, 2.66s/it] 89%|████████▊ | 10926/12313 [8:10:55<1:00:53, 2.63s/it] {'loss': 0.4963, 'grad_norm': 3.9115802054782427, 'learning_rate': 1.645556814239499e-07, 'epoch': 0.89} 89%|████████▊ | 10926/12313 [8:10:55<1:00:53, 2.63s/it] 89%|████████▊ | 10927/12313 [8:10:57<1:00:16, 2.61s/it] {'loss': 0.4367, 'grad_norm': 7.938004047544957, 'learning_rate': 1.6432111735112277e-07, 'epoch': 0.89} 89%|████████▊ | 10927/12313 [8:10:57<1:00:16, 2.61s/it] 89%|████████▉ | 10928/12313 [8:11:00<58:45, 2.55s/it] {'loss': 0.3794, 'grad_norm': 12.387873573548433, 'learning_rate': 1.6408671489491323e-07, 'epoch': 0.89} 89%|████████▉ | 10928/12313 [8:11:00<58:45, 2.55s/it] 89%|████████▉ | 10929/12313 [8:11:03<1:01:56, 2.69s/it] {'loss': 0.4554, 'grad_norm': 7.148993736986469, 'learning_rate': 1.6385247407154025e-07, 'epoch': 0.89} 89%|████████▉ | 10929/12313 [8:11:03<1:01:56, 2.69s/it] 89%|████████▉ | 10930/12313 [8:11:05<1:01:50, 2.68s/it] {'loss': 0.46, 'grad_norm': 6.143994601377806, 'learning_rate': 1.6361839489721227e-07, 'epoch': 0.89} 89%|████████▉ | 10930/12313 [8:11:05<1:01:50, 2.68s/it] 89%|████████▉ | 10931/12313 [8:11:08<1:01:30, 2.67s/it] {'loss': 0.6431, 'grad_norm': 21.816971684053996, 'learning_rate': 1.6338447738812628e-07, 'epoch': 0.89} 89%|████████▉ | 10931/12313 [8:11:08<1:01:30, 2.67s/it] 89%|████████▉ | 10932/12313 [8:11:11<1:01:55, 2.69s/it] {'loss': 0.4062, 'grad_norm': 6.378758988903642, 'learning_rate': 1.631507215604683e-07, 'epoch': 0.89} 89%|████████▉ | 10932/12313 [8:11:11<1:01:55, 2.69s/it] 89%|████████▉ | 
10933/12313 [8:11:14<1:01:44, 2.68s/it] {'loss': 0.4481, 'grad_norm': 8.818910417308302, 'learning_rate': 1.6291712743041226e-07, 'epoch': 0.89} 89%|████████▉ | 10933/12313 [8:11:14<1:01:44, 2.68s/it] 89%|████████▉ | 10934/12313 [8:11:16<1:01:51, 2.69s/it] {'loss': 0.6451, 'grad_norm': 3.3592851595714537, 'learning_rate': 1.6268369501412195e-07, 'epoch': 0.89} 89%|████████▉ | 10934/12313 [8:11:16<1:01:51, 2.69s/it] 89%|████████▉ | 10935/12313 [8:11:19<1:03:44, 2.78s/it] {'loss': 0.4928, 'grad_norm': 3.537250723511631, 'learning_rate': 1.6245042432775054e-07, 'epoch': 0.89} 89%|████████▉ | 10935/12313 [8:11:19<1:03:44, 2.78s/it] 89%|████████▉ | 10936/12313 [8:11:22<1:02:12, 2.71s/it] {'loss': 0.5083, 'grad_norm': 6.147519517852104, 'learning_rate': 1.622173153874379e-07, 'epoch': 0.89} 89%|████████▉ | 10936/12313 [8:11:22<1:02:12, 2.71s/it] 89%|████████▉ | 10937/12313 [8:11:24<59:38, 2.60s/it] {'loss': 0.4876, 'grad_norm': 3.472301797973062, 'learning_rate': 1.61984368209315e-07, 'epoch': 0.89} 89%|████████▉ | 10937/12313 [8:11:24<59:38, 2.60s/it] 89%|████████▉ | 10938/12313 [8:11:27<1:00:27, 2.64s/it] {'loss': 0.3319, 'grad_norm': 4.755539554685603, 'learning_rate': 1.617515828095001e-07, 'epoch': 0.89} 89%|████████▉ | 10938/12313 [8:11:27<1:00:27, 2.64s/it] 89%|████████▉ | 10939/12313 [8:11:29<1:00:16, 2.63s/it] {'loss': 0.5498, 'grad_norm': 9.123690619357854, 'learning_rate': 1.615189592041e-07, 'epoch': 0.89} 89%|████████▉ | 10939/12313 [8:11:29<1:00:16, 2.63s/it] 89%|████████▉ | 10940/12313 [8:11:32<59:03, 2.58s/it] {'loss': 0.4218, 'grad_norm': 5.947150990803091, 'learning_rate': 1.6128649740921182e-07, 'epoch': 0.89} 89%|████████▉ | 10940/12313 [8:11:32<59:03, 2.58s/it] 89%|████████▉ | 10941/12313 [8:11:34<58:13, 2.55s/it] {'loss': 0.5246, 'grad_norm': 6.785328375370574, 'learning_rate': 1.6105419744092105e-07, 'epoch': 0.89} 89%|████████▉ | 10941/12313 [8:11:34<58:13, 2.55s/it] 89%|████████▉ | 10942/12313 [8:11:37<58:16, 2.55s/it] {'loss': 0.4103, 
'grad_norm': 7.518002211907693, 'learning_rate': 1.6082205931530064e-07, 'epoch': 0.89} 89%|████████▉ | 10942/12313 [8:11:37<58:16, 2.55s/it] 89%|████████▉ | 10943/12313 [8:11:39<57:33, 2.52s/it] {'loss': 0.422, 'grad_norm': 5.241816204201098, 'learning_rate': 1.6059008304841417e-07, 'epoch': 0.89} 89%|████████▉ | 10943/12313 [8:11:39<57:33, 2.52s/it] 89%|████████▉ | 10944/12313 [8:11:42<1:00:06, 2.63s/it] {'loss': 0.4762, 'grad_norm': 4.9351607203741175, 'learning_rate': 1.6035826865631292e-07, 'epoch': 0.89} 89%|████████▉ | 10944/12313 [8:11:42<1:00:06, 2.63s/it] 89%|████████▉ | 10945/12313 [8:11:45<58:20, 2.56s/it] {'loss': 0.6662, 'grad_norm': 19.003799619323694, 'learning_rate': 1.601266161550366e-07, 'epoch': 0.89} 89%|████████▉ | 10945/12313 [8:11:45<58:20, 2.56s/it] 89%|████████▉ | 10946/12313 [8:11:47<59:29, 2.61s/it] {'loss': 0.6389, 'grad_norm': 3.8825556090436786, 'learning_rate': 1.5989512556061516e-07, 'epoch': 0.89} 89%|████████▉ | 10946/12313 [8:11:47<59:29, 2.61s/it] 89%|████████▉ | 10947/12313 [8:11:50<58:53, 2.59s/it] {'loss': 0.4424, 'grad_norm': 7.947300455523199, 'learning_rate': 1.5966379688906576e-07, 'epoch': 0.89} 89%|████████▉ | 10947/12313 [8:11:50<58:53, 2.59s/it] 89%|████████▉ | 10948/12313 [8:11:52<57:49, 2.54s/it] {'loss': 0.4964, 'grad_norm': 5.795613017876368, 'learning_rate': 1.5943263015639614e-07, 'epoch': 0.89} 89%|████████▉ | 10948/12313 [8:11:52<57:49, 2.54s/it] 89%|████████▉ | 10949/12313 [8:11:55<56:41, 2.49s/it] {'loss': 0.6119, 'grad_norm': 6.8505646268664515, 'learning_rate': 1.592016253786008e-07, 'epoch': 0.89} 89%|████████▉ | 10949/12313 [8:11:55<56:41, 2.49s/it] 89%|████████▉ | 10950/12313 [8:11:57<57:33, 2.53s/it] {'loss': 0.4562, 'grad_norm': 8.889358889295613, 'learning_rate': 1.5897078257166492e-07, 'epoch': 0.89} 89%|████████▉ | 10950/12313 [8:11:57<57:33, 2.53s/it] 89%|████████▉ | 10951/12313 [8:12:00<57:33, 2.54s/it] {'loss': 0.4509, 'grad_norm': 5.6468959215119225, 'learning_rate': 1.5874010175156106e-07, 
'epoch': 0.89} 89%|████████▉ | 10951/12313 [8:12:00<57:33, 2.54s/it] 89%|████████▉ | 10952/12313 [8:12:02<57:23, 2.53s/it] {'loss': 0.3751, 'grad_norm': 5.671002281059167, 'learning_rate': 1.585095829342509e-07, 'epoch': 0.89} 89%|████████▉ | 10952/12313 [8:12:02<57:23, 2.53s/it] 89%|████████▉ | 10953/12313 [8:12:05<56:34, 2.50s/it] {'loss': 0.4659, 'grad_norm': 7.946535661783076, 'learning_rate': 1.5827922613568524e-07, 'epoch': 0.89} 89%|████████▉ | 10953/12313 [8:12:05<56:34, 2.50s/it] 89%|████████▉ | 10954/12313 [8:12:07<57:16, 2.53s/it] {'loss': 0.3708, 'grad_norm': 5.9522004926872025, 'learning_rate': 1.5804903137180415e-07, 'epoch': 0.89} 89%|████████▉ | 10954/12313 [8:12:07<57:16, 2.53s/it] 89%|████████▉ | 10955/12313 [8:12:10<57:30, 2.54s/it] {'loss': 0.5587, 'grad_norm': 10.178045190705534, 'learning_rate': 1.5781899865853544e-07, 'epoch': 0.89} 89%|████████▉ | 10955/12313 [8:12:10<57:30, 2.54s/it] 89%|████████▉ | 10956/12313 [8:12:13<57:07, 2.53s/it] {'loss': 0.4611, 'grad_norm': 4.253031697214437, 'learning_rate': 1.5758912801179637e-07, 'epoch': 0.89} 89%|████████▉ | 10956/12313 [8:12:13<57:07, 2.53s/it] 89%|████████▉ | 10957/12313 [8:12:15<57:31, 2.55s/it] {'loss': 0.4811, 'grad_norm': 6.692152756992295, 'learning_rate': 1.5735941944749255e-07, 'epoch': 0.89} 89%|████████▉ | 10957/12313 [8:12:15<57:31, 2.55s/it] 89%|████████▉ | 10958/12313 [8:12:18<58:31, 2.59s/it] {'loss': 0.3912, 'grad_norm': 4.694878917526122, 'learning_rate': 1.571298729815182e-07, 'epoch': 0.89} 89%|████████▉ | 10958/12313 [8:12:18<58:31, 2.59s/it] 89%|████████▉ | 10959/12313 [8:12:20<57:49, 2.56s/it] {'loss': 0.5173, 'grad_norm': 5.740301752503594, 'learning_rate': 1.569004886297576e-07, 'epoch': 0.89} 89%|████████▉ | 10959/12313 [8:12:20<57:49, 2.56s/it] 89%|████████▉ | 10960/12313 [8:12:23<56:49, 2.52s/it] {'loss': 0.5058, 'grad_norm': 4.500280402580769, 'learning_rate': 1.5667126640808216e-07, 'epoch': 0.89} 89%|████████▉ | 10960/12313 [8:12:23<56:49, 2.52s/it] 89%|████████▉ 
| 10961/12313 [8:12:25<56:59, 2.53s/it] {'loss': 0.4839, 'grad_norm': 9.359821537794131, 'learning_rate': 1.564422063323534e-07, 'epoch': 0.89} 89%|████████▉ | 10961/12313 [8:12:25<56:59, 2.53s/it] 89%|████████▉ | 10962/12313 [8:12:28<56:53, 2.53s/it] {'loss': 0.3788, 'grad_norm': 11.943572724510128, 'learning_rate': 1.5621330841842086e-07, 'epoch': 0.89} 89%|████████▉ | 10962/12313 [8:12:28<56:53, 2.53s/it] 89%|████████▉ | 10963/12313 [8:12:31<1:00:30, 2.69s/it] {'loss': 0.4711, 'grad_norm': 5.141666870810457, 'learning_rate': 1.5598457268212353e-07, 'epoch': 0.89} 89%|████████▉ | 10963/12313 [8:12:31<1:00:30, 2.69s/it] 89%|████████▉ | 10964/12313 [8:12:34<1:02:54, 2.80s/it] {'loss': 0.3897, 'grad_norm': 3.330373351694768, 'learning_rate': 1.5575599913928735e-07, 'epoch': 0.89} 89%|████████▉ | 10964/12313 [8:12:34<1:02:54, 2.80s/it] 89%|████████▉ | 10965/12313 [8:12:37<1:01:21, 2.73s/it] {'loss': 0.4227, 'grad_norm': 3.825856052673269, 'learning_rate': 1.5552758780572995e-07, 'epoch': 0.89} 89%|████████▉ | 10965/12313 [8:12:37<1:01:21, 2.73s/it] 89%|████████▉ | 10966/12313 [8:12:39<1:01:01, 2.72s/it] {'loss': 0.4816, 'grad_norm': 5.142764433332306, 'learning_rate': 1.552993386972551e-07, 'epoch': 0.89} 89%|████████▉ | 10966/12313 [8:12:39<1:01:01, 2.72s/it] 89%|████████▉ | 10967/12313 [8:12:42<1:00:12, 2.68s/it] {'loss': 0.684, 'grad_norm': 6.3600498260391785, 'learning_rate': 1.5507125182965737e-07, 'epoch': 0.89} 89%|████████▉ | 10967/12313 [8:12:42<1:00:12, 2.68s/it] 89%|████████▉ | 10968/12313 [8:12:44<59:23, 2.65s/it] {'loss': 0.5193, 'grad_norm': 8.817725414443457, 'learning_rate': 1.5484332721871804e-07, 'epoch': 0.89} 89%|████████▉ | 10968/12313 [8:12:44<59:23, 2.65s/it] 89%|████████▉ | 10969/12313 [8:12:47<59:06, 2.64s/it] {'loss': 0.4265, 'grad_norm': 7.292802786468984, 'learning_rate': 1.5461556488020945e-07, 'epoch': 0.89} 89%|████████▉ | 10969/12313 [8:12:47<59:06, 2.64s/it] 89%|████████▉ | 10970/12313 [8:12:50<59:11, 2.64s/it] {'loss': 0.4049, 
'grad_norm': 18.816459578152827, 'learning_rate': 1.5438796482989072e-07, 'epoch': 0.89} 89%|████████▉ | 10970/12313 [8:12:50<59:11, 2.64s/it] 89%|████████▉ | 10971/12313 [8:12:52<58:27, 2.61s/it] {'loss': 0.444, 'grad_norm': 4.741507710988063, 'learning_rate': 1.541605270835106e-07, 'epoch': 0.89} 89%|████████▉ | 10971/12313 [8:12:52<58:27, 2.61s/it] 89%|████████▉ | 10972/12313 [8:12:55<59:03, 2.64s/it] {'loss': 0.4207, 'grad_norm': 11.630609215202053, 'learning_rate': 1.5393325165680707e-07, 'epoch': 0.89} 89%|████████▉ | 10972/12313 [8:12:55<59:03, 2.64s/it] 89%|████████▉ | 10973/12313 [8:12:58<59:13, 2.65s/it] {'loss': 0.5021, 'grad_norm': 3.301339255185129, 'learning_rate': 1.5370613856550615e-07, 'epoch': 0.89} 89%|████████▉ | 10973/12313 [8:12:58<59:13, 2.65s/it] 89%|████████▉ | 10974/12313 [8:13:00<58:16, 2.61s/it] {'loss': 0.3981, 'grad_norm': 4.4372396364921105, 'learning_rate': 1.534791878253228e-07, 'epoch': 0.89} 89%|████████▉ | 10974/12313 [8:13:00<58:16, 2.61s/it] 89%|████████▉ | 10975/12313 [8:13:03<59:55, 2.69s/it] {'loss': 0.4659, 'grad_norm': 8.507200043869714, 'learning_rate': 1.5325239945196108e-07, 'epoch': 0.89} 89%|████████▉ | 10975/12313 [8:13:03<59:55, 2.69s/it] 89%|████████▉ | 10976/12313 [8:13:06<59:48, 2.68s/it] {'loss': 0.4745, 'grad_norm': 6.152072516412701, 'learning_rate': 1.530257734611132e-07, 'epoch': 0.89} 89%|████████▉ | 10976/12313 [8:13:06<59:48, 2.68s/it] 89%|████████▉ | 10977/12313 [8:13:08<58:20, 2.62s/it] {'loss': 0.6653, 'grad_norm': 3.5425159922458582, 'learning_rate': 1.5279930986846047e-07, 'epoch': 0.89} 89%|████████▉ | 10977/12313 [8:13:08<58:20, 2.62s/it] 89%|████████▉ | 10978/12313 [8:13:11<1:01:05, 2.75s/it] {'loss': 0.5044, 'grad_norm': 5.233121096684976, 'learning_rate': 1.5257300868967344e-07, 'epoch': 0.89} 89%|████████▉ | 10978/12313 [8:13:11<1:01:05, 2.75s/it] 89%|████████▉ | 10979/12313 [8:13:14<1:00:14, 2.71s/it] {'loss': 0.4987, 'grad_norm': 6.291818310904449, 'learning_rate': 1.5234686994041016e-07, 
'epoch': 0.89} 89%|████████▉ | 10979/12313 [8:13:14<1:00:14, 2.71s/it] 89%|████████▉ | 10980/12313 [8:13:16<59:57, 2.70s/it] {'loss': 0.3563, 'grad_norm': 5.098504129650742, 'learning_rate': 1.521208936363186e-07, 'epoch': 0.89} 89%|████████▉ | 10980/12313 [8:13:16<59:57, 2.70s/it] 89%|████████▉ | 10981/12313 [8:13:19<58:06, 2.62s/it] {'loss': 0.7008, 'grad_norm': 4.803655200270107, 'learning_rate': 1.5189507979303575e-07, 'epoch': 0.89} 89%|████████▉ | 10981/12313 [8:13:19<58:06, 2.62s/it] 89%|████████▉ | 10982/12313 [8:13:21<56:48, 2.56s/it] {'loss': 0.3977, 'grad_norm': 4.05749114440078, 'learning_rate': 1.5166942842618632e-07, 'epoch': 0.89} 89%|████████▉ | 10982/12313 [8:13:21<56:48, 2.56s/it] 89%|████████▉ | 10983/12313 [8:13:24<56:25, 2.55s/it] {'loss': 0.4221, 'grad_norm': 6.222213392758151, 'learning_rate': 1.5144393955138336e-07, 'epoch': 0.89} 89%|████████▉ | 10983/12313 [8:13:24<56:25, 2.55s/it] 89%|████████▉ | 10984/12313 [8:13:27<58:03, 2.62s/it] {'loss': 0.4855, 'grad_norm': 3.854822685859984, 'learning_rate': 1.512186131842308e-07, 'epoch': 0.89} 89%|████████▉ | 10984/12313 [8:13:27<58:03, 2.62s/it] 89%|████████▉ | 10985/12313 [8:13:29<57:50, 2.61s/it] {'loss': 0.4516, 'grad_norm': 7.555916028857482, 'learning_rate': 1.5099344934031923e-07, 'epoch': 0.89} 89%|████████▉ | 10985/12313 [8:13:29<57:50, 2.61s/it] 89%|████████▉ | 10986/12313 [8:13:32<58:12, 2.63s/it] {'loss': 0.415, 'grad_norm': 8.539386703282604, 'learning_rate': 1.507684480352292e-07, 'epoch': 0.89} 89%|████████▉ | 10986/12313 [8:13:32<58:12, 2.63s/it] 89%|████████▉ | 10987/12313 [8:13:35<1:00:13, 2.73s/it] {'loss': 0.4186, 'grad_norm': 4.964839789056717, 'learning_rate': 1.5054360928452915e-07, 'epoch': 0.89} 89%|████████▉ | 10987/12313 [8:13:35<1:00:13, 2.73s/it] 89%|████████▉ | 10988/12313 [8:13:38<1:01:15, 2.77s/it] {'loss': 0.5064, 'grad_norm': 6.084627536003772, 'learning_rate': 1.5031893310377716e-07, 'epoch': 0.89} 89%|████████▉ | 10988/12313 [8:13:38<1:01:15, 2.77s/it] 
89%|████████▉ | 10989/12313 [8:13:40<1:00:12, 2.73s/it] {'loss': 0.5235, 'grad_norm': 3.940604600092638, 'learning_rate': 1.5009441950851965e-07, 'epoch': 0.89} 89%|████████▉ | 10989/12313 [8:13:40<1:00:12, 2.73s/it] 89%|████████▉ | 10990/12313 [8:13:43<1:00:17, 2.73s/it] {'loss': 0.3932, 'grad_norm': 5.3695438746577375, 'learning_rate': 1.4987006851429147e-07, 'epoch': 0.89} 89%|████████▉ | 10990/12313 [8:13:43<1:00:17, 2.73s/it] 89%|████████▉ | 10991/12313 [8:13:46<58:51, 2.67s/it] {'loss': 0.3775, 'grad_norm': 9.145034230965127, 'learning_rate': 1.4964588013661657e-07, 'epoch': 0.89} 89%|████████▉ | 10991/12313 [8:13:46<58:51, 2.67s/it] 89%|████████▉ | 10992/12313 [8:13:48<58:25, 2.65s/it] {'loss': 0.4791, 'grad_norm': 7.658853077669966, 'learning_rate': 1.4942185439100753e-07, 'epoch': 0.89} 89%|████████▉ | 10992/12313 [8:13:48<58:25, 2.65s/it] 89%|████████▉ | 10993/12313 [8:13:51<57:57, 2.63s/it] {'loss': 0.3937, 'grad_norm': 7.168128470178995, 'learning_rate': 1.4919799129296615e-07, 'epoch': 0.89} 89%|████████▉ | 10993/12313 [8:13:51<57:57, 2.63s/it] 89%|████████▉ | 10994/12313 [8:13:53<57:48, 2.63s/it] {'loss': 0.4437, 'grad_norm': 5.628929909843431, 'learning_rate': 1.489742908579822e-07, 'epoch': 0.89} 89%|████████▉ | 10994/12313 [8:13:53<57:48, 2.63s/it] 89%|████████▉ | 10995/12313 [8:13:56<58:05, 2.64s/it] {'loss': 0.4699, 'grad_norm': 6.054942900449149, 'learning_rate': 1.4875075310153504e-07, 'epoch': 0.89} 89%|████████▉ | 10995/12313 [8:13:56<58:05, 2.64s/it] 89%|████████▉ | 10996/12313 [8:13:59<57:36, 2.62s/it] {'loss': 0.6048, 'grad_norm': 5.835599987062635, 'learning_rate': 1.4852737803909167e-07, 'epoch': 0.89} 89%|████████▉ | 10996/12313 [8:13:59<57:36, 2.62s/it] 89%|████████▉ | 10997/12313 [8:14:01<58:32, 2.67s/it] {'loss': 0.5113, 'grad_norm': 7.175331718017556, 'learning_rate': 1.4830416568610893e-07, 'epoch': 0.89} 89%|████████▉ | 10997/12313 [8:14:01<58:32, 2.67s/it] 89%|████████▉ | 10998/12313 [8:14:04<58:43, 2.68s/it] {'loss': 0.4906, 
'grad_norm': 8.62919238424243, 'learning_rate': 1.4808111605803117e-07, 'epoch': 0.89} 89%|████████▉ | 10998/12313 [8:14:04<58:43, 2.68s/it] 89%|████████▉ | 10999/12313 [8:14:07<58:30, 2.67s/it] {'loss': 0.415, 'grad_norm': 11.718705876342495, 'learning_rate': 1.4785822917029318e-07, 'epoch': 0.89} 89%|████████▉ | 10999/12313 [8:14:07<58:30, 2.67s/it] 89%|████████▉ | 11000/12313 [8:14:10<59:02, 2.70s/it] {'loss': 0.483, 'grad_norm': 6.786219876737145, 'learning_rate': 1.476355050383174e-07, 'epoch': 0.89} 89%|████████▉ | 11000/12313 [8:14:10<59:02, 2.70s/it] 89%|████████▉ | 11001/12313 [8:14:12<58:51, 2.69s/it] {'loss': 0.5062, 'grad_norm': 4.549779356360936, 'learning_rate': 1.4741294367751484e-07, 'epoch': 0.89} 89%|████████▉ | 11001/12313 [8:14:12<58:51, 2.69s/it] 89%|████████▉ | 11002/12313 [8:14:15<58:36, 2.68s/it] {'loss': 0.3839, 'grad_norm': 4.594436506496313, 'learning_rate': 1.4719054510328595e-07, 'epoch': 0.89} 89%|████████▉ | 11002/12313 [8:14:15<58:36, 2.68s/it] 89%|████████▉ | 11003/12313 [8:14:17<57:46, 2.65s/it] {'loss': 0.4757, 'grad_norm': 3.87676469142472, 'learning_rate': 1.4696830933101868e-07, 'epoch': 0.89} 89%|████████▉ | 11003/12313 [8:14:17<57:46, 2.65s/it] 89%|████████▉ | 11004/12313 [8:14:20<56:53, 2.61s/it] {'loss': 0.5542, 'grad_norm': 4.757027256089799, 'learning_rate': 1.467462363760916e-07, 'epoch': 0.89} 89%|████████▉ | 11004/12313 [8:14:20<56:53, 2.61s/it] 89%|████████▉ | 11005/12313 [8:14:24<1:05:09, 2.99s/it] {'loss': 0.4679, 'grad_norm': 3.9669910321207587, 'learning_rate': 1.4652432625387013e-07, 'epoch': 0.89} 89%|████████▉ | 11005/12313 [8:14:24<1:05:09, 2.99s/it] 89%|████████▉ | 11006/12313 [8:14:27<1:04:54, 2.98s/it] {'loss': 0.4068, 'grad_norm': 5.6200804103086295, 'learning_rate': 1.4630257897970985e-07, 'epoch': 0.89} 89%|████████▉ | 11006/12313 [8:14:27<1:04:54, 2.98s/it] 89%|████████▉ | 11007/12313 [8:14:29<1:02:56, 2.89s/it] {'loss': 0.4993, 'grad_norm': 3.8084388168574894, 'learning_rate': 1.4608099456895452e-07, 
'epoch': 0.89} 89%|████████▉ | 11007/12313 [8:14:30<1:02:56, 2.89s/it] 89%|████████▉ | 11008/12313 [8:14:32<1:01:42, 2.84s/it] {'loss': 0.4431, 'grad_norm': 6.108741515365576, 'learning_rate': 1.4585957303693664e-07, 'epoch': 0.89} 89%|████████▉ | 11008/12313 [8:14:32<1:01:42, 2.84s/it] 89%|████████▉ | 11009/12313 [8:14:35<1:00:20, 2.78s/it] {'loss': 0.5353, 'grad_norm': 5.264223037773271, 'learning_rate': 1.4563831439897647e-07, 'epoch': 0.89} 89%|████████▉ | 11009/12313 [8:14:35<1:00:20, 2.78s/it] 89%|████████▉ | 11010/12313 [8:14:37<58:58, 2.72s/it] {'loss': 0.4353, 'grad_norm': 5.540285793593693, 'learning_rate': 1.4541721867038532e-07, 'epoch': 0.89} 89%|████████▉ | 11010/12313 [8:14:37<58:58, 2.72s/it] 89%|████████▉ | 11011/12313 [8:14:40<58:18, 2.69s/it] {'loss': 0.4418, 'grad_norm': 21.662383239919144, 'learning_rate': 1.4519628586646073e-07, 'epoch': 0.89} 89%|████████▉ | 11011/12313 [8:14:40<58:18, 2.69s/it] 89%|████████▉ | 11012/12313 [8:14:42<56:46, 2.62s/it] {'loss': 0.4228, 'grad_norm': 7.463645051609318, 'learning_rate': 1.4497551600249044e-07, 'epoch': 0.89} 89%|████████▉ | 11012/12313 [8:14:42<56:46, 2.62s/it] 89%|████████▉ | 11013/12313 [8:14:45<56:44, 2.62s/it] {'loss': 0.5227, 'grad_norm': 28.77597330752949, 'learning_rate': 1.447549090937511e-07, 'epoch': 0.89} 89%|████████▉ | 11013/12313 [8:14:45<56:44, 2.62s/it] 89%|████████▉ | 11014/12313 [8:14:48<56:42, 2.62s/it] {'loss': 0.4077, 'grad_norm': 3.957758324457693, 'learning_rate': 1.4453446515550724e-07, 'epoch': 0.89} 89%|████████▉ | 11014/12313 [8:14:48<56:42, 2.62s/it] 89%|████████▉ | 11015/12313 [8:14:51<57:39, 2.67s/it] {'loss': 0.5104, 'grad_norm': 7.878758982064296, 'learning_rate': 1.4431418420301157e-07, 'epoch': 0.89} 89%|████████▉ | 11015/12313 [8:14:51<57:39, 2.67s/it] 89%|████████▉ | 11016/12313 [8:14:53<55:39, 2.57s/it] {'loss': 0.6012, 'grad_norm': 5.62814185101261, 'learning_rate': 1.440940662515075e-07, 'epoch': 0.89} 89%|████████▉ | 11016/12313 [8:14:53<55:39, 2.57s/it] 
89%|████████▉ | 11017/12313 [8:14:55<55:38, 2.58s/it] {'loss': 0.4324, 'grad_norm': 5.517645951231921, 'learning_rate': 1.4387411131622592e-07, 'epoch': 0.89} 89%|████████▉ | 11017/12313 [8:14:55<55:38, 2.58s/it] 89%|████████▉ | 11018/12313 [8:14:58<55:18, 2.56s/it] {'loss': 0.4455, 'grad_norm': 4.575883407647924, 'learning_rate': 1.4365431941238544e-07, 'epoch': 0.89} 89%|████████▉ | 11018/12313 [8:14:58<55:18, 2.56s/it] 89%|████████▉ | 11019/12313 [8:15:00<54:11, 2.51s/it] {'loss': 0.3927, 'grad_norm': 4.43419935645762, 'learning_rate': 1.434346905551956e-07, 'epoch': 0.89} 89%|████████▉ | 11019/12313 [8:15:00<54:11, 2.51s/it] 89%|████████▉ | 11020/12313 [8:15:03<55:30, 2.58s/it] {'loss': 0.456, 'grad_norm': 5.32989054760345, 'learning_rate': 1.432152247598534e-07, 'epoch': 0.89} 89%|████████▉ | 11020/12313 [8:15:03<55:30, 2.58s/it] 90%|████████▉ | 11021/12313 [8:15:06<56:19, 2.62s/it] {'loss': 0.4092, 'grad_norm': 4.528118555469795, 'learning_rate': 1.4299592204154445e-07, 'epoch': 0.9} 90%|████████▉ | 11021/12313 [8:15:06<56:19, 2.62s/it] 90%|████████▉ | 11022/12313 [8:15:09<57:04, 2.65s/it] {'loss': 0.3977, 'grad_norm': 5.024689885658865, 'learning_rate': 1.4277678241544328e-07, 'epoch': 0.9} 90%|████████▉ | 11022/12313 [8:15:09<57:04, 2.65s/it] 90%|████████▉ | 11023/12313 [8:15:11<57:09, 2.66s/it] {'loss': 0.4807, 'grad_norm': 6.316835150350028, 'learning_rate': 1.4255780589671337e-07, 'epoch': 0.9} 90%|████████▉ | 11023/12313 [8:15:11<57:09, 2.66s/it] 90%|████████▉ | 11024/12313 [8:15:14<55:43, 2.59s/it] {'loss': 0.4474, 'grad_norm': 5.3616028545471845, 'learning_rate': 1.423389925005067e-07, 'epoch': 0.9} 90%|████████▉ | 11024/12313 [8:15:14<55:43, 2.59s/it] 90%|████████▉ | 11025/12313 [8:15:16<56:03, 2.61s/it] {'loss': 0.488, 'grad_norm': 3.2813886037054543, 'learning_rate': 1.421203422419637e-07, 'epoch': 0.9} 90%|████████▉ | 11025/12313 [8:15:16<56:03, 2.61s/it] 90%|████████▉ | 11026/12313 [8:15:19<56:26, 2.63s/it] {'loss': 0.4516, 'grad_norm': 
8.181117262341452, 'learning_rate': 1.4190185513621473e-07, 'epoch': 0.9} 90%|████████▉ | 11026/12313 [8:15:19<56:26, 2.63s/it] 90%|████████▉ | 11027/12313 [8:15:22<55:59, 2.61s/it] {'loss': 0.486, 'grad_norm': 5.938608632171009, 'learning_rate': 1.416835311983772e-07, 'epoch': 0.9} 90%|████████▉ | 11027/12313 [8:15:22<55:59, 2.61s/it] 90%|████████▉ | 11028/12313 [8:15:24<55:48, 2.61s/it] {'loss': 0.4106, 'grad_norm': 9.269220540893665, 'learning_rate': 1.4146537044355785e-07, 'epoch': 0.9} 90%|████████▉ | 11028/12313 [8:15:24<55:48, 2.61s/it] 90%|████████▉ | 11029/12313 [8:15:27<55:08, 2.58s/it] {'loss': 0.4675, 'grad_norm': 18.999226973217382, 'learning_rate': 1.412473728868527e-07, 'epoch': 0.9} 90%|████████▉ | 11029/12313 [8:15:27<55:08, 2.58s/it] 90%|████████▉ | 11030/12313 [8:15:30<57:05, 2.67s/it] {'loss': 0.4947, 'grad_norm': 3.9421317838103103, 'learning_rate': 1.410295385433455e-07, 'epoch': 0.9} 90%|████████▉ | 11030/12313 [8:15:30<57:05, 2.67s/it] 90%|████████▉ | 11031/12313 [8:15:32<56:33, 2.65s/it] {'loss': 0.4467, 'grad_norm': 4.337954956064237, 'learning_rate': 1.4081186742810948e-07, 'epoch': 0.9} 90%|████████▉ | 11031/12313 [8:15:32<56:33, 2.65s/it] 90%|████████▉ | 11032/12313 [8:15:35<55:32, 2.60s/it] {'loss': 0.5096, 'grad_norm': 5.626043047717378, 'learning_rate': 1.4059435955620704e-07, 'epoch': 0.9} 90%|████████▉ | 11032/12313 [8:15:35<55:32, 2.60s/it] 90%|████████▉ | 11033/12313 [8:15:38<58:01, 2.72s/it] {'loss': 0.4278, 'grad_norm': 4.67727289541783, 'learning_rate': 1.403770149426878e-07, 'epoch': 0.9} 90%|████████▉ | 11033/12313 [8:15:38<58:01, 2.72s/it] 90%|████████▉ | 11034/12313 [8:15:40<57:38, 2.70s/it] {'loss': 0.5335, 'grad_norm': 10.738068197611934, 'learning_rate': 1.4015983360259055e-07, 'epoch': 0.9} 90%|████████▉ | 11034/12313 [8:15:40<57:38, 2.70s/it] 90%|████████▉ | 11035/12313 [8:15:43<56:54, 2.67s/it] {'loss': 0.3716, 'grad_norm': 4.8693579129133235, 'learning_rate': 1.3994281555094386e-07, 'epoch': 0.9} 90%|████████▉ | 
11035/12313 [8:15:43<56:54, 2.67s/it] 90%|████████▉ | 11036/12313 [8:15:46<57:27, 2.70s/it] {'loss': 0.7287, 'grad_norm': 2.9347782845843136, 'learning_rate': 1.3972596080276402e-07, 'epoch': 0.9} 90%|████████▉ | 11036/12313 [8:15:46<57:27, 2.70s/it] 90%|████████▉ | 11037/12313 [8:15:48<57:38, 2.71s/it] {'loss': 0.3883, 'grad_norm': 4.942121031320022, 'learning_rate': 1.395092693730557e-07, 'epoch': 0.9} 90%|████████▉ | 11037/12313 [8:15:48<57:38, 2.71s/it] 90%|████████▉ | 11038/12313 [8:15:51<56:13, 2.65s/it] {'loss': 0.4705, 'grad_norm': 5.046562171098466, 'learning_rate': 1.3929274127681303e-07, 'epoch': 0.9} 90%|████████▉ | 11038/12313 [8:15:51<56:13, 2.65s/it] 90%|████████▉ | 11039/12313 [8:15:53<55:36, 2.62s/it] {'loss': 0.5613, 'grad_norm': 3.9477589888855906, 'learning_rate': 1.3907637652901957e-07, 'epoch': 0.9} 90%|████████▉ | 11039/12313 [8:15:53<55:36, 2.62s/it] 90%|████████▉ | 11040/12313 [8:15:56<56:24, 2.66s/it] {'loss': 0.39, 'grad_norm': 4.192994245868831, 'learning_rate': 1.3886017514464555e-07, 'epoch': 0.9} 90%|████████▉ | 11040/12313 [8:15:56<56:24, 2.66s/it] 90%|████████▉ | 11041/12313 [8:15:59<54:39, 2.58s/it] {'loss': 0.4266, 'grad_norm': 4.637113922674406, 'learning_rate': 1.3864413713865098e-07, 'epoch': 0.9} 90%|████████▉ | 11041/12313 [8:15:59<54:39, 2.58s/it] 90%|████████▉ | 11042/12313 [8:16:01<55:57, 2.64s/it] {'loss': 0.5181, 'grad_norm': 5.8016593697423975, 'learning_rate': 1.38428262525985e-07, 'epoch': 0.9} 90%|████████▉ | 11042/12313 [8:16:01<55:57, 2.64s/it] 90%|████████▉ | 11043/12313 [8:16:04<55:10, 2.61s/it] {'loss': 0.4214, 'grad_norm': 10.961817792927185, 'learning_rate': 1.3821255132158456e-07, 'epoch': 0.9} 90%|████████▉ | 11043/12313 [8:16:04<55:10, 2.61s/it] 90%|████████▉ | 11044/12313 [8:16:06<54:27, 2.57s/it] {'loss': 0.3722, 'grad_norm': 6.98872156389094, 'learning_rate': 1.3799700354037605e-07, 'epoch': 0.9} 90%|████████▉ | 11044/12313 [8:16:06<54:27, 2.57s/it] 90%|████████▉ | 11045/12313 [8:16:09<54:40, 2.59s/it] 
{'loss': 0.4123, 'grad_norm': 6.014898811863191, 'learning_rate': 1.3778161919727472e-07, 'epoch': 0.9} 90%|████████▉ | 11045/12313 [8:16:09<54:40, 2.59s/it] 90%|████████▉ | 11046/12313 [8:16:12<54:15, 2.57s/it] {'loss': 0.4154, 'grad_norm': 4.681509306562985, 'learning_rate': 1.3756639830718316e-07, 'epoch': 0.9} 90%|████████▉ | 11046/12313 [8:16:12<54:15, 2.57s/it] 90%|████████▉ | 11047/12313 [8:16:14<55:14, 2.62s/it] {'loss': 0.4988, 'grad_norm': 5.1158065640859425, 'learning_rate': 1.373513408849936e-07, 'epoch': 0.9} 90%|████████▉ | 11047/12313 [8:16:14<55:14, 2.62s/it] 90%|████████▉ | 11048/12313 [8:16:17<55:37, 2.64s/it] {'loss': 0.447, 'grad_norm': 5.355479142787348, 'learning_rate': 1.3713644694558742e-07, 'epoch': 0.9} 90%|████████▉ | 11048/12313 [8:16:17<55:37, 2.64s/it] 90%|████████▉ | 11049/12313 [8:16:20<56:15, 2.67s/it] {'loss': 0.4586, 'grad_norm': 5.532841373770518, 'learning_rate': 1.369217165038339e-07, 'epoch': 0.9} 90%|████████▉ | 11049/12313 [8:16:20<56:15, 2.67s/it] 90%|████████▉ | 11050/12313 [8:16:22<55:53, 2.66s/it] {'loss': 0.4633, 'grad_norm': 14.185804561707538, 'learning_rate': 1.367071495745906e-07, 'epoch': 0.9} 90%|████████▉ | 11050/12313 [8:16:22<55:53, 2.66s/it] 90%|████████▉ | 11051/12313 [8:16:25<56:49, 2.70s/it] {'loss': 0.539, 'grad_norm': 5.325288616133615, 'learning_rate': 1.3649274617270531e-07, 'epoch': 0.9} 90%|████████▉ | 11051/12313 [8:16:25<56:49, 2.70s/it] 90%|████████▉ | 11052/12313 [8:16:28<58:20, 2.78s/it] {'loss': 0.4839, 'grad_norm': 3.940700522997248, 'learning_rate': 1.3627850631301344e-07, 'epoch': 0.9} 90%|████████▉ | 11052/12313 [8:16:28<58:20, 2.78s/it] 90%|████████▉ | 11053/12313 [8:16:31<57:30, 2.74s/it] {'loss': 0.4044, 'grad_norm': 4.124491341239392, 'learning_rate': 1.3606443001033864e-07, 'epoch': 0.9} 90%|████████▉ | 11053/12313 [8:16:31<57:30, 2.74s/it] 90%|████████▉ | 11054/12313 [8:16:33<56:14, 2.68s/it] {'loss': 0.4023, 'grad_norm': 8.104468950407128, 'learning_rate': 1.3585051727949494e-07, 
'epoch': 0.9} 90%|████████▉ | 11054/12313 [8:16:33<56:14, 2.68s/it] 90%|████████▉ | 11055/12313 [8:16:36<57:06, 2.72s/it] {'loss': 0.3839, 'grad_norm': 9.103539674813327, 'learning_rate': 1.3563676813528325e-07, 'epoch': 0.9} 90%|████████▉ | 11055/12313 [8:16:36<57:06, 2.72s/it] 90%|████████▉ | 11056/12313 [8:16:39<57:10, 2.73s/it] {'loss': 0.5269, 'grad_norm': 4.227081794199914, 'learning_rate': 1.354231825924937e-07, 'epoch': 0.9} 90%|████████▉ | 11056/12313 [8:16:39<57:10, 2.73s/it] 90%|████████▉ | 11057/12313 [8:16:41<56:27, 2.70s/it] {'loss': 0.3988, 'grad_norm': 4.986647735979627, 'learning_rate': 1.3520976066590557e-07, 'epoch': 0.9} 90%|████████▉ | 11057/12313 [8:16:41<56:27, 2.70s/it] 90%|████████▉ | 11058/12313 [8:16:44<55:10, 2.64s/it] {'loss': 0.3746, 'grad_norm': 4.88191309684984, 'learning_rate': 1.3499650237028677e-07, 'epoch': 0.9} 90%|████████▉ | 11058/12313 [8:16:44<55:10, 2.64s/it] 90%|████████▉ | 11059/12313 [8:16:47<55:22, 2.65s/it] {'loss': 0.5004, 'grad_norm': 8.27159257192499, 'learning_rate': 1.3478340772039328e-07, 'epoch': 0.9} 90%|████████▉ | 11059/12313 [8:16:47<55:22, 2.65s/it] 90%|████████▉ | 11060/12313 [8:16:49<55:42, 2.67s/it] {'loss': 0.3978, 'grad_norm': 7.866676776482577, 'learning_rate': 1.3457047673097024e-07, 'epoch': 0.9} 90%|████████▉ | 11060/12313 [8:16:49<55:42, 2.67s/it] 90%|████████▉ | 11061/12313 [8:16:52<55:40, 2.67s/it] {'loss': 0.4521, 'grad_norm': 6.3754067681303805, 'learning_rate': 1.343577094167514e-07, 'epoch': 0.9} 90%|████████▉ | 11061/12313 [8:16:52<55:40, 2.67s/it] 90%|████████▉ | 11062/12313 [8:16:55<54:49, 2.63s/it] {'loss': 0.545, 'grad_norm': 3.9619701694967744, 'learning_rate': 1.341451057924592e-07, 'epoch': 0.9} 90%|████████▉ | 11062/12313 [8:16:55<54:49, 2.63s/it] 90%|████████▉ | 11063/12313 [8:16:57<55:52, 2.68s/it] {'loss': 0.8769, 'grad_norm': 4.982039565280355, 'learning_rate': 1.3393266587280434e-07, 'epoch': 0.9} 90%|████████▉ | 11063/12313 [8:16:57<55:52, 2.68s/it] 90%|████████▉ | 11064/12313 
[8:17:00<57:42, 2.77s/it] {'loss': 0.5124, 'grad_norm': 4.163415192247937, 'learning_rate': 1.3372038967248647e-07, 'epoch': 0.9} 90%|████████▉ | 11064/12313 [8:17:00<57:42, 2.77s/it] 90%|████████▉ | 11065/12313 [8:17:03<58:30, 2.81s/it] {'loss': 0.5187, 'grad_norm': 5.048906567605772, 'learning_rate': 1.335082772061949e-07, 'epoch': 0.9} 90%|████████▉ | 11065/12313 [8:17:03<58:30, 2.81s/it] 90%|████████▉ | 11066/12313 [8:17:06<56:50, 2.73s/it] {'loss': 0.6352, 'grad_norm': 6.207042181936884, 'learning_rate': 1.3329632848860545e-07, 'epoch': 0.9} 90%|████████▉ | 11066/12313 [8:17:06<56:50, 2.73s/it] 90%|████████▉ | 11067/12313 [8:17:09<58:21, 2.81s/it] {'loss': 0.5995, 'grad_norm': 7.073497034721812, 'learning_rate': 1.33084543534385e-07, 'epoch': 0.9} 90%|████████▉ | 11067/12313 [8:17:09<58:21, 2.81s/it] 90%|████████▉ | 11068/12313 [8:17:12<57:53, 2.79s/it] {'loss': 0.4552, 'grad_norm': 9.491444143304246, 'learning_rate': 1.3287292235818732e-07, 'epoch': 0.9} 90%|████████▉ | 11068/12313 [8:17:12<57:53, 2.79s/it] 90%|████████▉ | 11069/12313 [8:17:14<56:19, 2.72s/it] {'loss': 0.3663, 'grad_norm': 3.2830020992185185, 'learning_rate': 1.326614649746555e-07, 'epoch': 0.9} 90%|████████▉ | 11069/12313 [8:17:14<56:19, 2.72s/it] 90%|████████▉ | 11070/12313 [8:17:17<55:52, 2.70s/it] {'loss': 0.443, 'grad_norm': 4.692919809966446, 'learning_rate': 1.324501713984211e-07, 'epoch': 0.9} 90%|████████▉ | 11070/12313 [8:17:17<55:52, 2.70s/it] 90%|████████▉ | 11071/12313 [8:17:20<56:13, 2.72s/it] {'loss': 0.3483, 'grad_norm': 6.01529901714284, 'learning_rate': 1.3223904164410494e-07, 'epoch': 0.9} 90%|████████▉ | 11071/12313 [8:17:20<56:13, 2.72s/it] 90%|████████▉ | 11072/12313 [8:17:22<56:40, 2.74s/it] {'loss': 0.5003, 'grad_norm': 8.122333020006561, 'learning_rate': 1.3202807572631564e-07, 'epoch': 0.9} 90%|████████▉ | 11072/12313 [8:17:22<56:40, 2.74s/it] 90%|████████▉ | 11073/12313 [8:17:25<57:32, 2.78s/it] {'loss': 0.4683, 'grad_norm': 8.111867294079145, 'learning_rate': 
1.318172736596518e-07, 'epoch': 0.9} 90%|████████▉ | 11073/12313 [8:17:25<57:32, 2.78s/it] 90%|████████▉ | 11074/12313 [8:17:28<56:19, 2.73s/it] {'loss': 0.4858, 'grad_norm': 9.069724308334372, 'learning_rate': 1.3160663545869896e-07, 'epoch': 0.9} 90%|████████▉ | 11074/12313 [8:17:28<56:19, 2.73s/it] 90%|████████▉ | 11075/12313 [8:17:31<56:10, 2.72s/it] {'loss': 0.6833, 'grad_norm': 7.1277675224613075, 'learning_rate': 1.3139616113803238e-07, 'epoch': 0.9} 90%|████████▉ | 11075/12313 [8:17:31<56:10, 2.72s/it] 90%|████████▉ | 11076/12313 [8:17:33<56:06, 2.72s/it] {'loss': 0.6296, 'grad_norm': 4.1361872264740285, 'learning_rate': 1.3118585071221546e-07, 'epoch': 0.9} 90%|████████▉ | 11076/12313 [8:17:33<56:06, 2.72s/it] 90%|████████▉ | 11077/12313 [8:17:36<57:16, 2.78s/it] {'loss': 0.5692, 'grad_norm': 4.28970175862567, 'learning_rate': 1.3097570419580096e-07, 'epoch': 0.9} 90%|████████▉ | 11077/12313 [8:17:36<57:16, 2.78s/it] 90%|████████▉ | 11078/12313 [8:17:39<56:21, 2.74s/it] {'loss': 0.431, 'grad_norm': 8.454970827458345, 'learning_rate': 1.3076572160333007e-07, 'epoch': 0.9} 90%|████████▉ | 11078/12313 [8:17:39<56:21, 2.74s/it] 90%|████████▉ | 11079/12313 [8:17:41<55:01, 2.68s/it] {'loss': 0.5454, 'grad_norm': 9.012725656230002, 'learning_rate': 1.3055590294933196e-07, 'epoch': 0.9} 90%|████████▉ | 11079/12313 [8:17:41<55:01, 2.68s/it] 90%|████████▉ | 11080/12313 [8:17:44<54:20, 2.64s/it] {'loss': 0.4561, 'grad_norm': 5.94575386452439, 'learning_rate': 1.303462482483256e-07, 'epoch': 0.9} 90%|████████▉ | 11080/12313 [8:17:44<54:20, 2.64s/it] 90%|████████▉ | 11081/12313 [8:17:47<54:22, 2.65s/it] {'loss': 0.4424, 'grad_norm': 5.789262579104675, 'learning_rate': 1.301367575148177e-07, 'epoch': 0.9} 90%|████████▉ | 11081/12313 [8:17:47<54:22, 2.65s/it] 90%|█████████ | 11082/12313 [8:17:49<53:32, 2.61s/it] {'loss': 0.5148, 'grad_norm': 38.814186598162095, 'learning_rate': 1.299274307633036e-07, 'epoch': 0.9} 90%|█████████ | 11082/12313 [8:17:49<53:32, 2.61s/it] 
90%|█████████ | 11083/12313 [8:17:52<55:05, 2.69s/it] {'loss': 0.5525, 'grad_norm': 9.66072321360139, 'learning_rate': 1.297182680082676e-07, 'epoch': 0.9} 90%|█████████ | 11083/12313 [8:17:52<55:05, 2.69s/it] 90%|█████████ | 11084/12313 [8:17:55<54:36, 2.67s/it] {'loss': 0.5021, 'grad_norm': 4.4257331245909555, 'learning_rate': 1.2950926926418362e-07, 'epoch': 0.9} 90%|█████████ | 11084/12313 [8:17:55<54:36, 2.67s/it] 90%|█████████ | 11085/12313 [8:17:57<54:48, 2.68s/it] {'loss': 0.468, 'grad_norm': 3.0817044530551954, 'learning_rate': 1.2930043454551178e-07, 'epoch': 0.9} 90%|█████████ | 11085/12313 [8:17:57<54:48, 2.68s/it] 90%|█████████ | 11086/12313 [8:18:00<54:35, 2.67s/it] {'loss': 0.3354, 'grad_norm': 5.56969653241233, 'learning_rate': 1.2909176386670385e-07, 'epoch': 0.9} 90%|█████████ | 11086/12313 [8:18:00<54:35, 2.67s/it] 90%|█████████ | 11087/12313 [8:18:03<54:18, 2.66s/it] {'loss': 0.5716, 'grad_norm': 4.504308093649935, 'learning_rate': 1.2888325724219775e-07, 'epoch': 0.9} 90%|█████████ | 11087/12313 [8:18:03<54:18, 2.66s/it] 90%|█████████ | 11088/12313 [8:18:05<54:35, 2.67s/it] {'loss': 0.4541, 'grad_norm': 4.663416416080067, 'learning_rate': 1.2867491468642106e-07, 'epoch': 0.9} 90%|█████████ | 11088/12313 [8:18:05<54:35, 2.67s/it] 90%|█████████ | 11089/12313 [8:18:08<54:43, 2.68s/it] {'loss': 0.514, 'grad_norm': 4.148002537213163, 'learning_rate': 1.2846673621379035e-07, 'epoch': 0.9} 90%|█████████ | 11089/12313 [8:18:08<54:43, 2.68s/it] 90%|█████████ | 11090/12313 [8:18:11<54:58, 2.70s/it] {'loss': 0.5123, 'grad_norm': 4.589279907686427, 'learning_rate': 1.282587218387102e-07, 'epoch': 0.9} 90%|█████████ | 11090/12313 [8:18:11<54:58, 2.70s/it] 90%|█████████ | 11091/12313 [8:18:13<54:47, 2.69s/it] {'loss': 0.5089, 'grad_norm': 3.974192487774201, 'learning_rate': 1.2805087157557434e-07, 'epoch': 0.9} 90%|█████████ | 11091/12313 [8:18:13<54:47, 2.69s/it] 90%|█████████ | 11092/12313 [8:18:16<57:17, 2.82s/it] {'loss': 0.4654, 'grad_norm': 
5.602158178066266, 'learning_rate': 1.2784318543876463e-07, 'epoch': 0.9} 90%|█████████ | 11092/12313 [8:18:16<57:17, 2.82s/it] 90%|█████████ | 11093/12313 [8:18:19<54:40, 2.69s/it] {'loss': 0.4957, 'grad_norm': 8.541653052285914, 'learning_rate': 1.276356634426526e-07, 'epoch': 0.9} 90%|█████████ | 11093/12313 [8:18:19<54:40, 2.69s/it] 90%|█████████ | 11094/12313 [8:18:21<54:04, 2.66s/it] {'loss': 0.6248, 'grad_norm': 3.690935781414827, 'learning_rate': 1.274283056015968e-07, 'epoch': 0.9} 90%|█████████ | 11094/12313 [8:18:21<54:04, 2.66s/it] 90%|█████████ | 11095/12313 [8:18:24<53:46, 2.65s/it] {'loss': 0.4652, 'grad_norm': 4.875190589469171, 'learning_rate': 1.272211119299452e-07, 'epoch': 0.9} 90%|█████████ | 11095/12313 [8:18:24<53:46, 2.65s/it] 90%|█████████ | 11096/12313 [8:18:27<53:16, 2.63s/it] {'loss': 0.4016, 'grad_norm': 5.525144142434985, 'learning_rate': 1.270140824420349e-07, 'epoch': 0.9} 90%|█████████ | 11096/12313 [8:18:27<53:16, 2.63s/it] 90%|█████████ | 11097/12313 [8:18:29<54:24, 2.68s/it] {'loss': 0.4344, 'grad_norm': 3.8356059525271387, 'learning_rate': 1.2680721715219168e-07, 'epoch': 0.9} 90%|█████████ | 11097/12313 [8:18:29<54:24, 2.68s/it] 90%|█████████ | 11098/12313 [8:18:32<54:36, 2.70s/it] {'loss': 0.6449, 'grad_norm': 4.807969276574533, 'learning_rate': 1.2660051607472885e-07, 'epoch': 0.9} 90%|█████████ | 11098/12313 [8:18:32<54:36, 2.70s/it] 90%|█████████ | 11099/12313 [8:18:35<54:13, 2.68s/it] {'loss': 0.5645, 'grad_norm': 5.802173378442104, 'learning_rate': 1.2639397922394963e-07, 'epoch': 0.9} 90%|█████████ | 11099/12313 [8:18:35<54:13, 2.68s/it] 90%|█████████ | 11100/12313 [8:18:37<53:46, 2.66s/it] {'loss': 0.499, 'grad_norm': 6.895498171533333, 'learning_rate': 1.261876066141446e-07, 'epoch': 0.9} 90%|█████████ | 11100/12313 [8:18:37<53:46, 2.66s/it] 90%|█████████ | 11101/12313 [8:18:40<53:55, 2.67s/it] {'loss': 0.4524, 'grad_norm': 7.515564655019646, 'learning_rate': 1.2598139825959393e-07, 'epoch': 0.9} 90%|█████████ | 
11101/12313 [8:18:40<53:55, 2.67s/it] 90%|█████████ | 11102/12313 [8:18:43<53:58, 2.67s/it] {'loss': 0.3594, 'grad_norm': 10.60238190605972, 'learning_rate': 1.2577535417456599e-07, 'epoch': 0.9} 90%|█████████ | 11102/12313 [8:18:43<53:58, 2.67s/it] 90%|█████████ | 11103/12313 [8:18:45<53:01, 2.63s/it] {'loss': 0.3964, 'grad_norm': 5.744098794749828, 'learning_rate': 1.255694743733185e-07, 'epoch': 0.9} 90%|█████████ | 11103/12313 [8:18:45<53:01, 2.63s/it] 90%|█████████ | 11104/12313 [8:18:48<53:35, 2.66s/it] {'loss': 0.4082, 'grad_norm': 3.9046557359930865, 'learning_rate': 1.253637588700965e-07, 'epoch': 0.9} 90%|█████████ | 11104/12313 [8:18:48<53:35, 2.66s/it] 90%|█████████ | 11105/12313 [8:18:51<53:52, 2.68s/it] {'loss': 0.5643, 'grad_norm': 5.1997775870318765, 'learning_rate': 1.251582076791352e-07, 'epoch': 0.9} 90%|█████████ | 11105/12313 [8:18:51<53:52, 2.68s/it] 90%|█████████ | 11106/12313 [8:18:53<53:18, 2.65s/it] {'loss': 0.5973, 'grad_norm': 8.139019581385137, 'learning_rate': 1.2495282081465747e-07, 'epoch': 0.9} 90%|█████████ | 11106/12313 [8:18:53<53:18, 2.65s/it] 90%|█████████ | 11107/12313 [8:18:56<52:48, 2.63s/it] {'loss': 0.5088, 'grad_norm': 6.919395171153576, 'learning_rate': 1.2474759829087413e-07, 'epoch': 0.9} 90%|█████████ | 11107/12313 [8:18:56<52:48, 2.63s/it] 90%|█████████ | 11108/12313 [8:18:58<51:56, 2.59s/it] {'loss': 0.5294, 'grad_norm': 7.322279551123576, 'learning_rate': 1.2454254012198657e-07, 'epoch': 0.9} 90%|█████████ | 11108/12313 [8:18:58<51:56, 2.59s/it] 90%|█████████ | 11109/12313 [8:19:02<55:33, 2.77s/it] {'loss': 0.4372, 'grad_norm': 4.725685255377581, 'learning_rate': 1.2433764632218293e-07, 'epoch': 0.9} 90%|█████████ | 11109/12313 [8:19:02<55:33, 2.77s/it] 90%|█████████ | 11110/12313 [8:19:04<54:11, 2.70s/it] {'loss': 0.4079, 'grad_norm': 14.72704680471624, 'learning_rate': 1.2413291690564154e-07, 'epoch': 0.9} 90%|█████████ | 11110/12313 [8:19:04<54:11, 2.70s/it] 90%|█████████ | 11111/12313 [8:19:07<55:47, 2.78s/it] 
{'loss': 0.5958, 'grad_norm': 5.242560339794683, 'learning_rate': 1.239283518865278e-07, 'epoch': 0.9} 90%|█████████ | 11111/12313 [8:19:07<55:47, 2.78s/it] 90%|█████████ | 11112/12313 [8:19:10<55:13, 2.76s/it] {'loss': 0.4431, 'grad_norm': 3.152784855668021, 'learning_rate': 1.2372395127899728e-07, 'epoch': 0.9} 90%|█████████ | 11112/12313 [8:19:10<55:13, 2.76s/it] 90%|█████████ | 11113/12313 [8:19:12<53:56, 2.70s/it] {'loss': 0.3836, 'grad_norm': 12.381085280329925, 'learning_rate': 1.2351971509719312e-07, 'epoch': 0.9} 90%|█████████ | 11113/12313 [8:19:12<53:56, 2.70s/it] 90%|█████████ | 11114/12313 [8:19:15<54:00, 2.70s/it] {'loss': 0.4327, 'grad_norm': 6.282293943148033, 'learning_rate': 1.233156433552471e-07, 'epoch': 0.9} 90%|█████████ | 11114/12313 [8:19:15<54:00, 2.70s/it] 90%|█████████ | 11115/12313 [8:19:18<54:35, 2.73s/it] {'loss': 0.5872, 'grad_norm': 9.80917559710566, 'learning_rate': 1.2311173606727982e-07, 'epoch': 0.9} 90%|█████████ | 11115/12313 [8:19:18<54:35, 2.73s/it] 90%|█████████ | 11116/12313 [8:19:21<56:21, 2.82s/it] {'loss': 0.604, 'grad_norm': 4.868387447251448, 'learning_rate': 1.2290799324740144e-07, 'epoch': 0.9} 90%|█████████ | 11116/12313 [8:19:21<56:21, 2.82s/it] 90%|█████████ | 11117/12313 [8:19:24<54:33, 2.74s/it] {'loss': 0.4203, 'grad_norm': 4.469666914330099, 'learning_rate': 1.2270441490970897e-07, 'epoch': 0.9} 90%|█████████ | 11117/12313 [8:19:24<54:33, 2.74s/it] 90%|█████████ | 11118/12313 [8:19:26<54:02, 2.71s/it] {'loss': 0.4397, 'grad_norm': 5.0365987552230935, 'learning_rate': 1.2250100106828978e-07, 'epoch': 0.9} 90%|█████████ | 11118/12313 [8:19:26<54:02, 2.71s/it] 90%|█████████ | 11119/12313 [8:19:29<54:39, 2.75s/it] {'loss': 0.8365, 'grad_norm': 4.266209336176858, 'learning_rate': 1.222977517372184e-07, 'epoch': 0.9} 90%|█████████ | 11119/12313 [8:19:29<54:39, 2.75s/it] 90%|█████████ | 11120/12313 [8:19:32<54:20, 2.73s/it] {'loss': 0.5762, 'grad_norm': 5.161680402157924, 'learning_rate': 1.2209466693055867e-07, 
'epoch': 0.9} 90%|█████████ | 11120/12313 [8:19:32<54:20, 2.73s/it] 90%|█████████ | 11121/12313 [8:19:34<52:29, 2.64s/it] {'loss': 0.4376, 'grad_norm': 4.148007256784386, 'learning_rate': 1.2189174666236314e-07, 'epoch': 0.9} 90%|█████████ | 11121/12313 [8:19:34<52:29, 2.64s/it] 90%|█████████ | 11122/12313 [8:19:37<52:02, 2.62s/it] {'loss': 0.6871, 'grad_norm': 5.331795822073706, 'learning_rate': 1.2168899094667257e-07, 'epoch': 0.9} 90%|█████████ | 11122/12313 [8:19:37<52:02, 2.62s/it] 90%|█████████ | 11123/12313 [8:19:39<53:03, 2.67s/it] {'loss': 0.4611, 'grad_norm': 4.452121593101198, 'learning_rate': 1.2148639979751686e-07, 'epoch': 0.9} 90%|█████████ | 11123/12313 [8:19:39<53:03, 2.67s/it] 90%|█████████ | 11124/12313 [8:19:42<52:38, 2.66s/it] {'loss': 0.3484, 'grad_norm': 5.092994028700953, 'learning_rate': 1.212839732289145e-07, 'epoch': 0.9} 90%|█████████ | 11124/12313 [8:19:42<52:38, 2.66s/it] 90%|█████████ | 11125/12313 [8:19:45<52:33, 2.65s/it] {'loss': 0.3757, 'grad_norm': 4.802229400487032, 'learning_rate': 1.2108171125487177e-07, 'epoch': 0.9} 90%|█████████ | 11125/12313 [8:19:45<52:33, 2.65s/it] 90%|█████████ | 11126/12313 [8:19:47<51:20, 2.59s/it] {'loss': 0.5139, 'grad_norm': 5.17753286975903, 'learning_rate': 1.2087961388938473e-07, 'epoch': 0.9} 90%|█████████ | 11126/12313 [8:19:47<51:20, 2.59s/it] 90%|█████████ | 11127/12313 [8:19:50<52:53, 2.68s/it] {'loss': 0.5869, 'grad_norm': 4.289775333983445, 'learning_rate': 1.2067768114643635e-07, 'epoch': 0.9} 90%|█████████ | 11127/12313 [8:19:50<52:53, 2.68s/it] 90%|█████████ | 11128/12313 [8:19:53<52:27, 2.66s/it] {'loss': 0.4739, 'grad_norm': 5.723264204379845, 'learning_rate': 1.2047591304000044e-07, 'epoch': 0.9} 90%|█████████ | 11128/12313 [8:19:53<52:27, 2.66s/it] 90%|█████████ | 11129/12313 [8:19:55<52:37, 2.67s/it] {'loss': 0.4689, 'grad_norm': 10.15998509588337, 'learning_rate': 1.2027430958403808e-07, 'epoch': 0.9} 90%|█████████ | 11129/12313 [8:19:55<52:37, 2.67s/it] 90%|█████████ | 
11130/12313 [8:19:58<51:39, 2.62s/it] {'loss': 0.4664, 'grad_norm': 4.420651137514857, 'learning_rate': 1.2007287079249863e-07, 'epoch': 0.9} 90%|█████████ | 11130/12313 [8:19:58<51:39, 2.62s/it] 90%|█████████ | 11131/12313 [8:20:01<52:27, 2.66s/it] {'loss': 0.5313, 'grad_norm': 4.274612039891857, 'learning_rate': 1.1987159667932124e-07, 'epoch': 0.9} 90%|█████████ | 11131/12313 [8:20:01<52:27, 2.66s/it] 90%|█████████ | 11132/12313 [8:20:03<53:05, 2.70s/it] {'loss': 0.4139, 'grad_norm': 7.759017785750078, 'learning_rate': 1.1967048725843256e-07, 'epoch': 0.9} 90%|█████████ | 11132/12313 [8:20:03<53:05, 2.70s/it] 90%|█████████ | 11133/12313 [8:20:06<52:37, 2.68s/it] {'loss': 0.5634, 'grad_norm': 6.646277040961836, 'learning_rate': 1.1946954254374838e-07, 'epoch': 0.9} 90%|█████████ | 11133/12313 [8:20:06<52:37, 2.68s/it] 90%|█████████ | 11134/12313 [8:20:09<53:38, 2.73s/it] {'loss': 0.4933, 'grad_norm': 4.609209324854486, 'learning_rate': 1.1926876254917314e-07, 'epoch': 0.9} 90%|█████████ | 11134/12313 [8:20:09<53:38, 2.73s/it] 90%|█████████ | 11135/12313 [8:20:12<53:43, 2.74s/it] {'loss': 0.4394, 'grad_norm': 4.565133981153573, 'learning_rate': 1.190681472885996e-07, 'epoch': 0.9} 90%|█████████ | 11135/12313 [8:20:12<53:43, 2.74s/it] 90%|█████████ | 11136/12313 [8:20:14<52:22, 2.67s/it] {'loss': 0.5565, 'grad_norm': 8.18111120557696, 'learning_rate': 1.188676967759092e-07, 'epoch': 0.9} 90%|█████████ | 11136/12313 [8:20:14<52:22, 2.67s/it] 90%|█████████ | 11137/12313 [8:20:17<52:54, 2.70s/it] {'loss': 0.5724, 'grad_norm': 5.013373635726462, 'learning_rate': 1.1866741102497275e-07, 'epoch': 0.9} 90%|█████████ | 11137/12313 [8:20:17<52:54, 2.70s/it] 90%|█████████ | 11138/12313 [8:20:20<53:23, 2.73s/it] {'loss': 0.4643, 'grad_norm': 4.362396063750319, 'learning_rate': 1.1846729004964835e-07, 'epoch': 0.9} 90%|█████████ | 11138/12313 [8:20:20<53:23, 2.73s/it] 90%|█████████ | 11139/12313 [8:20:22<52:49, 2.70s/it] {'loss': 0.4188, 'grad_norm': 5.980174069654119, 
'learning_rate': 1.1826733386378297e-07, 'epoch': 0.9} 90%|█████████ | 11139/12313 [8:20:22<52:49, 2.70s/it] 90%|█████████ | 11140/12313 [8:20:25<51:14, 2.62s/it] {'loss': 0.4468, 'grad_norm': 4.951897983742022, 'learning_rate': 1.1806754248121333e-07, 'epoch': 0.9} 90%|█████████ | 11140/12313 [8:20:25<51:14, 2.62s/it] 90%|█████████ | 11141/12313 [8:20:27<51:04, 2.61s/it] {'loss': 0.5959, 'grad_norm': 4.454466679854169, 'learning_rate': 1.1786791591576307e-07, 'epoch': 0.9} 90%|█████████ | 11141/12313 [8:20:27<51:04, 2.61s/it] 90%|█████████ | 11142/12313 [8:20:30<50:51, 2.61s/it] {'loss': 0.7119, 'grad_norm': 4.311779144133112, 'learning_rate': 1.176684541812459e-07, 'epoch': 0.9} 90%|█████████ | 11142/12313 [8:20:30<50:51, 2.61s/it] 90%|█████████ | 11143/12313 [8:20:33<51:27, 2.64s/it] {'loss': 0.5982, 'grad_norm': 5.216361658357243, 'learning_rate': 1.174691572914638e-07, 'epoch': 0.9} 90%|█████████ | 11143/12313 [8:20:33<51:27, 2.64s/it] 91%|█████████ | 11144/12313 [8:20:35<50:53, 2.61s/it] {'loss': 0.5352, 'grad_norm': 4.598499041200991, 'learning_rate': 1.1727002526020631e-07, 'epoch': 0.91} 91%|█████████ | 11144/12313 [8:20:35<50:53, 2.61s/it] 91%|█████████ | 11145/12313 [8:20:38<53:04, 2.73s/it] {'loss': 0.5892, 'grad_norm': 3.192274398240333, 'learning_rate': 1.1707105810125297e-07, 'epoch': 0.91} 91%|█████████ | 11145/12313 [8:20:38<53:04, 2.73s/it] 91%|█████████ | 11146/12313 [8:20:41<51:26, 2.64s/it] {'loss': 0.3351, 'grad_norm': 11.971740880860915, 'learning_rate': 1.1687225582837052e-07, 'epoch': 0.91} 91%|█████████ | 11146/12313 [8:20:41<51:26, 2.64s/it] 91%|█████████ | 11147/12313 [8:20:43<51:56, 2.67s/it] {'loss': 0.368, 'grad_norm': 8.973707498145792, 'learning_rate': 1.1667361845531578e-07, 'epoch': 0.91} 91%|█████████ | 11147/12313 [8:20:43<51:56, 2.67s/it] 91%|█████████ | 11148/12313 [8:20:46<52:04, 2.68s/it] {'loss': 0.3972, 'grad_norm': 4.2462620886968825, 'learning_rate': 1.164751459958327e-07, 'epoch': 0.91} 91%|█████████ | 11148/12313 
[8:20:46<52:04, 2.68s/it] 91%|█████████ | 11149/12313 [8:20:49<52:10, 2.69s/it] {'loss': 0.548, 'grad_norm': 4.590797394683434, 'learning_rate': 1.1627683846365478e-07, 'epoch': 0.91} 91%|█████████ | 11149/12313 [8:20:49<52:10, 2.69s/it] 91%|█████████ | 11150/12313 [8:20:51<51:27, 2.65s/it] {'loss': 0.4099, 'grad_norm': 7.826708434408468, 'learning_rate': 1.1607869587250464e-07, 'epoch': 0.91} 91%|█████████ | 11150/12313 [8:20:51<51:27, 2.65s/it] 91%|█████████ | 11151/12313 [8:20:54<52:28, 2.71s/it] {'loss': 0.6002, 'grad_norm': 4.643007677952196, 'learning_rate': 1.1588071823609159e-07, 'epoch': 0.91} 91%|█████████ | 11151/12313 [8:20:54<52:28, 2.71s/it] 91%|█████████ | 11152/12313 [8:20:57<52:37, 2.72s/it] {'loss': 0.5826, 'grad_norm': 6.195712206260984, 'learning_rate': 1.1568290556811495e-07, 'epoch': 0.91} 91%|█████████ | 11152/12313 [8:20:57<52:37, 2.72s/it] 91%|█████████ | 11153/12313 [8:21:00<53:22, 2.76s/it] {'loss': 0.5314, 'grad_norm': 5.225028473648337, 'learning_rate': 1.1548525788226267e-07, 'epoch': 0.91} 91%|█████████ | 11153/12313 [8:21:00<53:22, 2.76s/it] 91%|█████████ | 11154/12313 [8:21:02<51:22, 2.66s/it] {'loss': 0.5205, 'grad_norm': 9.685757941498554, 'learning_rate': 1.1528777519221046e-07, 'epoch': 0.91} 91%|█████████ | 11154/12313 [8:21:02<51:22, 2.66s/it] 91%|█████████ | 11155/12313 [8:21:05<51:30, 2.67s/it] {'loss': 0.4093, 'grad_norm': 3.9456543583410526, 'learning_rate': 1.1509045751162324e-07, 'epoch': 0.91} 91%|█████████ | 11155/12313 [8:21:05<51:30, 2.67s/it] 91%|█████████ | 11156/12313 [8:21:08<52:42, 2.73s/it] {'loss': 0.3512, 'grad_norm': 5.960099778891394, 'learning_rate': 1.1489330485415479e-07, 'epoch': 0.91} 91%|█████████ | 11156/12313 [8:21:08<52:42, 2.73s/it] 91%|█████████ | 11157/12313 [8:21:10<51:54, 2.69s/it] {'loss': 0.4409, 'grad_norm': 5.908029100963964, 'learning_rate': 1.1469631723344671e-07, 'epoch': 0.91} 91%|█████████ | 11157/12313 [8:21:10<51:54, 2.69s/it] 91%|█████████ | 11158/12313 [8:21:13<51:26, 2.67s/it] 
{'loss': 0.4521, 'grad_norm': 4.606484674584257, 'learning_rate': 1.1449949466312893e-07, 'epoch': 0.91} 91%|█████████ | 11158/12313 [8:21:13<51:26, 2.67s/it] 91%|█████████ | 11159/12313 [8:21:15<49:51, 2.59s/it] {'loss': 0.5358, 'grad_norm': 5.177826704703417, 'learning_rate': 1.1430283715682139e-07, 'epoch': 0.91} 91%|█████████ | 11159/12313 [8:21:15<49:51, 2.59s/it] 91%|█████████ | 11160/12313 [8:21:18<49:43, 2.59s/it] {'loss': 0.4514, 'grad_norm': 4.966905493065145, 'learning_rate': 1.1410634472813098e-07, 'epoch': 0.91} 91%|█████████ | 11160/12313 [8:21:18<49:43, 2.59s/it] 91%|█████████ | 11161/12313 [8:21:21<49:59, 2.60s/it] {'loss': 0.5214, 'grad_norm': 5.793696903665879, 'learning_rate': 1.1391001739065432e-07, 'epoch': 0.91} 91%|█████████ | 11161/12313 [8:21:21<49:59, 2.60s/it] 91%|█████████ | 11162/12313 [8:21:23<50:53, 2.65s/it] {'loss': 0.4365, 'grad_norm': 5.420921312113236, 'learning_rate': 1.1371385515797695e-07, 'epoch': 0.91} 91%|█████████ | 11162/12313 [8:21:23<50:53, 2.65s/it] 91%|█████████ | 11163/12313 [8:21:26<49:42, 2.59s/it] {'loss': 0.5352, 'grad_norm': 3.588449960902482, 'learning_rate': 1.1351785804367105e-07, 'epoch': 0.91} 91%|█████████ | 11163/12313 [8:21:26<49:42, 2.59s/it] 91%|█████████ | 11164/12313 [8:21:28<48:56, 2.56s/it] {'loss': 0.4415, 'grad_norm': 5.796738656501365, 'learning_rate': 1.1332202606129938e-07, 'epoch': 0.91} 91%|█████████ | 11164/12313 [8:21:28<48:56, 2.56s/it] 91%|█████████ | 11165/12313 [8:21:31<48:11, 2.52s/it] {'loss': 0.4904, 'grad_norm': 5.625515444116812, 'learning_rate': 1.1312635922441195e-07, 'epoch': 0.91} 91%|█████████ | 11165/12313 [8:21:31<48:11, 2.52s/it] 91%|█████████ | 11166/12313 [8:21:34<49:12, 2.57s/it] {'loss': 0.6403, 'grad_norm': 4.830094151436432, 'learning_rate': 1.129308575465482e-07, 'epoch': 0.91} 91%|█████████ | 11166/12313 [8:21:34<49:12, 2.57s/it] 91%|█████████ | 11167/12313 [8:21:36<48:47, 2.55s/it] {'loss': 0.4543, 'grad_norm': 4.562587755650938, 'learning_rate': 
1.1273552104123564e-07, 'epoch': 0.91} 91%|█████████ | 11167/12313 [8:21:36<48:47, 2.55s/it] 91%|█████████ | 11168/12313 [8:21:39<48:53, 2.56s/it] {'loss': 0.4899, 'grad_norm': 7.282911765724218, 'learning_rate': 1.125403497219904e-07, 'epoch': 0.91} 91%|█████████ | 11168/12313 [8:21:39<48:53, 2.56s/it] 91%|█████████ | 11169/12313 [8:21:41<49:43, 2.61s/it] {'loss': 0.3675, 'grad_norm': 4.877190705118263, 'learning_rate': 1.123453436023178e-07, 'epoch': 0.91} 91%|█████████ | 11169/12313 [8:21:41<49:43, 2.61s/it] 91%|█████████ | 11170/12313 [8:21:44<50:11, 2.64s/it] {'loss': 0.4837, 'grad_norm': 6.299432110607247, 'learning_rate': 1.121505026957112e-07, 'epoch': 0.91} 91%|█████████ | 11170/12313 [8:21:44<50:11, 2.64s/it] 91%|█████████ | 11171/12313 [8:21:47<51:13, 2.69s/it] {'loss': 0.5353, 'grad_norm': 5.585090356105831, 'learning_rate': 1.1195582701565177e-07, 'epoch': 0.91} 91%|█████████ | 11171/12313 [8:21:47<51:13, 2.69s/it] 91%|█████████ | 11172/12313 [8:21:50<51:16, 2.70s/it] {'loss': 0.5091, 'grad_norm': 5.1062027169899435, 'learning_rate': 1.1176131657561095e-07, 'epoch': 0.91} 91%|█████████ | 11172/12313 [8:21:50<51:16, 2.70s/it] 91%|█████████ | 11173/12313 [8:21:52<50:12, 2.64s/it] {'loss': 0.73, 'grad_norm': 3.970272507111287, 'learning_rate': 1.1156697138904715e-07, 'epoch': 0.91} 91%|█████████ | 11173/12313 [8:21:52<50:12, 2.64s/it] 91%|█████████ | 11174/12313 [8:21:55<52:07, 2.75s/it] {'loss': 0.3436, 'grad_norm': 4.9753615625859196, 'learning_rate': 1.1137279146940821e-07, 'epoch': 0.91} 91%|█████████ | 11174/12313 [8:21:55<52:07, 2.75s/it] 91%|█████████ | 11175/12313 [8:21:58<51:08, 2.70s/it] {'loss': 0.5066, 'grad_norm': 4.2196215333931, 'learning_rate': 1.111787768301309e-07, 'epoch': 0.91} 91%|█████████ | 11175/12313 [8:21:58<51:08, 2.70s/it] 91%|█████████ | 11176/12313 [8:22:00<50:04, 2.64s/it] {'loss': 0.4379, 'grad_norm': 9.64461171338353, 'learning_rate': 1.1098492748463945e-07, 'epoch': 0.91} 91%|█████████ | 11176/12313 [8:22:00<50:04, 
2.64s/it] 91%|█████████ | 11177/12313 [8:22:03<48:44, 2.57s/it] {'loss': 0.4515, 'grad_norm': 7.251627691507904, 'learning_rate': 1.1079124344634707e-07, 'epoch': 0.91} 91%|█████████ | 11177/12313 [8:22:03<48:44, 2.57s/it] 91%|█████████ | 11178/12313 [8:22:05<47:56, 2.53s/it] {'loss': 0.408, 'grad_norm': 4.400388440896803, 'learning_rate': 1.1059772472865632e-07, 'epoch': 0.91} 91%|█████████ | 11178/12313 [8:22:05<47:56, 2.53s/it] 91%|█████████ | 11179/12313 [8:22:08<48:43, 2.58s/it] {'loss': 0.4494, 'grad_norm': 4.694092152410516, 'learning_rate': 1.1040437134495708e-07, 'epoch': 0.91} 91%|█████████ | 11179/12313 [8:22:08<48:43, 2.58s/it] 91%|█████████ | 11180/12313 [8:22:11<51:16, 2.72s/it] {'loss': 0.455, 'grad_norm': 4.510297602747996, 'learning_rate': 1.1021118330862835e-07, 'epoch': 0.91} 91%|█████████ | 11180/12313 [8:22:11<51:16, 2.72s/it] 91%|█████████ | 11181/12313 [8:22:14<52:34, 2.79s/it] {'loss': 0.5094, 'grad_norm': 3.8454716633958546, 'learning_rate': 1.1001816063303805e-07, 'epoch': 0.91} 91%|█████████ | 11181/12313 [8:22:14<52:34, 2.79s/it] 91%|█████████ | 11182/12313 [8:22:17<53:06, 2.82s/it] {'loss': 0.6907, 'grad_norm': 7.989031039920255, 'learning_rate': 1.0982530333154245e-07, 'epoch': 0.91} 91%|█████████ | 11182/12313 [8:22:17<53:06, 2.82s/it] 91%|█████████ | 11183/12313 [8:22:19<52:14, 2.77s/it] {'loss': 0.3874, 'grad_norm': 5.9615319524726695, 'learning_rate': 1.0963261141748616e-07, 'epoch': 0.91} 91%|█████████ | 11183/12313 [8:22:19<52:14, 2.77s/it] 91%|█████████ | 11184/12313 [8:22:22<50:04, 2.66s/it] {'loss': 0.6927, 'grad_norm': 4.85080597593589, 'learning_rate': 1.0944008490420183e-07, 'epoch': 0.91} 91%|█████████ | 11184/12313 [8:22:22<50:04, 2.66s/it] 91%|█████████ | 11185/12313 [8:22:24<49:05, 2.61s/it] {'loss': 0.4814, 'grad_norm': 12.043245022297707, 'learning_rate': 1.0924772380501215e-07, 'epoch': 0.91} 91%|█████████ | 11185/12313 [8:22:24<49:05, 2.61s/it] 91%|█████████ | 11186/12313 [8:22:27<49:15, 2.62s/it] {'loss': 0.43, 
'grad_norm': 3.70058521293832, 'learning_rate': 1.0905552813322701e-07, 'epoch': 0.91} 91%|█████████ | 11186/12313 [8:22:27<49:15, 2.62s/it] 91%|█████████ | 11187/12313 [8:22:30<54:31, 2.91s/it] {'loss': 0.4644, 'grad_norm': 5.677961162801236, 'learning_rate': 1.0886349790214495e-07, 'epoch': 0.91} 91%|█████████ | 11187/12313 [8:22:30<54:31, 2.91s/it] 91%|█████████ | 11188/12313 [8:22:33<53:27, 2.85s/it] {'loss': 0.5697, 'grad_norm': 5.958066806913473, 'learning_rate': 1.0867163312505452e-07, 'epoch': 0.91} 91%|█████████ | 11188/12313 [8:22:33<53:27, 2.85s/it] 91%|█████████ | 11189/12313 [8:22:36<51:13, 2.73s/it] {'loss': 0.4075, 'grad_norm': 8.122121992747614, 'learning_rate': 1.084799338152312e-07, 'epoch': 0.91} 91%|█████████ | 11189/12313 [8:22:36<51:13, 2.73s/it] 91%|█████████ | 11190/12313 [8:22:39<52:59, 2.83s/it] {'loss': 0.61, 'grad_norm': 3.68707963935213, 'learning_rate': 1.082883999859391e-07, 'epoch': 0.91} 91%|█████████ | 11190/12313 [8:22:39<52:59, 2.83s/it] 91%|█████████ | 11191/12313 [8:22:41<52:25, 2.80s/it] {'loss': 0.4208, 'grad_norm': 6.469919999736764, 'learning_rate': 1.0809703165043206e-07, 'epoch': 0.91} 91%|█████████ | 11191/12313 [8:22:41<52:25, 2.80s/it] 91%|█████████ | 11192/12313 [8:22:44<50:50, 2.72s/it] {'loss': 0.3323, 'grad_norm': 13.337941491052447, 'learning_rate': 1.0790582882195172e-07, 'epoch': 0.91} 91%|█████████ | 11192/12313 [8:22:44<50:50, 2.72s/it] 91%|█████████ | 11193/12313 [8:22:46<49:16, 2.64s/it] {'loss': 0.3798, 'grad_norm': 5.027652600451383, 'learning_rate': 1.0771479151372749e-07, 'epoch': 0.91} 91%|█████████ | 11193/12313 [8:22:46<49:16, 2.64s/it] 91%|█████████ | 11194/12313 [8:22:49<48:53, 2.62s/it] {'loss': 0.4353, 'grad_norm': 19.702276711092185, 'learning_rate': 1.0752391973897852e-07, 'epoch': 0.91} 91%|█████████ | 11194/12313 [8:22:49<48:53, 2.62s/it] 91%|█████████ | 11195/12313 [8:22:51<48:25, 2.60s/it] {'loss': 0.4836, 'grad_norm': 6.736698974070218, 'learning_rate': 1.0733321351091286e-07, 'epoch': 
0.91} 91%|█████████ | 11195/12313 [8:22:51<48:25, 2.60s/it] 91%|█████████ | 11196/12313 [8:22:54<48:28, 2.60s/it] {'loss': 0.4187, 'grad_norm': 8.31372301888567, 'learning_rate': 1.071426728427255e-07, 'epoch': 0.91} 91%|█████████ | 11196/12313 [8:22:54<48:28, 2.60s/it] 91%|█████████ | 11197/12313 [8:22:57<51:00, 2.74s/it] {'loss': 0.4106, 'grad_norm': 5.818025708926807, 'learning_rate': 1.0695229774760147e-07, 'epoch': 0.91} 91%|█████████ | 11197/12313 [8:22:57<51:00, 2.74s/it] 91%|█████████ | 11198/12313 [8:23:00<49:04, 2.64s/it] {'loss': 0.545, 'grad_norm': 4.4508095887650105, 'learning_rate': 1.0676208823871326e-07, 'epoch': 0.91} 91%|█████████ | 11198/12313 [8:23:00<49:04, 2.64s/it] 91%|█████████ | 11199/12313 [8:23:02<48:02, 2.59s/it] {'loss': 0.3934, 'grad_norm': 8.123684613091736, 'learning_rate': 1.065720443292223e-07, 'epoch': 0.91} 91%|█████████ | 11199/12313 [8:23:02<48:02, 2.59s/it] 91%|█████████ | 11200/12313 [8:23:05<48:55, 2.64s/it] {'loss': 0.5218, 'grad_norm': 5.968329057688115, 'learning_rate': 1.0638216603227892e-07, 'epoch': 0.91} 91%|█████████ | 11200/12313 [8:23:05<48:55, 2.64s/it] 91%|█████████ | 11201/12313 [8:23:07<48:50, 2.64s/it] {'loss': 0.4462, 'grad_norm': 8.367605133725334, 'learning_rate': 1.0619245336102174e-07, 'epoch': 0.91} 91%|█████████ | 11201/12313 [8:23:07<48:50, 2.64s/it] 91%|█████████ | 11202/12313 [8:23:10<50:06, 2.71s/it] {'loss': 0.5753, 'grad_norm': 7.432108062190521, 'learning_rate': 1.060029063285778e-07, 'epoch': 0.91} 91%|█████████ | 11202/12313 [8:23:10<50:06, 2.71s/it] 91%|█████████ | 11203/12313 [8:23:13<49:13, 2.66s/it] {'loss': 0.3382, 'grad_norm': 5.076717044768981, 'learning_rate': 1.0581352494806241e-07, 'epoch': 0.91} 91%|█████████ | 11203/12313 [8:23:13<49:13, 2.66s/it] 91%|█████████ | 11204/12313 [8:23:16<49:40, 2.69s/it] {'loss': 0.5211, 'grad_norm': 6.177656873247412, 'learning_rate': 1.0562430923258037e-07, 'epoch': 0.91} 91%|█████████ | 11204/12313 [8:23:16<49:40, 2.69s/it] 91%|█████████ | 
11205/12313 [8:23:18<49:40, 2.69s/it] {'loss': 0.4251, 'grad_norm': 6.778475917464361, 'learning_rate': 1.0543525919522401e-07, 'epoch': 0.91} 91%|█████████ | 11205/12313 [8:23:18<49:40, 2.69s/it] 91%|█████████ | 11206/12313 [8:23:21<49:43, 2.70s/it] {'loss': 0.3896, 'grad_norm': 6.121707124021081, 'learning_rate': 1.0524637484907424e-07, 'epoch': 0.91} 91%|█████████ | 11206/12313 [8:23:21<49:43, 2.70s/it] 91%|█████████ | 11207/12313 [8:23:24<49:30, 2.69s/it] {'loss': 0.6354, 'grad_norm': 6.136173762658393, 'learning_rate': 1.0505765620720143e-07, 'epoch': 0.91} 91%|█████████ | 11207/12313 [8:23:24<49:30, 2.69s/it] 91%|█████████ | 11208/12313 [8:23:26<49:32, 2.69s/it] {'loss': 0.3988, 'grad_norm': 7.9613881203228525, 'learning_rate': 1.0486910328266403e-07, 'epoch': 0.91} 91%|█████████ | 11208/12313 [8:23:26<49:32, 2.69s/it] 91%|█████████ | 11209/12313 [8:23:29<50:15, 2.73s/it] {'loss': 0.537, 'grad_norm': 5.222762969931984, 'learning_rate': 1.0468071608850827e-07, 'epoch': 0.91} 91%|█████████ | 11209/12313 [8:23:29<50:15, 2.73s/it] 91%|█████████ | 11210/12313 [8:23:32<49:39, 2.70s/it] {'loss': 0.41, 'grad_norm': 4.889505356591718, 'learning_rate': 1.0449249463777039e-07, 'epoch': 0.91} 91%|█████████ | 11210/12313 [8:23:32<49:39, 2.70s/it] 91%|█████████ | 11211/12313 [8:23:34<48:43, 2.65s/it] {'loss': 0.8052, 'grad_norm': 5.393977337842649, 'learning_rate': 1.0430443894347358e-07, 'epoch': 0.91} 91%|█████████ | 11211/12313 [8:23:34<48:43, 2.65s/it] 91%|█████████ | 11212/12313 [8:23:37<48:53, 2.66s/it] {'loss': 0.4754, 'grad_norm': 4.927676914766112, 'learning_rate': 1.041165490186305e-07, 'epoch': 0.91} 91%|█████████ | 11212/12313 [8:23:37<48:53, 2.66s/it] 91%|█████████ | 11213/12313 [8:23:40<49:39, 2.71s/it] {'loss': 0.5263, 'grad_norm': 4.738799420453396, 'learning_rate': 1.0392882487624212e-07, 'epoch': 0.91} 91%|█████████ | 11213/12313 [8:23:40<49:39, 2.71s/it] 91%|█████████ | 11214/12313 [8:23:43<49:36, 2.71s/it] {'loss': 0.3904, 'grad_norm': 
3.4587373103230474, 'learning_rate': 1.0374126652929805e-07, 'epoch': 0.91} 91%|█████████ | 11214/12313 [8:23:43<49:36, 2.71s/it] 91%|█████████ | 11215/12313 [8:23:45<49:12, 2.69s/it] {'loss': 0.5351, 'grad_norm': 4.320689046060023, 'learning_rate': 1.0355387399077627e-07, 'epoch': 0.91} 91%|█████████ | 11215/12313 [8:23:45<49:12, 2.69s/it] 91%|█████████ | 11216/12313 [8:23:48<47:55, 2.62s/it] {'loss': 0.4286, 'grad_norm': 6.811312905006839, 'learning_rate': 1.033666472736436e-07, 'epoch': 0.91} 91%|█████████ | 11216/12313 [8:23:48<47:55, 2.62s/it] 91%|█████████ | 11217/12313 [8:23:50<48:07, 2.63s/it] {'loss': 0.5945, 'grad_norm': 3.4936475176998845, 'learning_rate': 1.0317958639085524e-07, 'epoch': 0.91} 91%|█████████ | 11217/12313 [8:23:50<48:07, 2.63s/it] 91%|█████████ | 11218/12313 [8:23:53<48:31, 2.66s/it] {'loss': 0.3791, 'grad_norm': 4.327867349353077, 'learning_rate': 1.0299269135535416e-07, 'epoch': 0.91} 91%|█████████ | 11218/12313 [8:23:53<48:31, 2.66s/it] 91%|█████████ | 11219/12313 [8:23:56<48:43, 2.67s/it] {'loss': 0.4511, 'grad_norm': 7.91773166597089, 'learning_rate': 1.0280596218007254e-07, 'epoch': 0.91} 91%|█████████ | 11219/12313 [8:23:56<48:43, 2.67s/it] 91%|█████████ | 11220/12313 [8:23:58<48:17, 2.65s/it] {'loss': 0.4298, 'grad_norm': 5.982009386221902, 'learning_rate': 1.0261939887793143e-07, 'epoch': 0.91} 91%|█████████ | 11220/12313 [8:23:58<48:17, 2.65s/it] 91%|█████████ | 11221/12313 [8:24:01<47:52, 2.63s/it] {'loss': 0.5457, 'grad_norm': 9.681222480637057, 'learning_rate': 1.0243300146184048e-07, 'epoch': 0.91} 91%|█████████ | 11221/12313 [8:24:01<47:52, 2.63s/it] 91%|█████████ | 11222/12313 [8:24:04<48:48, 2.68s/it] {'loss': 0.4311, 'grad_norm': 8.645718538389323, 'learning_rate': 1.0224676994469635e-07, 'epoch': 0.91} 91%|█████████ | 11222/12313 [8:24:04<48:48, 2.68s/it] 91%|█████████ | 11223/12313 [8:24:07<49:31, 2.73s/it] {'loss': 0.4284, 'grad_norm': 6.818193527467812, 'learning_rate': 1.020607043393862e-07, 'epoch': 0.91} 
91%|█████████ | 11223/12313 [8:24:07<49:31, 2.73s/it] 91%|█████████ | 11224/12313 [8:24:10<50:49, 2.80s/it] {'loss': 0.4452, 'grad_norm': 4.876516812885544, 'learning_rate': 1.0187480465878418e-07, 'epoch': 0.91} 91%|█████████ | 11224/12313 [8:24:10<50:49, 2.80s/it] 91%|█████████ | 11225/12313 [8:24:12<49:41, 2.74s/it] {'loss': 0.5541, 'grad_norm': 4.240926353476526, 'learning_rate': 1.0168907091575364e-07, 'epoch': 0.91} 91%|█████████ | 11225/12313 [8:24:12<49:41, 2.74s/it] 91%|█████████ | 11226/12313 [8:24:15<49:05, 2.71s/it] {'loss': 0.4558, 'grad_norm': 5.868350340007438, 'learning_rate': 1.015035031231465e-07, 'epoch': 0.91} 91%|█████████ | 11226/12313 [8:24:15<49:05, 2.71s/it] 91%|█████████ | 11227/12313 [8:24:17<48:58, 2.71s/it] {'loss': 0.6506, 'grad_norm': 3.823767663089306, 'learning_rate': 1.0131810129380332e-07, 'epoch': 0.91} 91%|█████████ | 11227/12313 [8:24:17<48:58, 2.71s/it] 91%|█████████ | 11228/12313 [8:24:20<48:35, 2.69s/it] {'loss': 0.4312, 'grad_norm': 7.147187550366974, 'learning_rate': 1.0113286544055245e-07, 'epoch': 0.91} 91%|█████████ | 11228/12313 [8:24:20<48:35, 2.69s/it] 91%|█████████ | 11229/12313 [8:24:23<48:45, 2.70s/it] {'loss': 0.4037, 'grad_norm': 50.85174427248752, 'learning_rate': 1.0094779557621171e-07, 'epoch': 0.91} 91%|█████████ | 11229/12313 [8:24:23<48:45, 2.70s/it] 91%|█████████ | 11230/12313 [8:24:25<47:59, 2.66s/it] {'loss': 0.5095, 'grad_norm': 10.89692418155859, 'learning_rate': 1.0076289171358695e-07, 'epoch': 0.91} 91%|█████████ | 11230/12313 [8:24:25<47:59, 2.66s/it] 91%|█████████ | 11231/12313 [8:24:28<46:52, 2.60s/it] {'loss': 0.6489, 'grad_norm': 6.406674309261182, 'learning_rate': 1.0057815386547181e-07, 'epoch': 0.91} 91%|█████████ | 11231/12313 [8:24:28<46:52, 2.60s/it] 91%|█████████ | 11232/12313 [8:24:30<46:24, 2.58s/it] {'loss': 0.5091, 'grad_norm': 4.4242464166233955, 'learning_rate': 1.0039358204464943e-07, 'epoch': 0.91} 91%|█████████ | 11232/12313 [8:24:30<46:24, 2.58s/it] 91%|█████████ | 11233/12313 
[8:24:33<47:54, 2.66s/it] {'loss': 0.4469, 'grad_norm': 4.743762192538487, 'learning_rate': 1.0020917626389209e-07, 'epoch': 0.91} 91%|█████████ | 11233/12313 [8:24:33<47:54, 2.66s/it] 91%|█████████ | 11234/12313 [8:24:36<47:54, 2.66s/it] {'loss': 0.39, 'grad_norm': 4.14370710399411, 'learning_rate': 1.0002493653595902e-07, 'epoch': 0.91} 91%|█████████ | 11234/12313 [8:24:36<47:54, 2.66s/it] 91%|█████████ | 11235/12313 [8:24:39<48:48, 2.72s/it] {'loss': 0.5105, 'grad_norm': 14.280052275876619, 'learning_rate': 9.984086287359806e-08, 'epoch': 0.91} 91%|█████████ | 11235/12313 [8:24:39<48:48, 2.72s/it] 91%|█████████▏| 11236/12313 [8:24:41<48:28, 2.70s/it] {'loss': 0.357, 'grad_norm': 6.244441893382094, 'learning_rate': 9.965695528954711e-08, 'epoch': 0.91} 91%|█████████▏| 11236/12313 [8:24:41<48:28, 2.70s/it] 91%|█████████▏| 11237/12313 [8:24:44<49:16, 2.75s/it] {'loss': 0.4236, 'grad_norm': 7.898334173798384, 'learning_rate': 9.947321379653152e-08, 'epoch': 0.91} 91%|█████████▏| 11237/12313 [8:24:44<49:16, 2.75s/it] 91%|█████████▏| 11238/12313 [8:24:47<49:15, 2.75s/it] {'loss': 0.5023, 'grad_norm': 4.788515846324651, 'learning_rate': 9.928963840726418e-08, 'epoch': 0.91} 91%|█████████▏| 11238/12313 [8:24:47<49:15, 2.75s/it] 91%|█████████▏| 11239/12313 [8:24:50<48:21, 2.70s/it] {'loss': 0.5852, 'grad_norm': 25.04518107661059, 'learning_rate': 9.910622913444856e-08, 'epoch': 0.91} 91%|█████████▏| 11239/12313 [8:24:50<48:21, 2.70s/it] 91%|█████████▏| 11240/12313 [8:24:52<47:59, 2.68s/it] {'loss': 0.4615, 'grad_norm': 4.841830459052142, 'learning_rate': 9.89229859907756e-08, 'epoch': 0.91} 91%|█████████▏| 11240/12313 [8:24:52<47:59, 2.68s/it] 91%|█████████▏| 11241/12313 [8:24:55<47:58, 2.68s/it] {'loss': 0.4802, 'grad_norm': 7.939078029749193, 'learning_rate': 9.873990898892405e-08, 'epoch': 0.91} 91%|█████████▏| 11241/12313 [8:24:55<47:58, 2.68s/it] 91%|█████████▏| 11242/12313 [8:24:58<47:47, 2.68s/it] {'loss': 0.4418, 'grad_norm': 9.609761146624544, 'learning_rate': 
9.855699814156266e-08, 'epoch': 0.91} 91%|█████████▏| 11242/12313 [8:24:58<47:47, 2.68s/it] 91%|█████████▏| 11243/12313 [8:25:00<47:34, 2.67s/it] {'loss': 0.5425, 'grad_norm': 4.6087214575599145, 'learning_rate': 9.837425346134771e-08, 'epoch': 0.91} 91%|█████████▏| 11243/12313 [8:25:00<47:34, 2.67s/it] 91%|█████████▏| 11244/12313 [8:25:03<47:07, 2.64s/it] {'loss': 0.4984, 'grad_norm': 5.893925469137472, 'learning_rate': 9.819167496092352e-08, 'epoch': 0.91} 91%|█████████▏| 11244/12313 [8:25:03<47:07, 2.64s/it] 91%|█████████▏| 11245/12313 [8:25:06<47:27, 2.67s/it] {'loss': 0.4071, 'grad_norm': 5.730428639464196, 'learning_rate': 9.800926265292415e-08, 'epoch': 0.91} 91%|█████████▏| 11245/12313 [8:25:06<47:27, 2.67s/it] 91%|█████████▏| 11246/12313 [8:25:08<46:20, 2.61s/it] {'loss': 0.5314, 'grad_norm': 3.5844043953410853, 'learning_rate': 9.782701654997145e-08, 'epoch': 0.91} 91%|█████████▏| 11246/12313 [8:25:08<46:20, 2.61s/it] 91%|█████████▏| 11247/12313 [8:25:11<46:19, 2.61s/it] {'loss': 0.5415, 'grad_norm': 4.657341875734708, 'learning_rate': 9.764493666467589e-08, 'epoch': 0.91} 91%|█████████▏| 11247/12313 [8:25:11<46:19, 2.61s/it] 91%|█████████▏| 11248/12313 [8:25:13<46:48, 2.64s/it] {'loss': 0.5306, 'grad_norm': 5.362707479330835, 'learning_rate': 9.746302300963656e-08, 'epoch': 0.91} 91%|█████████▏| 11248/12313 [8:25:13<46:48, 2.64s/it] 91%|█████████▏| 11249/12313 [8:25:16<45:31, 2.57s/it] {'loss': 0.4642, 'grad_norm': 5.144669261464033, 'learning_rate': 9.728127559744089e-08, 'epoch': 0.91} 91%|█████████▏| 11249/12313 [8:25:16<45:31, 2.57s/it] 91%|█████████▏| 11250/12313 [8:25:18<46:06, 2.60s/it] {'loss': 0.4926, 'grad_norm': 8.491237111181297, 'learning_rate': 9.709969444066436e-08, 'epoch': 0.91} 91%|█████████▏| 11250/12313 [8:25:18<46:06, 2.60s/it] 91%|█████████▏| 11251/12313 [8:25:21<45:50, 2.59s/it] {'loss': 0.6502, 'grad_norm': 4.176442395340095, 'learning_rate': 9.691827955187222e-08, 'epoch': 0.91} 91%|█████████▏| 11251/12313 [8:25:21<45:50, 
2.59s/it] 91%|█████████▏| 11252/12313 [8:25:24<46:00, 2.60s/it] {'loss': 0.609, 'grad_norm': 9.913453531194941, 'learning_rate': 9.673703094361664e-08, 'epoch': 0.91} 91%|█████████▏| 11252/12313 [8:25:24<46:00, 2.60s/it] 91%|█████████▏| 11253/12313 [8:25:26<45:11, 2.56s/it] {'loss': 0.3666, 'grad_norm': 4.875302448335454, 'learning_rate': 9.655594862843953e-08, 'epoch': 0.91} 91%|█████████▏| 11253/12313 [8:25:26<45:11, 2.56s/it] 91%|█████████▏| 11254/12313 [8:25:28<44:17, 2.51s/it] {'loss': 0.4876, 'grad_norm': 4.516782702007909, 'learning_rate': 9.63750326188706e-08, 'epoch': 0.91} 91%|█████████▏| 11254/12313 [8:25:28<44:17, 2.51s/it] 91%|█████████▏| 11255/12313 [8:25:31<44:37, 2.53s/it] {'loss': 0.4328, 'grad_norm': 9.6025358191127, 'learning_rate': 9.619428292742872e-08, 'epoch': 0.91} 91%|█████████▏| 11255/12313 [8:25:31<44:37, 2.53s/it] 91%|█████████▏| 11256/12313 [8:25:34<45:44, 2.60s/it] {'loss': 0.5117, 'grad_norm': 6.319994193854443, 'learning_rate': 9.601369956662054e-08, 'epoch': 0.91} 91%|█████████▏| 11256/12313 [8:25:34<45:44, 2.60s/it] 91%|█████████▏| 11257/12313 [8:25:36<45:53, 2.61s/it] {'loss': 0.657, 'grad_norm': 5.66199645711789, 'learning_rate': 9.583328254894109e-08, 'epoch': 0.91} 91%|█████████▏| 11257/12313 [8:25:36<45:53, 2.61s/it] 91%|█████████▏| 11258/12313 [8:25:39<45:10, 2.57s/it] {'loss': 0.6704, 'grad_norm': 3.0411540989807317, 'learning_rate': 9.565303188687453e-08, 'epoch': 0.91} 91%|█████████▏| 11258/12313 [8:25:39<45:10, 2.57s/it] 91%|█████████▏| 11259/12313 [8:25:41<44:24, 2.53s/it] {'loss': 0.3464, 'grad_norm': 4.937602813989695, 'learning_rate': 9.547294759289366e-08, 'epoch': 0.91} 91%|█████████▏| 11259/12313 [8:25:41<44:24, 2.53s/it] 91%|█████████▏| 11260/12313 [8:25:44<45:42, 2.60s/it] {'loss': 0.4903, 'grad_norm': 4.5620361618138885, 'learning_rate': 9.52930296794588e-08, 'epoch': 0.91} 91%|█████████▏| 11260/12313 [8:25:44<45:42, 2.60s/it] 91%|█████████▏| 11261/12313 [8:25:47<45:19, 2.58s/it] {'loss': 0.5204, 'grad_norm': 
4.986561186017303, 'learning_rate': 9.511327815902e-08, 'epoch': 0.91} 91%|█████████▏| 11261/12313 [8:25:47<45:19, 2.58s/it] 91%|█████████▏| 11262/12313 [8:25:49<45:54, 2.62s/it] {'loss': 0.3049, 'grad_norm': 3.2596608136193366, 'learning_rate': 9.493369304401423e-08, 'epoch': 0.91} 91%|█████████▏| 11262/12313 [8:25:49<45:54, 2.62s/it] 91%|█████████▏| 11263/12313 [8:25:52<46:25, 2.65s/it] {'loss': 0.5748, 'grad_norm': 3.798410117128246, 'learning_rate': 9.475427434686824e-08, 'epoch': 0.91} 91%|█████████▏| 11263/12313 [8:25:52<46:25, 2.65s/it] 91%|█████████▏| 11264/12313 [8:25:55<45:42, 2.61s/it] {'loss': 0.5748, 'grad_norm': 6.954981893846417, 'learning_rate': 9.457502207999736e-08, 'epoch': 0.91} 91%|█████████▏| 11264/12313 [8:25:55<45:42, 2.61s/it] 91%|█████████▏| 11265/12313 [8:25:57<45:01, 2.58s/it] {'loss': 0.525, 'grad_norm': 3.8501120004641143, 'learning_rate': 9.43959362558039e-08, 'epoch': 0.91} 91%|█████████▏| 11265/12313 [8:25:57<45:01, 2.58s/it] 91%|█████████▏| 11266/12313 [8:26:00<44:42, 2.56s/it] {'loss': 0.4368, 'grad_norm': 4.333034848842079, 'learning_rate': 9.421701688668017e-08, 'epoch': 0.91} 91%|█████████▏| 11266/12313 [8:26:00<44:42, 2.56s/it] 92%|█████████▏| 11267/12313 [8:26:02<45:57, 2.64s/it] {'loss': 0.4947, 'grad_norm': 5.609179824260918, 'learning_rate': 9.403826398500654e-08, 'epoch': 0.92} 92%|█████████▏| 11267/12313 [8:26:02<45:57, 2.64s/it] 92%|█████████▏| 11268/12313 [8:26:05<44:51, 2.58s/it] {'loss': 0.5003, 'grad_norm': 4.516842058122665, 'learning_rate': 9.385967756315201e-08, 'epoch': 0.92} 92%|█████████▏| 11268/12313 [8:26:05<44:51, 2.58s/it] 92%|█████████▏| 11269/12313 [8:26:08<46:16, 2.66s/it] {'loss': 0.502, 'grad_norm': 5.024303496834808, 'learning_rate': 9.368125763347336e-08, 'epoch': 0.92} 92%|█████████▏| 11269/12313 [8:26:08<46:16, 2.66s/it] 92%|█████████▏| 11270/12313 [8:26:10<45:42, 2.63s/it] {'loss': 0.4421, 'grad_norm': 3.8890995639637858, 'learning_rate': 9.350300420831599e-08, 'epoch': 0.92} 92%|█████████▏| 
11270/12313 [8:26:10<45:42, 2.63s/it] 92%|█████████▏| 11271/12313 [8:26:13<45:03, 2.59s/it] {'loss': 0.6776, 'grad_norm': 11.469385591277504, 'learning_rate': 9.332491730001448e-08, 'epoch': 0.92} 92%|█████████▏| 11271/12313 [8:26:13<45:03, 2.59s/it] 92%|█████████▏| 11272/12313 [8:26:15<45:01, 2.60s/it] {'loss': 0.5468, 'grad_norm': 8.781322680809653, 'learning_rate': 9.314699692089202e-08, 'epoch': 0.92} 92%|█████████▏| 11272/12313 [8:26:15<45:01, 2.60s/it] 92%|█████████▏| 11273/12313 [8:26:18<43:58, 2.54s/it] {'loss': 0.5422, 'grad_norm': 5.122536771418473, 'learning_rate': 9.296924308325905e-08, 'epoch': 0.92} 92%|█████████▏| 11273/12313 [8:26:18<43:58, 2.54s/it] 92%|█████████▏| 11274/12313 [8:26:20<44:17, 2.56s/it] {'loss': 0.5195, 'grad_norm': 5.107338953835333, 'learning_rate': 9.279165579941546e-08, 'epoch': 0.92} 92%|█████████▏| 11274/12313 [8:26:20<44:17, 2.56s/it] 92%|█████████▏| 11275/12313 [8:26:23<46:06, 2.67s/it] {'loss': 0.4056, 'grad_norm': 7.603688810286707, 'learning_rate': 9.261423508164947e-08, 'epoch': 0.92} 92%|█████████▏| 11275/12313 [8:26:23<46:06, 2.67s/it] 92%|█████████▏| 11276/12313 [8:26:26<47:15, 2.73s/it] {'loss': 0.3926, 'grad_norm': 3.231334009016635, 'learning_rate': 9.243698094223735e-08, 'epoch': 0.92} 92%|█████████▏| 11276/12313 [8:26:26<47:15, 2.73s/it] 92%|█████████▏| 11277/12313 [8:26:29<47:15, 2.74s/it] {'loss': 0.5653, 'grad_norm': 4.19277721624568, 'learning_rate': 9.225989339344432e-08, 'epoch': 0.92} 92%|█████████▏| 11277/12313 [8:26:29<47:15, 2.74s/it] 92%|█████████▏| 11278/12313 [8:26:32<46:47, 2.71s/it] {'loss': 0.6066, 'grad_norm': 4.727861994059561, 'learning_rate': 9.208297244752362e-08, 'epoch': 0.92} 92%|█████████▏| 11278/12313 [8:26:32<46:47, 2.71s/it] 92%|█████████▏| 11279/12313 [8:26:34<47:02, 2.73s/it] {'loss': 0.4402, 'grad_norm': 4.796597984182049, 'learning_rate': 9.190621811671769e-08, 'epoch': 0.92} 92%|█████████▏| 11279/12313 [8:26:34<47:02, 2.73s/it] 92%|█████████▏| 11280/12313 [8:26:37<46:22, 2.69s/it] 
{'loss': 0.4518, 'grad_norm': 17.756697315100613, 'learning_rate': 9.1729630413257e-08, 'epoch': 0.92} 92%|█████████▏| 11280/12313 [8:26:37<46:22, 2.69s/it] 92%|█████████▏| 11281/12313 [8:26:40<46:39, 2.71s/it] {'loss': 0.5031, 'grad_norm': 6.095353870840864, 'learning_rate': 9.155320934936041e-08, 'epoch': 0.92} 92%|█████████▏| 11281/12313 [8:26:40<46:39, 2.71s/it] 92%|█████████▏| 11282/12313 [8:26:42<46:40, 2.72s/it] {'loss': 0.4176, 'grad_norm': 6.9639308227226, 'learning_rate': 9.137695493723481e-08, 'epoch': 0.92} 92%|█████████▏| 11282/12313 [8:26:42<46:40, 2.72s/it] 92%|█████████▏| 11283/12313 [8:26:45<46:14, 2.69s/it] {'loss': 0.4873, 'grad_norm': 4.5604405756876485, 'learning_rate': 9.120086718907657e-08, 'epoch': 0.92} 92%|█████████▏| 11283/12313 [8:26:45<46:14, 2.69s/it] 92%|█████████▏| 11284/12313 [8:26:48<45:15, 2.64s/it] {'loss': 0.3554, 'grad_norm': 5.467910329584174, 'learning_rate': 9.10249461170698e-08, 'epoch': 0.92} 92%|█████████▏| 11284/12313 [8:26:48<45:15, 2.64s/it] 92%|█████████▏| 11285/12313 [8:26:50<45:18, 2.64s/it] {'loss': 0.499, 'grad_norm': 5.007337212383075, 'learning_rate': 9.084919173338758e-08, 'epoch': 0.92} 92%|█████████▏| 11285/12313 [8:26:50<45:18, 2.64s/it] 92%|█████████▏| 11286/12313 [8:26:53<44:19, 2.59s/it] {'loss': 0.4328, 'grad_norm': 9.584147277340062, 'learning_rate': 9.067360405019099e-08, 'epoch': 0.92} 92%|█████████▏| 11286/12313 [8:26:53<44:19, 2.59s/it] 92%|█████████▏| 11287/12313 [8:26:56<45:33, 2.66s/it] {'loss': 0.4119, 'grad_norm': 4.7954029212995355, 'learning_rate': 9.049818307963004e-08, 'epoch': 0.92} 92%|█████████▏| 11287/12313 [8:26:56<45:33, 2.66s/it] 92%|█████████▏| 11288/12313 [8:26:58<44:46, 2.62s/it] {'loss': 0.4901, 'grad_norm': 7.375196976362717, 'learning_rate': 9.03229288338428e-08, 'epoch': 0.92} 92%|█████████▏| 11288/12313 [8:26:58<44:46, 2.62s/it] 92%|█████████▏| 11289/12313 [8:27:01<45:25, 2.66s/it] {'loss': 0.5983, 'grad_norm': 5.606112983529189, 'learning_rate': 9.014784132495542e-08, 
'epoch': 0.92} 92%|█████████▏| 11289/12313 [8:27:01<45:25, 2.66s/it] 92%|█████████▏| 11290/12313 [8:27:04<46:29, 2.73s/it] {'loss': 0.3459, 'grad_norm': 5.297505673484214, 'learning_rate': 8.997292056508372e-08, 'epoch': 0.92} 92%|█████████▏| 11290/12313 [8:27:04<46:29, 2.73s/it] 92%|█████████▏| 11291/12313 [8:27:06<46:19, 2.72s/it] {'loss': 0.4612, 'grad_norm': 9.227347098755619, 'learning_rate': 8.979816656633084e-08, 'epoch': 0.92} 92%|█████████▏| 11291/12313 [8:27:06<46:19, 2.72s/it] 92%|█████████▏| 11292/12313 [8:27:09<47:06, 2.77s/it] {'loss': 0.5246, 'grad_norm': 2.966819498823606, 'learning_rate': 8.962357934078874e-08, 'epoch': 0.92} 92%|█████████▏| 11292/12313 [8:27:09<47:06, 2.77s/it] 92%|█████████▏| 11293/12313 [8:27:12<46:03, 2.71s/it] {'loss': 0.4562, 'grad_norm': 6.824208201533333, 'learning_rate': 8.944915890053891e-08, 'epoch': 0.92} 92%|█████████▏| 11293/12313 [8:27:12<46:03, 2.71s/it] 92%|█████████▏| 11294/12313 [8:27:14<44:33, 2.62s/it] {'loss': 0.6399, 'grad_norm': 5.376195914843917, 'learning_rate': 8.927490525764942e-08, 'epoch': 0.92} 92%|█████████▏| 11294/12313 [8:27:14<44:33, 2.62s/it] 92%|█████████▏| 11295/12313 [8:27:17<44:24, 2.62s/it] {'loss': 0.5716, 'grad_norm': 4.79660630950142, 'learning_rate': 8.910081842417761e-08, 'epoch': 0.92} 92%|█████████▏| 11295/12313 [8:27:17<44:24, 2.62s/it] 92%|█████████▏| 11296/12313 [8:27:20<44:37, 2.63s/it] {'loss': 0.5504, 'grad_norm': 4.642397495613151, 'learning_rate': 8.892689841216995e-08, 'epoch': 0.92} 92%|█████████▏| 11296/12313 [8:27:20<44:37, 2.63s/it] 92%|█████████▏| 11297/12313 [8:27:22<44:55, 2.65s/it] {'loss': 0.5241, 'grad_norm': 3.4372668067850913, 'learning_rate': 8.875314523366014e-08, 'epoch': 0.92} 92%|█████████▏| 11297/12313 [8:27:22<44:55, 2.65s/it] 92%|█████████▏| 11298/12313 [8:27:25<44:53, 2.65s/it] {'loss': 0.4511, 'grad_norm': 5.952139631988675, 'learning_rate': 8.857955890067132e-08, 'epoch': 0.92} 92%|█████████▏| 11298/12313 [8:27:25<44:53, 2.65s/it] 92%|█████████▏| 
11299/12313 [8:27:28<45:36, 2.70s/it] {'loss': 0.4252, 'grad_norm': 6.811625003043075, 'learning_rate': 8.840613942521503e-08, 'epoch': 0.92} 92%|█████████▏| 11299/12313 [8:27:28<45:36, 2.70s/it] 92%|█████████▏| 11300/12313 [8:27:31<45:46, 2.71s/it] {'loss': 0.4875, 'grad_norm': 5.451600879549154, 'learning_rate': 8.823288681929082e-08, 'epoch': 0.92} 92%|█████████▏| 11300/12313 [8:27:31<45:46, 2.71s/it] 92%|█████████▏| 11301/12313 [8:27:33<46:13, 2.74s/it] {'loss': 0.6457, 'grad_norm': 4.68192691825503, 'learning_rate': 8.80598010948866e-08, 'epoch': 0.92} 92%|█████████▏| 11301/12313 [8:27:33<46:13, 2.74s/it] 92%|█████████▏| 11302/12313 [8:27:36<44:58, 2.67s/it] {'loss': 0.4899, 'grad_norm': 4.012662107572786, 'learning_rate': 8.788688226397917e-08, 'epoch': 0.92} 92%|█████████▏| 11302/12313 [8:27:36<44:58, 2.67s/it] 92%|█████████▏| 11303/12313 [8:27:38<43:28, 2.58s/it] {'loss': 0.5515, 'grad_norm': 6.570602578436052, 'learning_rate': 8.771413033853343e-08, 'epoch': 0.92} 92%|█████████▏| 11303/12313 [8:27:38<43:28, 2.58s/it] 92%|█████████▏| 11304/12313 [8:27:41<43:14, 2.57s/it] {'loss': 0.3594, 'grad_norm': 7.278974590840044, 'learning_rate': 8.754154533050285e-08, 'epoch': 0.92} 92%|█████████▏| 11304/12313 [8:27:41<43:14, 2.57s/it] 92%|█████████▏| 11305/12313 [8:27:43<43:00, 2.56s/it] {'loss': 0.4738, 'grad_norm': 5.819321823620186, 'learning_rate': 8.736912725182983e-08, 'epoch': 0.92} 92%|█████████▏| 11305/12313 [8:27:43<43:00, 2.56s/it] 92%|█████████▏| 11306/12313 [8:27:46<43:43, 2.61s/it] {'loss': 0.3854, 'grad_norm': 4.523374291477625, 'learning_rate': 8.719687611444483e-08, 'epoch': 0.92} 92%|█████████▏| 11306/12313 [8:27:46<43:43, 2.61s/it] 92%|█████████▏| 11307/12313 [8:27:49<45:43, 2.73s/it] {'loss': 0.4129, 'grad_norm': 3.277263217530252, 'learning_rate': 8.702479193026608e-08, 'epoch': 0.92} 92%|█████████▏| 11307/12313 [8:27:49<45:43, 2.73s/it] 92%|█████████▏| 11308/12313 [8:27:52<45:49, 2.74s/it] {'loss': 0.3563, 'grad_norm': 4.810473743314004, 
'learning_rate': 8.68528747112013e-08, 'epoch': 0.92} 92%|█████████▏| 11308/12313 [8:27:52<45:49, 2.74s/it] 92%|█████████▏| 11309/12313 [8:27:54<45:00, 2.69s/it] {'loss': 0.4657, 'grad_norm': 4.361251259939714, 'learning_rate': 8.668112446914622e-08, 'epoch': 0.92} 92%|█████████▏| 11309/12313 [8:27:54<45:00, 2.69s/it] 92%|█████████▏| 11310/12313 [8:27:57<45:53, 2.74s/it] {'loss': 0.4554, 'grad_norm': 5.725837124786675, 'learning_rate': 8.650954121598471e-08, 'epoch': 0.92} 92%|█████████▏| 11310/12313 [8:27:57<45:53, 2.74s/it] 92%|█████████▏| 11311/12313 [8:28:00<45:29, 2.72s/it] {'loss': 0.4677, 'grad_norm': 7.948527051871499, 'learning_rate': 8.633812496358973e-08, 'epoch': 0.92} 92%|█████████▏| 11311/12313 [8:28:00<45:29, 2.72s/it] 92%|█████████▏| 11312/12313 [8:28:03<45:21, 2.72s/it] {'loss': 0.4757, 'grad_norm': 8.490193251568819, 'learning_rate': 8.616687572382293e-08, 'epoch': 0.92} 92%|█████████▏| 11312/12313 [8:28:03<45:21, 2.72s/it] 92%|█████████▏| 11313/12313 [8:28:05<45:25, 2.73s/it] {'loss': 0.5886, 'grad_norm': 10.643140675212129, 'learning_rate': 8.599579350853288e-08, 'epoch': 0.92} 92%|█████████▏| 11313/12313 [8:28:05<45:25, 2.73s/it] 92%|█████████▏| 11314/12313 [8:28:08<44:14, 2.66s/it] {'loss': 0.4511, 'grad_norm': 5.12172366794015, 'learning_rate': 8.582487832955788e-08, 'epoch': 0.92} 92%|█████████▏| 11314/12313 [8:28:08<44:14, 2.66s/it] 92%|█████████▏| 11315/12313 [8:28:11<45:05, 2.71s/it] {'loss': 0.4457, 'grad_norm': 4.267431188352919, 'learning_rate': 8.565413019872488e-08, 'epoch': 0.92} 92%|█████████▏| 11315/12313 [8:28:11<45:05, 2.71s/it] 92%|█████████▏| 11316/12313 [8:28:13<44:42, 2.69s/it] {'loss': 0.5193, 'grad_norm': 6.46937742703463, 'learning_rate': 8.548354912784801e-08, 'epoch': 0.92} 92%|█████████▏| 11316/12313 [8:28:13<44:42, 2.69s/it] 92%|█████████▏| 11317/12313 [8:28:16<44:17, 2.67s/it] {'loss': 0.4415, 'grad_norm': 10.178148390235016, 'learning_rate': 8.531313512873063e-08, 'epoch': 0.92} 92%|█████████▏| 11317/12313 
[8:28:16<44:17, 2.67s/it] 92%|█████████▏| 11318/12313 [8:28:19<44:36, 2.69s/it] {'loss': 0.3627, 'grad_norm': 5.783583498365241, 'learning_rate': 8.514288821316524e-08, 'epoch': 0.92} 92%|█████████▏| 11318/12313 [8:28:19<44:36, 2.69s/it] 92%|█████████▏| 11319/12313 [8:28:22<45:39, 2.76s/it] {'loss': 0.4085, 'grad_norm': 4.717466038627756, 'learning_rate': 8.497280839293159e-08, 'epoch': 0.92} 92%|█████████▏| 11319/12313 [8:28:22<45:39, 2.76s/it] 92%|█████████▏| 11320/12313 [8:28:24<46:19, 2.80s/it] {'loss': 0.5006, 'grad_norm': 4.386734491499693, 'learning_rate': 8.480289567979776e-08, 'epoch': 0.92} 92%|█████████▏| 11320/12313 [8:28:24<46:19, 2.80s/it] 92%|█████████▏| 11321/12313 [8:28:27<46:25, 2.81s/it] {'loss': 0.5219, 'grad_norm': 4.236143189731592, 'learning_rate': 8.463315008552158e-08, 'epoch': 0.92} 92%|█████████▏| 11321/12313 [8:28:27<46:25, 2.81s/it] 92%|█████████▏| 11322/12313 [8:28:30<46:34, 2.82s/it] {'loss': 0.5412, 'grad_norm': 6.377920595401447, 'learning_rate': 8.446357162184838e-08, 'epoch': 0.92} 92%|█████████▏| 11322/12313 [8:28:30<46:34, 2.82s/it] 92%|█████████▏| 11323/12313 [8:28:33<45:25, 2.75s/it] {'loss': 0.6075, 'grad_norm': 4.393949522713649, 'learning_rate': 8.429416030051179e-08, 'epoch': 0.92} 92%|█████████▏| 11323/12313 [8:28:33<45:25, 2.75s/it] 92%|█████████▏| 11324/12313 [8:28:35<45:07, 2.74s/it] {'loss': 0.4949, 'grad_norm': 3.2693096226160097, 'learning_rate': 8.412491613323415e-08, 'epoch': 0.92} 92%|█████████▏| 11324/12313 [8:28:35<45:07, 2.74s/it] 92%|█████████▏| 11325/12313 [8:28:38<44:37, 2.71s/it] {'loss': 0.333, 'grad_norm': 5.690081161082877, 'learning_rate': 8.39558391317269e-08, 'epoch': 0.92} 92%|█████████▏| 11325/12313 [8:28:38<44:37, 2.71s/it] 92%|█████████▏| 11326/12313 [8:28:41<44:07, 2.68s/it] {'loss': 0.5179, 'grad_norm': 4.591799853216551, 'learning_rate': 8.378692930768873e-08, 'epoch': 0.92} 92%|█████████▏| 11326/12313 [8:28:41<44:07, 2.68s/it] 92%|█████████▏| 11327/12313 [8:28:43<44:22, 2.70s/it] {'loss': 
0.429, 'grad_norm': 3.4431361012922017, 'learning_rate': 8.361818667280724e-08, 'epoch': 0.92} 92%|█████████▏| 11327/12313 [8:28:43<44:22, 2.70s/it] 92%|█████████▏| 11328/12313 [8:28:46<43:30, 2.65s/it] {'loss': 0.4514, 'grad_norm': 7.374506756234914, 'learning_rate': 8.344961123875895e-08, 'epoch': 0.92} 92%|█████████▏| 11328/12313 [8:28:46<43:30, 2.65s/it] 92%|█████████▏| 11329/12313 [8:28:49<43:21, 2.64s/it] {'loss': 0.5645, 'grad_norm': 3.5074173112977345, 'learning_rate': 8.328120301720783e-08, 'epoch': 0.92} 92%|█████████▏| 11329/12313 [8:28:49<43:21, 2.64s/it]WARNING: tokenization mismatch: 111 vs. 110. (ignored) 92%|█████████▏| 11330/12313 [8:28:51<43:29, 2.65s/it] {'loss': 0.6032, 'grad_norm': 4.89262703046508, 'learning_rate': 8.311296201980734e-08, 'epoch': 0.92} 92%|█████████▏| 11330/12313 [8:28:51<43:29, 2.65s/it] 92%|█████████▏| 11331/12313 [8:28:54<41:50, 2.56s/it] {'loss': 0.9195, 'grad_norm': 3.5016861133769526, 'learning_rate': 8.294488825819875e-08, 'epoch': 0.92} 92%|█████████▏| 11331/12313 [8:28:54<41:50, 2.56s/it] 92%|█████████▏| 11332/12313 [8:28:56<41:52, 2.56s/it] {'loss': 0.644, 'grad_norm': 5.597223977106012, 'learning_rate': 8.277698174401189e-08, 'epoch': 0.92} 92%|█████████▏| 11332/12313 [8:28:56<41:52, 2.56s/it] 92%|█████████▏| 11333/12313 [8:28:59<44:19, 2.71s/it] {'loss': 0.5054, 'grad_norm': 4.477628857500898, 'learning_rate': 8.260924248886471e-08, 'epoch': 0.92} 92%|█████████▏| 11333/12313 [8:28:59<44:19, 2.71s/it] 92%|█████████▏| 11334/12313 [8:29:02<44:37, 2.74s/it] {'loss': 0.4682, 'grad_norm': 5.7562752348647885, 'learning_rate': 8.244167050436402e-08, 'epoch': 0.92} 92%|█████████▏| 11334/12313 [8:29:02<44:37, 2.74s/it] 92%|█████████▏| 11335/12313 [8:29:05<43:39, 2.68s/it] {'loss': 0.6894, 'grad_norm': 5.880650217377458, 'learning_rate': 8.22742658021053e-08, 'epoch': 0.92} 92%|█████████▏| 11335/12313 [8:29:05<43:39, 2.68s/it] 92%|█████████▏| 11336/12313 [8:29:07<43:05, 2.65s/it] {'loss': 0.4346, 'grad_norm': 
5.322917229821901, 'learning_rate': 8.210702839367146e-08, 'epoch': 0.92} 92%|█████████▏| 11336/12313 [8:29:07<43:05, 2.65s/it] 92%|█████████▏| 11337/12313 [8:29:10<42:39, 2.62s/it] {'loss': 0.4673, 'grad_norm': 4.796792390299345, 'learning_rate': 8.193995829063467e-08, 'epoch': 0.92} 92%|█████████▏| 11337/12313 [8:29:10<42:39, 2.62s/it] 92%|█████████▏| 11338/12313 [8:29:12<41:59, 2.58s/it] {'loss': 0.444, 'grad_norm': 8.87203052295575, 'learning_rate': 8.177305550455566e-08, 'epoch': 0.92} 92%|█████████▏| 11338/12313 [8:29:12<41:59, 2.58s/it] 92%|█████████▏| 11339/12313 [8:29:15<40:56, 2.52s/it] {'loss': 0.4699, 'grad_norm': 4.841733687058009, 'learning_rate': 8.160632004698271e-08, 'epoch': 0.92} 92%|█████████▏| 11339/12313 [8:29:15<40:56, 2.52s/it] 92%|█████████▏| 11340/12313 [8:29:17<40:15, 2.48s/it] {'loss': 0.7124, 'grad_norm': 4.075654039554347, 'learning_rate': 8.143975192945325e-08, 'epoch': 0.92} 92%|█████████▏| 11340/12313 [8:29:17<40:15, 2.48s/it] 92%|█████████▏| 11341/12313 [8:29:20<41:03, 2.53s/it] {'loss': 0.4646, 'grad_norm': 4.698000621909273, 'learning_rate': 8.127335116349305e-08, 'epoch': 0.92} 92%|█████████▏| 11341/12313 [8:29:20<41:03, 2.53s/it] 92%|█████████▏| 11342/12313 [8:29:23<42:45, 2.64s/it] {'loss': 0.4942, 'grad_norm': 4.494572377526055, 'learning_rate': 8.110711776061597e-08, 'epoch': 0.92} 92%|█████████▏| 11342/12313 [8:29:23<42:45, 2.64s/it] 92%|█████████▏| 11343/12313 [8:29:25<42:45, 2.65s/it] {'loss': 0.4229, 'grad_norm': 7.279761243226819, 'learning_rate': 8.09410517323242e-08, 'epoch': 0.92} 92%|█████████▏| 11343/12313 [8:29:25<42:45, 2.65s/it] 92%|█████████▏| 11344/12313 [8:29:28<43:29, 2.69s/it] {'loss': 0.3941, 'grad_norm': 4.863071295173902, 'learning_rate': 8.077515309010936e-08, 'epoch': 0.92} 92%|█████████▏| 11344/12313 [8:29:28<43:29, 2.69s/it] 92%|█████████▏| 11345/12313 [8:29:31<43:14, 2.68s/it] {'loss': 0.5919, 'grad_norm': 5.892354030279855, 'learning_rate': 8.060942184545034e-08, 'epoch': 0.92} 92%|█████████▏| 
11345/12313 [8:29:31<43:14, 2.68s/it] 92%|█████████▏| 11346/12313 [8:29:34<44:18, 2.75s/it] {'loss': 0.3786, 'grad_norm': 4.708709314273925, 'learning_rate': 8.044385800981464e-08, 'epoch': 0.92} 92%|█████████▏| 11346/12313 [8:29:34<44:18, 2.75s/it] 92%|█████████▏| 11347/12313 [8:29:36<43:40, 2.71s/it] {'loss': 0.5678, 'grad_norm': 8.147418646635652, 'learning_rate': 8.02784615946589e-08, 'epoch': 0.92} 92%|█████████▏| 11347/12313 [8:29:36<43:40, 2.71s/it] 92%|█████████▏| 11348/12313 [8:29:39<43:32, 2.71s/it] {'loss': 0.4418, 'grad_norm': 5.376975658563418, 'learning_rate': 8.011323261142734e-08, 'epoch': 0.92} 92%|█████████▏| 11348/12313 [8:29:39<43:32, 2.71s/it] 92%|█████████▏| 11349/12313 [8:29:41<42:50, 2.67s/it] {'loss': 0.4363, 'grad_norm': 5.665831071686028, 'learning_rate': 7.994817107155301e-08, 'epoch': 0.92} 92%|█████████▏| 11349/12313 [8:29:41<42:50, 2.67s/it] 92%|█████████▏| 11350/12313 [8:29:44<43:40, 2.72s/it] {'loss': 0.4622, 'grad_norm': 4.582354055457941, 'learning_rate': 7.978327698645705e-08, 'epoch': 0.92} 92%|█████████▏| 11350/12313 [8:29:44<43:40, 2.72s/it] 92%|█████████▏| 11351/12313 [8:29:47<44:50, 2.80s/it] {'loss': 0.3925, 'grad_norm': 7.540612277290582, 'learning_rate': 7.961855036754978e-08, 'epoch': 0.92} 92%|█████████▏| 11351/12313 [8:29:47<44:50, 2.80s/it] 92%|█████████▏| 11352/12313 [8:29:50<44:54, 2.80s/it] {'loss': 0.494, 'grad_norm': 4.845886462847071, 'learning_rate': 7.945399122622904e-08, 'epoch': 0.92} 92%|█████████▏| 11352/12313 [8:29:50<44:54, 2.80s/it] 92%|█████████▏| 11353/12313 [8:29:53<43:36, 2.73s/it] {'loss': 0.625, 'grad_norm': 3.6709111199072013, 'learning_rate': 7.928959957388154e-08, 'epoch': 0.92} 92%|█████████▏| 11353/12313 [8:29:53<43:36, 2.73s/it] 92%|█████████▏| 11354/12313 [8:29:56<45:07, 2.82s/it] {'loss': 0.4271, 'grad_norm': 4.457865490095833, 'learning_rate': 7.912537542188264e-08, 'epoch': 0.92} 92%|█████████▏| 11354/12313 [8:29:56<45:07, 2.82s/it] 92%|█████████▏| 11355/12313 [8:29:58<44:55, 2.81s/it] 
{'loss': 0.5399, 'grad_norm': 6.9852450558056365, 'learning_rate': 7.89613187815949e-08, 'epoch': 0.92} 92%|█████████▏| 11355/12313 [8:29:58<44:55, 2.81s/it] 92%|█████████▏| 11356/12313 [8:30:01<44:11, 2.77s/it] {'loss': 0.4892, 'grad_norm': 5.777646011684341, 'learning_rate': 7.879742966437092e-08, 'epoch': 0.92} 92%|█████████▏| 11356/12313 [8:30:01<44:11, 2.77s/it] 92%|█████████▏| 11357/12313 [8:30:04<43:39, 2.74s/it] {'loss': 0.5512, 'grad_norm': 3.543496182377441, 'learning_rate': 7.86337080815508e-08, 'epoch': 0.92} 92%|█████████▏| 11357/12313 [8:30:04<43:39, 2.74s/it] 92%|█████████▏| 11358/12313 [8:30:07<43:53, 2.76s/it] {'loss': 0.3128, 'grad_norm': 4.258648901552076, 'learning_rate': 7.847015404446352e-08, 'epoch': 0.92} 92%|█████████▏| 11358/12313 [8:30:07<43:53, 2.76s/it] 92%|█████████▏| 11359/12313 [8:30:09<43:52, 2.76s/it] {'loss': 0.5077, 'grad_norm': 5.717748386729547, 'learning_rate': 7.830676756442529e-08, 'epoch': 0.92} 92%|█████████▏| 11359/12313 [8:30:09<43:52, 2.76s/it] 92%|█████████▏| 11360/12313 [8:30:12<44:19, 2.79s/it] {'loss': 0.4398, 'grad_norm': 5.465873996372227, 'learning_rate': 7.814354865274237e-08, 'epoch': 0.92} 92%|█████████▏| 11360/12313 [8:30:12<44:19, 2.79s/it] 92%|█████████▏| 11361/12313 [8:30:15<43:53, 2.77s/it] {'loss': 0.5184, 'grad_norm': 9.321969343150906, 'learning_rate': 7.798049732070822e-08, 'epoch': 0.92} 92%|█████████▏| 11361/12313 [8:30:15<43:53, 2.77s/it] 92%|█████████▏| 11362/12313 [8:30:17<42:34, 2.69s/it] {'loss': 0.4808, 'grad_norm': 6.551159045640905, 'learning_rate': 7.78176135796052e-08, 'epoch': 0.92} 92%|█████████▏| 11362/12313 [8:30:17<42:34, 2.69s/it] 92%|█████████▏| 11363/12313 [8:30:20<42:43, 2.70s/it] {'loss': 0.495, 'grad_norm': 6.18279810098398, 'learning_rate': 7.765489744070459e-08, 'epoch': 0.92} 92%|█████████▏| 11363/12313 [8:30:20<42:43, 2.70s/it] 92%|█████████▏| 11364/12313 [8:30:23<41:34, 2.63s/it] {'loss': 0.5977, 'grad_norm': 4.489976108293694, 'learning_rate': 7.749234891526486e-08, 
'epoch': 0.92} 92%|█████████▏| 11364/12313 [8:30:23<41:34, 2.63s/it] 92%|█████████▏| 11365/12313 [8:30:25<41:22, 2.62s/it] {'loss': 0.7121, 'grad_norm': 4.0322329291396475, 'learning_rate': 7.732996801453313e-08, 'epoch': 0.92} 92%|█████████▏| 11365/12313 [8:30:25<41:22, 2.62s/it] 92%|█████████▏| 11366/12313 [8:30:28<41:14, 2.61s/it] {'loss': 0.3647, 'grad_norm': 7.911313584451032, 'learning_rate': 7.716775474974625e-08, 'epoch': 0.92} 92%|█████████▏| 11366/12313 [8:30:28<41:14, 2.61s/it] 92%|█████████▏| 11367/12313 [8:30:31<43:46, 2.78s/it] {'loss': 0.3889, 'grad_norm': 4.246785998863406, 'learning_rate': 7.70057091321283e-08, 'epoch': 0.92} 92%|█████████▏| 11367/12313 [8:30:31<43:46, 2.78s/it] 92%|█████████▏| 11368/12313 [8:30:34<42:42, 2.71s/it] {'loss': 0.5681, 'grad_norm': 21.982620189789714, 'learning_rate': 7.684383117289141e-08, 'epoch': 0.92} 92%|█████████▏| 11368/12313 [8:30:34<42:42, 2.71s/it] 92%|█████████▏| 11369/12313 [8:30:36<42:56, 2.73s/it] {'loss': 0.4741, 'grad_norm': 16.563097433915587, 'learning_rate': 7.66821208832369e-08, 'epoch': 0.92} 92%|█████████▏| 11369/12313 [8:30:36<42:56, 2.73s/it] 92%|█████████▏| 11370/12313 [8:30:39<43:14, 2.75s/it] {'loss': 0.5937, 'grad_norm': 4.738887055673006, 'learning_rate': 7.652057827435444e-08, 'epoch': 0.92} 92%|█████████▏| 11370/12313 [8:30:39<43:14, 2.75s/it] 92%|█████████▏| 11371/12313 [8:30:42<44:12, 2.82s/it] {'loss': 0.458, 'grad_norm': 5.91709828726191, 'learning_rate': 7.635920335742203e-08, 'epoch': 0.92} 92%|█████████▏| 11371/12313 [8:30:42<44:12, 2.82s/it] 92%|█████████▏| 11372/12313 [8:30:45<44:41, 2.85s/it] {'loss': 0.5648, 'grad_norm': 4.025266759061047, 'learning_rate': 7.619799614360573e-08, 'epoch': 0.92} 92%|█████████▏| 11372/12313 [8:30:45<44:41, 2.85s/it] 92%|█████████▏| 11373/12313 [8:30:48<43:09, 2.76s/it] {'loss': 0.4054, 'grad_norm': 5.402057661799795, 'learning_rate': 7.603695664406053e-08, 'epoch': 0.92} 92%|█████████▏| 11373/12313 [8:30:48<43:09, 2.76s/it] 92%|█████████▏| 
11374/12313 [8:30:50<43:11, 2.76s/it] {'loss': 0.4741, 'grad_norm': 5.575279520689622, 'learning_rate': 7.587608486992915e-08, 'epoch': 0.92} 92%|█████████▏| 11374/12313 [8:30:50<43:11, 2.76s/it] 92%|█████████▏| 11375/12313 [8:30:53<42:54, 2.75s/it] {'loss': 0.4311, 'grad_norm': 3.7480482517554115, 'learning_rate': 7.571538083234298e-08, 'epoch': 0.92} 92%|█████████▏| 11375/12313 [8:30:53<42:54, 2.75s/it] 92%|█████████▏| 11376/12313 [8:30:56<44:10, 2.83s/it] {'loss': 0.5094, 'grad_norm': 3.873434245465527, 'learning_rate': 7.555484454242229e-08, 'epoch': 0.92} 92%|█████████▏| 11376/12313 [8:30:56<44:10, 2.83s/it] 92%|█████████▏| 11377/12313 [8:30:59<43:51, 2.81s/it] {'loss': 0.4184, 'grad_norm': 4.404320153846423, 'learning_rate': 7.539447601127542e-08, 'epoch': 0.92} 92%|█████████▏| 11377/12313 [8:30:59<43:51, 2.81s/it] 92%|█████████▏| 11378/12313 [8:31:02<44:06, 2.83s/it] {'loss': 0.4376, 'grad_norm': 4.253052401961252, 'learning_rate': 7.523427524999822e-08, 'epoch': 0.92} 92%|█████████▏| 11378/12313 [8:31:02<44:06, 2.83s/it] 92%|█████████▏| 11379/12313 [8:31:04<43:33, 2.80s/it] {'loss': 0.4695, 'grad_norm': 7.182394888454542, 'learning_rate': 7.507424226967681e-08, 'epoch': 0.92} 92%|█████████▏| 11379/12313 [8:31:04<43:33, 2.80s/it] 92%|█████████▏| 11380/12313 [8:31:07<42:53, 2.76s/it] {'loss': 0.5214, 'grad_norm': 6.303359290206109, 'learning_rate': 7.491437708138372e-08, 'epoch': 0.92} 92%|█████████▏| 11380/12313 [8:31:07<42:53, 2.76s/it] 92%|█████████▏| 11381/12313 [8:31:10<42:36, 2.74s/it] {'loss': 0.4966, 'grad_norm': 4.296067574108232, 'learning_rate': 7.475467969618122e-08, 'epoch': 0.92} 92%|█████████▏| 11381/12313 [8:31:10<42:36, 2.74s/it] 92%|█████████▏| 11382/12313 [8:31:12<41:07, 2.65s/it] {'loss': 0.6921, 'grad_norm': 3.9154299069883187, 'learning_rate': 7.459515012511937e-08, 'epoch': 0.92} 92%|█████████▏| 11382/12313 [8:31:12<41:07, 2.65s/it] 92%|█████████▏| 11383/12313 [8:31:15<41:08, 2.65s/it] {'loss': 0.4532, 'grad_norm': 3.5610883326964538, 
'learning_rate': 7.443578837923709e-08, 'epoch': 0.92} 92%|█████████▏| 11383/12313 [8:31:15<41:08, 2.65s/it] 92%|█████████▏| 11384/12313 [8:31:17<40:44, 2.63s/it] {'loss': 0.6126, 'grad_norm': 4.0069328169869625, 'learning_rate': 7.427659446956087e-08, 'epoch': 0.92} 92%|█████████▏| 11384/12313 [8:31:17<40:44, 2.63s/it] 92%|█████████▏| 11385/12313 [8:31:20<40:42, 2.63s/it] {'loss': 0.5281, 'grad_norm': 3.4747848677201554, 'learning_rate': 7.41175684071066e-08, 'epoch': 0.92} 92%|█████████▏| 11385/12313 [8:31:20<40:42, 2.63s/it] 92%|█████████▏| 11386/12313 [8:31:23<40:33, 2.63s/it] {'loss': 0.448, 'grad_norm': 4.640308832890803, 'learning_rate': 7.39587102028777e-08, 'epoch': 0.92} 92%|█████████▏| 11386/12313 [8:31:23<40:33, 2.63s/it] 92%|█████████▏| 11387/12313 [8:31:25<40:29, 2.62s/it] {'loss': 0.5122, 'grad_norm': 4.482953706151723, 'learning_rate': 7.38000198678665e-08, 'epoch': 0.92} 92%|█████████▏| 11387/12313 [8:31:25<40:29, 2.62s/it] 92%|█████████▏| 11388/12313 [8:31:28<40:08, 2.60s/it] {'loss': 0.4532, 'grad_norm': 7.944047198700474, 'learning_rate': 7.36414974130531e-08, 'epoch': 0.92} 92%|█████████▏| 11388/12313 [8:31:28<40:08, 2.60s/it] 92%|█████████▏| 11389/12313 [8:31:31<40:55, 2.66s/it] {'loss': 0.4904, 'grad_norm': 3.2439380546377583, 'learning_rate': 7.348314284940706e-08, 'epoch': 0.92} 92%|█████████▏| 11389/12313 [8:31:31<40:55, 2.66s/it] 93%|█████████▎| 11390/12313 [8:31:34<42:13, 2.74s/it] {'loss': 0.5272, 'grad_norm': 4.141858638205531, 'learning_rate': 7.332495618788516e-08, 'epoch': 0.93} 93%|█████████▎| 11390/12313 [8:31:34<42:13, 2.74s/it] 93%|█████████▎| 11391/12313 [8:31:36<41:59, 2.73s/it] {'loss': 0.4314, 'grad_norm': 3.3315239217657773, 'learning_rate': 7.316693743943364e-08, 'epoch': 0.93} 93%|█████████▎| 11391/12313 [8:31:36<41:59, 2.73s/it] 93%|█████████▎| 11392/12313 [8:31:39<41:56, 2.73s/it] {'loss': 0.545, 'grad_norm': 3.9642024980963573, 'learning_rate': 7.300908661498602e-08, 'epoch': 0.93} 93%|█████████▎| 11392/12313 
[8:31:39<41:56, 2.73s/it] 93%|█████████▎| 11393/12313 [8:31:42<41:44, 2.72s/it] {'loss': 0.4979, 'grad_norm': 6.271707028230578, 'learning_rate': 7.28514037254649e-08, 'epoch': 0.93} 93%|█████████▎| 11393/12313 [8:31:42<41:44, 2.72s/it] 93%|█████████▎| 11394/12313 [8:31:45<43:41, 2.85s/it] {'loss': 0.5243, 'grad_norm': 6.887788740548668, 'learning_rate': 7.26938887817813e-08, 'epoch': 0.93} 93%|█████████▎| 11394/12313 [8:31:45<43:41, 2.85s/it] 93%|█████████▎| 11395/12313 [8:31:48<43:52, 2.87s/it] {'loss': 0.563, 'grad_norm': 4.351861411172992, 'learning_rate': 7.2536541794834e-08, 'epoch': 0.93} 93%|█████████▎| 11395/12313 [8:31:48<43:52, 2.87s/it] 93%|█████████▎| 11396/12313 [8:31:51<44:37, 2.92s/it] {'loss': 0.4579, 'grad_norm': 13.306179850830969, 'learning_rate': 7.237936277551095e-08, 'epoch': 0.93} 93%|█████████▎| 11396/12313 [8:31:51<44:37, 2.92s/it] 93%|█████████▎| 11397/12313 [8:31:53<42:19, 2.77s/it] {'loss': 0.4368, 'grad_norm': 8.741284990101086, 'learning_rate': 7.22223517346879e-08, 'epoch': 0.93} 93%|█████████▎| 11397/12313 [8:31:53<42:19, 2.77s/it] 93%|█████████▎| 11398/12313 [8:31:56<42:03, 2.76s/it] {'loss': 0.5499, 'grad_norm': 4.724782847856221, 'learning_rate': 7.206550868322947e-08, 'epoch': 0.93} 93%|█████████▎| 11398/12313 [8:31:56<42:03, 2.76s/it] 93%|█████████▎| 11399/12313 [8:31:59<43:06, 2.83s/it] {'loss': 0.4174, 'grad_norm': 4.443264413967751, 'learning_rate': 7.190883363198815e-08, 'epoch': 0.93} 93%|█████████▎| 11399/12313 [8:31:59<43:06, 2.83s/it] 93%|█████████▎| 11400/12313 [8:32:02<42:13, 2.78s/it] {'loss': 0.6192, 'grad_norm': 10.053412856400966, 'learning_rate': 7.175232659180492e-08, 'epoch': 0.93} 93%|█████████▎| 11400/12313 [8:32:02<42:13, 2.78s/it] 93%|█████████▎| 11401/12313 [8:32:04<41:32, 2.73s/it] {'loss': 0.4117, 'grad_norm': 8.327053855258349, 'learning_rate': 7.159598757350922e-08, 'epoch': 0.93} 93%|█████████▎| 11401/12313 [8:32:04<41:32, 2.73s/it] 93%|█████████▎| 11402/12313 [8:32:07<40:39, 2.68s/it] {'loss': 
0.4699, 'grad_norm': 4.431594842283679, 'learning_rate': 7.143981658791933e-08, 'epoch': 0.93} 93%|█████████▎| 11402/12313 [8:32:07<40:39, 2.68s/it] 93%|█████████▎| 11403/12313 [8:32:09<40:03, 2.64s/it] {'loss': 0.5342, 'grad_norm': 3.250722005433043, 'learning_rate': 7.128381364584075e-08, 'epoch': 0.93} 93%|█████████▎| 11403/12313 [8:32:09<40:03, 2.64s/it] 93%|█████████▎| 11404/12313 [8:32:12<39:47, 2.63s/it] {'loss': 0.4175, 'grad_norm': 2.957886228743401, 'learning_rate': 7.112797875806904e-08, 'epoch': 0.93} 93%|█████████▎| 11404/12313 [8:32:12<39:47, 2.63s/it] 93%|█████████▎| 11405/12313 [8:32:15<40:09, 2.65s/it] {'loss': 0.4268, 'grad_norm': 6.59101582446327, 'learning_rate': 7.09723119353864e-08, 'epoch': 0.93} 93%|█████████▎| 11405/12313 [8:32:15<40:09, 2.65s/it] 93%|█████████▎| 11406/12313 [8:32:17<40:13, 2.66s/it] {'loss': 0.4142, 'grad_norm': 4.053080793203663, 'learning_rate': 7.081681318856392e-08, 'epoch': 0.93} 93%|█████████▎| 11406/12313 [8:32:17<40:13, 2.66s/it] 93%|█████████▎| 11407/12313 [8:32:20<39:12, 2.60s/it] {'loss': 0.4385, 'grad_norm': 48.79556971471609, 'learning_rate': 7.066148252836219e-08, 'epoch': 0.93} 93%|█████████▎| 11407/12313 [8:32:20<39:12, 2.60s/it] 93%|█████████▎| 11408/12313 [8:32:23<39:43, 2.63s/it] {'loss': 0.4813, 'grad_norm': 4.586752487959576, 'learning_rate': 7.050631996552842e-08, 'epoch': 0.93} 93%|█████████▎| 11408/12313 [8:32:23<39:43, 2.63s/it] 93%|█████████▎| 11409/12313 [8:32:26<43:50, 2.91s/it] {'loss': 0.3704, 'grad_norm': 5.714365886095958, 'learning_rate': 7.035132551079932e-08, 'epoch': 0.93} 93%|█████████▎| 11409/12313 [8:32:26<43:50, 2.91s/it] 93%|█████████▎| 11410/12313 [8:32:29<42:00, 2.79s/it] {'loss': 0.3965, 'grad_norm': 8.579523295772237, 'learning_rate': 7.019649917490018e-08, 'epoch': 0.93} 93%|█████████▎| 11410/12313 [8:32:29<42:00, 2.79s/it] 93%|█████████▎| 11411/12313 [8:32:32<43:21, 2.88s/it] {'loss': 0.572, 'grad_norm': 3.0236689032912842, 'learning_rate': 7.004184096854356e-08, 'epoch': 
0.93} 93%|█████████▎| 11411/12313 [8:32:32<43:21, 2.88s/it] 93%|█████████▎| 11412/12313 [8:32:34<42:09, 2.81s/it] {'loss': 0.4516, 'grad_norm': 6.812565797682661, 'learning_rate': 6.988735090243142e-08, 'epoch': 0.93} 93%|█████████▎| 11412/12313 [8:32:34<42:09, 2.81s/it] 93%|█████████▎| 11413/12313 [8:32:37<40:52, 2.72s/it] {'loss': 0.4734, 'grad_norm': 7.700307129522538, 'learning_rate': 6.973302898725303e-08, 'epoch': 0.93} 93%|█████████▎| 11413/12313 [8:32:37<40:52, 2.72s/it] 93%|█████████▎| 11414/12313 [8:32:39<39:51, 2.66s/it] {'loss': 0.4478, 'grad_norm': 4.263009068805309, 'learning_rate': 6.957887523368678e-08, 'epoch': 0.93} 93%|█████████▎| 11414/12313 [8:32:39<39:51, 2.66s/it] 93%|█████████▎| 11415/12313 [8:32:42<41:14, 2.76s/it] {'loss': 0.4749, 'grad_norm': 5.759822571490719, 'learning_rate': 6.942488965240024e-08, 'epoch': 0.93} 93%|█████████▎| 11415/12313 [8:32:42<41:14, 2.76s/it] 93%|█████████▎| 11416/12313 [8:32:45<40:06, 2.68s/it] {'loss': 0.398, 'grad_norm': 7.781687526540662, 'learning_rate': 6.92710722540471e-08, 'epoch': 0.93} 93%|█████████▎| 11416/12313 [8:32:45<40:06, 2.68s/it] 93%|█████████▎| 11417/12313 [8:32:47<39:20, 2.63s/it] {'loss': 0.3865, 'grad_norm': 6.317228165360641, 'learning_rate': 6.911742304927166e-08, 'epoch': 0.93} 93%|█████████▎| 11417/12313 [8:32:47<39:20, 2.63s/it] 93%|█████████▎| 11418/12313 [8:32:50<39:30, 2.65s/it] {'loss': 0.4675, 'grad_norm': 6.2711857470132095, 'learning_rate': 6.896394204870538e-08, 'epoch': 0.93} 93%|█████████▎| 11418/12313 [8:32:50<39:30, 2.65s/it] 93%|█████████▎| 11419/12313 [8:32:53<39:57, 2.68s/it] {'loss': 0.5341, 'grad_norm': 4.60041184087192, 'learning_rate': 6.881062926296783e-08, 'epoch': 0.93} 93%|█████████▎| 11419/12313 [8:32:53<39:57, 2.68s/it] 93%|█████████▎| 11420/12313 [8:32:56<40:41, 2.73s/it] {'loss': 0.5314, 'grad_norm': 5.061233791327525, 'learning_rate': 6.865748470266803e-08, 'epoch': 0.93} 93%|█████████▎| 11420/12313 [8:32:56<40:41, 2.73s/it] 93%|█████████▎| 11421/12313 
[8:32:58<40:17, 2.71s/it] {'loss': 0.3024, 'grad_norm': 3.7105712181882007, 'learning_rate': 6.85045083784025e-08, 'epoch': 0.93} 93%|█████████▎| 11421/12313 [8:32:58<40:17, 2.71s/it] 93%|█████████▎| 11422/12313 [8:33:01<39:14, 2.64s/it] {'loss': 0.4527, 'grad_norm': 12.81065039260799, 'learning_rate': 6.835170030075638e-08, 'epoch': 0.93} 93%|█████████▎| 11422/12313 [8:33:01<39:14, 2.64s/it] 93%|█████████▎| 11423/12313 [8:33:04<39:33, 2.67s/it] {'loss': 0.4631, 'grad_norm': 6.058599141728342, 'learning_rate': 6.819906048030345e-08, 'epoch': 0.93} 93%|█████████▎| 11423/12313 [8:33:04<39:33, 2.67s/it] 93%|█████████▎| 11424/12313 [8:33:06<39:37, 2.67s/it] {'loss': 0.4521, 'grad_norm': 6.482547478325477, 'learning_rate': 6.804658892760552e-08, 'epoch': 0.93} 93%|█████████▎| 11424/12313 [8:33:06<39:37, 2.67s/it] 93%|█████████▎| 11425/12313 [8:33:09<39:04, 2.64s/it] {'loss': 0.3769, 'grad_norm': 14.54413960358652, 'learning_rate': 6.789428565321249e-08, 'epoch': 0.93} 93%|█████████▎| 11425/12313 [8:33:09<39:04, 2.64s/it] 93%|█████████▎| 11426/12313 [8:33:11<38:54, 2.63s/it] {'loss': 0.535, 'grad_norm': 5.183303848496903, 'learning_rate': 6.774215066766344e-08, 'epoch': 0.93} 93%|█████████▎| 11426/12313 [8:33:11<38:54, 2.63s/it] 93%|█████████▎| 11427/12313 [8:33:14<39:15, 2.66s/it] {'loss': 0.3777, 'grad_norm': 4.383546792589708, 'learning_rate': 6.759018398148464e-08, 'epoch': 0.93} 93%|█████████▎| 11427/12313 [8:33:14<39:15, 2.66s/it] 93%|█████████▎| 11428/12313 [8:33:17<38:22, 2.60s/it] {'loss': 0.508, 'grad_norm': 14.039870214824262, 'learning_rate': 6.743838560519189e-08, 'epoch': 0.93} 93%|█████████▎| 11428/12313 [8:33:17<38:22, 2.60s/it] 93%|█████████▎| 11429/12313 [8:33:19<37:22, 2.54s/it] {'loss': 0.3648, 'grad_norm': 6.608844886900423, 'learning_rate': 6.728675554928898e-08, 'epoch': 0.93} 93%|█████████▎| 11429/12313 [8:33:19<37:22, 2.54s/it] 93%|█████████▎| 11430/12313 [8:33:22<37:12, 2.53s/it] {'loss': 0.3972, 'grad_norm': 4.663438602760716, 'learning_rate': 
6.713529382426726e-08, 'epoch': 0.93} 93%|█████████▎| 11430/12313 [8:33:22<37:12, 2.53s/it] 93%|█████████▎| 11431/12313 [8:33:25<39:09, 2.66s/it] {'loss': 0.4688, 'grad_norm': 3.8710006305176226, 'learning_rate': 6.698400044060777e-08, 'epoch': 0.93} 93%|█████████▎| 11431/12313 [8:33:25<39:09, 2.66s/it] 93%|█████████▎| 11432/12313 [8:33:27<39:32, 2.69s/it] {'loss': 0.539, 'grad_norm': 3.7449549283487324, 'learning_rate': 6.683287540877853e-08, 'epoch': 0.93} 93%|█████████▎| 11432/12313 [8:33:27<39:32, 2.69s/it] 93%|█████████▎| 11433/12313 [8:33:30<39:23, 2.69s/it] {'loss': 0.5233, 'grad_norm': 4.662838653734012, 'learning_rate': 6.668191873923701e-08, 'epoch': 0.93} 93%|█████████▎| 11433/12313 [8:33:30<39:23, 2.69s/it] 93%|█████████▎| 11434/12313 [8:33:33<39:51, 2.72s/it] {'loss': 0.5587, 'grad_norm': 4.3917574513396955, 'learning_rate': 6.653113044242904e-08, 'epoch': 0.93} 93%|█████████▎| 11434/12313 [8:33:33<39:51, 2.72s/it] 93%|█████████▎| 11435/12313 [8:33:36<40:15, 2.75s/it] {'loss': 0.5373, 'grad_norm': 9.052354440821007, 'learning_rate': 6.638051052878736e-08, 'epoch': 0.93} 93%|█████████▎| 11435/12313 [8:33:36<40:15, 2.75s/it] 93%|█████████▎| 11436/12313 [8:33:39<41:36, 2.85s/it] {'loss': 0.4253, 'grad_norm': 3.219724415933315, 'learning_rate': 6.623005900873474e-08, 'epoch': 0.93} 93%|█████████▎| 11436/12313 [8:33:39<41:36, 2.85s/it] 93%|█████████▎| 11437/12313 [8:33:41<40:58, 2.81s/it] {'loss': 0.447, 'grad_norm': 3.5215886145233584, 'learning_rate': 6.607977589268177e-08, 'epoch': 0.93} 93%|█████████▎| 11437/12313 [8:33:41<40:58, 2.81s/it] 93%|█████████▎| 11438/12313 [8:33:45<43:42, 3.00s/it] {'loss': 0.4416, 'grad_norm': 6.121975994953795, 'learning_rate': 6.59296611910265e-08, 'epoch': 0.93} 93%|█████████▎| 11438/12313 [8:33:45<43:42, 3.00s/it] 93%|█████████▎| 11439/12313 [8:33:48<43:15, 2.97s/it] {'loss': 0.4477, 'grad_norm': 7.404459895688475, 'learning_rate': 6.577971491415674e-08, 'epoch': 0.93} 93%|█████████▎| 11439/12313 [8:33:48<43:15, 
2.97s/it] 93%|█████████▎| 11440/12313 [8:33:50<42:12, 2.90s/it] {'loss': 0.5049, 'grad_norm': 9.284787320280776, 'learning_rate': 6.56299370724478e-08, 'epoch': 0.93} 93%|█████████▎| 11440/12313 [8:33:50<42:12, 2.90s/it] 93%|█████████▎| 11441/12313 [8:33:53<40:35, 2.79s/it] {'loss': 0.4512, 'grad_norm': 7.115335886705074, 'learning_rate': 6.548032767626333e-08, 'epoch': 0.93} 93%|█████████▎| 11441/12313 [8:33:53<40:35, 2.79s/it] 93%|█████████▎| 11442/12313 [8:33:56<39:58, 2.75s/it] {'loss': 0.3873, 'grad_norm': 6.530284682735083, 'learning_rate': 6.533088673595589e-08, 'epoch': 0.93} 93%|█████████▎| 11442/12313 [8:33:56<39:58, 2.75s/it] 93%|█████████▎| 11443/12313 [8:33:58<38:48, 2.68s/it] {'loss': 0.5982, 'grad_norm': 5.05350752987569, 'learning_rate': 6.51816142618658e-08, 'epoch': 0.93} 93%|█████████▎| 11443/12313 [8:33:58<38:48, 2.68s/it] 93%|█████████▎| 11444/12313 [8:34:01<37:53, 2.62s/it] {'loss': 0.517, 'grad_norm': 8.402220447701428, 'learning_rate': 6.503251026432179e-08, 'epoch': 0.93} 93%|█████████▎| 11444/12313 [8:34:01<37:53, 2.62s/it] 93%|█████████▎| 11445/12313 [8:34:03<38:14, 2.64s/it] {'loss': 0.4236, 'grad_norm': 5.712552445076794, 'learning_rate': 6.48835747536411e-08, 'epoch': 0.93} 93%|█████████▎| 11445/12313 [8:34:03<38:14, 2.64s/it] 93%|█████████▎| 11446/12313 [8:34:06<37:22, 2.59s/it] {'loss': 0.4614, 'grad_norm': 8.723406889128468, 'learning_rate': 6.473480774012941e-08, 'epoch': 0.93} 93%|█████████▎| 11446/12313 [8:34:06<37:22, 2.59s/it] 93%|█████████▎| 11447/12313 [8:34:08<37:11, 2.58s/it] {'loss': 0.697, 'grad_norm': 3.518539374576921, 'learning_rate': 6.458620923408044e-08, 'epoch': 0.93} 93%|█████████▎| 11447/12313 [8:34:08<37:11, 2.58s/it] 93%|█████████▎| 11448/12313 [8:34:11<37:37, 2.61s/it] {'loss': 0.7767, 'grad_norm': 4.996661458005611, 'learning_rate': 6.443777924577676e-08, 'epoch': 0.93} 93%|█████████▎| 11448/12313 [8:34:11<37:37, 2.61s/it] 93%|█████████▎| 11449/12313 [8:34:14<37:26, 2.60s/it] {'loss': 0.5338, 'grad_norm': 
6.029619825429544, 'learning_rate': 6.428951778548881e-08, 'epoch': 0.93} 93%|█████████▎| 11449/12313 [8:34:14<37:26, 2.60s/it] 93%|█████████▎| 11450/12313 [8:34:16<37:40, 2.62s/it] {'loss': 0.53, 'grad_norm': 6.072175958874135, 'learning_rate': 6.414142486347557e-08, 'epoch': 0.93} 93%|█████████▎| 11450/12313 [8:34:16<37:40, 2.62s/it] 93%|█████████▎| 11451/12313 [8:34:19<38:38, 2.69s/it] {'loss': 0.5014, 'grad_norm': 4.106779337844846, 'learning_rate': 6.39935004899836e-08, 'epoch': 0.93} 93%|█████████▎| 11451/12313 [8:34:19<38:38, 2.69s/it] 93%|█████████▎| 11452/12313 [8:34:22<38:00, 2.65s/it] {'loss': 0.3896, 'grad_norm': 5.028928964661914, 'learning_rate': 6.38457446752494e-08, 'epoch': 0.93} 93%|█████████▎| 11452/12313 [8:34:22<38:00, 2.65s/it] 93%|█████████▎| 11453/12313 [8:34:24<37:56, 2.65s/it] {'loss': 0.4988, 'grad_norm': 3.916926797126131, 'learning_rate': 6.36981574294962e-08, 'epoch': 0.93} 93%|█████████▎| 11453/12313 [8:34:24<37:56, 2.65s/it] 93%|█████████▎| 11454/12313 [8:34:27<39:08, 2.73s/it] {'loss': 0.4861, 'grad_norm': 11.573875343354155, 'learning_rate': 6.355073876293638e-08, 'epoch': 0.93} 93%|█████████▎| 11454/12313 [8:34:27<39:08, 2.73s/it] 93%|█████████▎| 11455/12313 [8:34:30<38:29, 2.69s/it] {'loss': 0.4556, 'grad_norm': 2.953512192237326, 'learning_rate': 6.340348868577123e-08, 'epoch': 0.93} 93%|█████████▎| 11455/12313 [8:34:30<38:29, 2.69s/it] 93%|█████████▎| 11456/12313 [8:34:33<39:51, 2.79s/it] {'loss': 0.4905, 'grad_norm': 5.593041199500861, 'learning_rate': 6.325640720818899e-08, 'epoch': 0.93} 93%|█████████▎| 11456/12313 [8:34:33<39:51, 2.79s/it] 93%|█████████▎| 11457/12313 [8:34:36<40:23, 2.83s/it] {'loss': 0.5173, 'grad_norm': 5.055425216937131, 'learning_rate': 6.310949434036707e-08, 'epoch': 0.93} 93%|█████████▎| 11457/12313 [8:34:36<40:23, 2.83s/it] 93%|█████████▎| 11458/12313 [8:34:38<38:37, 2.71s/it] {'loss': 0.4225, 'grad_norm': 5.280470361998423, 'learning_rate': 6.296275009247121e-08, 'epoch': 0.93} 93%|█████████▎| 
11458/12313 [8:34:38<38:37, 2.71s/it] 93%|█████████▎| 11459/12313 [8:34:41<38:47, 2.73s/it] {'loss': 0.453, 'grad_norm': 3.6832709176499785, 'learning_rate': 6.28161744746547e-08, 'epoch': 0.93} 93%|█████████▎| 11459/12313 [8:34:41<38:47, 2.73s/it] 93%|█████████▎| 11460/12313 [8:34:44<38:35, 2.71s/it] {'loss': 0.4662, 'grad_norm': 7.653920651385478, 'learning_rate': 6.266976749706055e-08, 'epoch': 0.93} 93%|█████████▎| 11460/12313 [8:34:44<38:35, 2.71s/it] 93%|█████████▎| 11461/12313 [8:34:46<37:35, 2.65s/it] {'loss': 0.4553, 'grad_norm': 5.873778228921937, 'learning_rate': 6.252352916981924e-08, 'epoch': 0.93} 93%|█████████▎| 11461/12313 [8:34:46<37:35, 2.65s/it] 93%|█████████▎| 11462/12313 [8:34:49<37:22, 2.63s/it] {'loss': 0.6166, 'grad_norm': 4.554092544444619, 'learning_rate': 6.237745950304963e-08, 'epoch': 0.93} 93%|█████████▎| 11462/12313 [8:34:49<37:22, 2.63s/it] 93%|█████████▎| 11463/12313 [8:34:51<37:05, 2.62s/it] {'loss': 0.5027, 'grad_norm': 5.080682795743635, 'learning_rate': 6.223155850685864e-08, 'epoch': 0.93} 93%|█████████▎| 11463/12313 [8:34:51<37:05, 2.62s/it] 93%|█████████▎| 11464/12313 [8:34:54<36:48, 2.60s/it] {'loss': 0.5546, 'grad_norm': 16.051986684845595, 'learning_rate': 6.208582619134234e-08, 'epoch': 0.93} 93%|█████████▎| 11464/12313 [8:34:54<36:48, 2.60s/it] 93%|█████████▎| 11465/12313 [8:34:57<36:48, 2.60s/it] {'loss': 0.412, 'grad_norm': 7.21160759933066, 'learning_rate': 6.194026256658437e-08, 'epoch': 0.93} 93%|█████████▎| 11465/12313 [8:34:57<36:48, 2.60s/it] 93%|█████████▎| 11466/12313 [8:34:59<36:33, 2.59s/it] {'loss': 0.5635, 'grad_norm': 7.017201794326772, 'learning_rate': 6.179486764265663e-08, 'epoch': 0.93} 93%|█████████▎| 11466/12313 [8:34:59<36:33, 2.59s/it] 93%|█████████▎| 11467/12313 [8:35:02<36:40, 2.60s/it] {'loss': 0.4289, 'grad_norm': 6.919895696187786, 'learning_rate': 6.164964142962027e-08, 'epoch': 0.93} 93%|█████████▎| 11467/12313 [8:35:02<36:40, 2.60s/it] 93%|█████████▎| 11468/12313 [8:35:05<38:06, 2.71s/it] 
{'loss': 0.5504, 'grad_norm': 6.03620856801876, 'learning_rate': 6.15045839375239e-08, 'epoch': 0.93} 93%|█████████▎| 11468/12313 [8:35:05<38:06, 2.71s/it] 93%|█████████▎| 11469/12313 [8:35:07<36:49, 2.62s/it] {'loss': 0.3537, 'grad_norm': 6.279814421491574, 'learning_rate': 6.135969517640506e-08, 'epoch': 0.93} 93%|█████████▎| 11469/12313 [8:35:07<36:49, 2.62s/it] 93%|█████████▎| 11470/12313 [8:35:10<37:07, 2.64s/it] {'loss': 0.6001, 'grad_norm': 9.154160836609854, 'learning_rate': 6.12149751562885e-08, 'epoch': 0.93} 93%|█████████▎| 11470/12313 [8:35:10<37:07, 2.64s/it] 93%|█████████▎| 11471/12313 [8:35:12<37:20, 2.66s/it] {'loss': 0.3881, 'grad_norm': 6.53932524076683, 'learning_rate': 6.107042388718898e-08, 'epoch': 0.93} 93%|█████████▎| 11471/12313 [8:35:12<37:20, 2.66s/it] 93%|█████████▎| 11472/12313 [8:35:15<37:18, 2.66s/it] {'loss': 0.5065, 'grad_norm': 6.633007691995465, 'learning_rate': 6.092604137910768e-08, 'epoch': 0.93} 93%|█████████▎| 11472/12313 [8:35:15<37:18, 2.66s/it] 93%|█████████▎| 11473/12313 [8:35:18<37:16, 2.66s/it] {'loss': 0.5226, 'grad_norm': 5.199512015394908, 'learning_rate': 6.078182764203605e-08, 'epoch': 0.93} 93%|█████████▎| 11473/12313 [8:35:18<37:16, 2.66s/it] 93%|█████████▎| 11474/12313 [8:35:20<36:29, 2.61s/it] {'loss': 0.468, 'grad_norm': 7.046681958143693, 'learning_rate': 6.063778268595278e-08, 'epoch': 0.93} 93%|█████████▎| 11474/12313 [8:35:20<36:29, 2.61s/it] 93%|█████████▎| 11475/12313 [8:35:23<36:15, 2.60s/it] {'loss': 0.4611, 'grad_norm': 7.741942160235145, 'learning_rate': 6.04939065208246e-08, 'epoch': 0.93} 93%|█████████▎| 11475/12313 [8:35:23<36:15, 2.60s/it] 93%|█████████▎| 11476/12313 [8:35:25<36:03, 2.59s/it] {'loss': 0.4898, 'grad_norm': 6.7144122182681745, 'learning_rate': 6.035019915660717e-08, 'epoch': 0.93} 93%|█████████▎| 11476/12313 [8:35:25<36:03, 2.59s/it] 93%|█████████▎| 11477/12313 [8:35:28<35:37, 2.56s/it] {'loss': 0.4645, 'grad_norm': 4.677019846346351, 'learning_rate': 6.020666060324448e-08, 
'epoch': 0.93} 93%|█████████▎| 11477/12313 [8:35:28<35:37, 2.56s/it] 93%|█████████▎| 11478/12313 [8:35:31<37:28, 2.69s/it] {'loss': 0.36, 'grad_norm': 4.133353885157457, 'learning_rate': 6.006329087066831e-08, 'epoch': 0.93} 93%|█████████▎| 11478/12313 [8:35:31<37:28, 2.69s/it] 93%|█████████▎| 11479/12313 [8:35:34<37:24, 2.69s/it] {'loss': 0.4392, 'grad_norm': 6.172109607468845, 'learning_rate': 5.992008996879906e-08, 'epoch': 0.93} 93%|█████████▎| 11479/12313 [8:35:34<37:24, 2.69s/it] 93%|█████████▎| 11480/12313 [8:35:37<38:46, 2.79s/it] {'loss': 0.5154, 'grad_norm': 4.757665364741502, 'learning_rate': 5.977705790754546e-08, 'epoch': 0.93} 93%|█████████▎| 11480/12313 [8:35:37<38:46, 2.79s/it] 93%|█████████▎| 11481/12313 [8:35:39<38:39, 2.79s/it] {'loss': 0.4637, 'grad_norm': 9.820132371604062, 'learning_rate': 5.963419469680543e-08, 'epoch': 0.93} 93%|█████████▎| 11481/12313 [8:35:39<38:39, 2.79s/it] 93%|█████████▎| 11482/12313 [8:35:42<37:35, 2.71s/it] {'loss': 0.5009, 'grad_norm': 4.899371695926488, 'learning_rate': 5.9491500346463005e-08, 'epoch': 0.93} 93%|█████████▎| 11482/12313 [8:35:42<37:35, 2.71s/it] 93%|█████████▎| 11483/12313 [8:35:45<37:23, 2.70s/it] {'loss': 0.4272, 'grad_norm': 3.7801882505750366, 'learning_rate': 5.934897486639307e-08, 'epoch': 0.93} 93%|█████████▎| 11483/12313 [8:35:45<37:23, 2.70s/it] 93%|█████████▎| 11484/12313 [8:35:47<37:22, 2.71s/it] {'loss': 0.5861, 'grad_norm': 7.561614245206794, 'learning_rate': 5.9206618266456904e-08, 'epoch': 0.93} 93%|█████████▎| 11484/12313 [8:35:47<37:22, 2.71s/it] 93%|█████████▎| 11485/12313 [8:35:50<38:03, 2.76s/it] {'loss': 0.6691, 'grad_norm': 4.538003207030416, 'learning_rate': 5.906443055650496e-08, 'epoch': 0.93} 93%|█████████▎| 11485/12313 [8:35:50<38:03, 2.76s/it] 93%|█████████▎| 11486/12313 [8:35:53<37:45, 2.74s/it] {'loss': 0.3982, 'grad_norm': 5.474716210455667, 'learning_rate': 5.892241174637575e-08, 'epoch': 0.93} 93%|█████████▎| 11486/12313 [8:35:53<37:45, 2.74s/it] 93%|█████████▎| 
11487/12313 [8:35:55<36:58, 2.69s/it] {'loss': 0.52, 'grad_norm': 8.069204894922803, 'learning_rate': 5.8780561845896697e-08, 'epoch': 0.93} 93%|█████████▎| 11487/12313 [8:35:55<36:58, 2.69s/it] 93%|█████████▎| 11488/12313 [8:35:58<36:41, 2.67s/it] {'loss': 0.5574, 'grad_norm': 5.907171852729622, 'learning_rate': 5.863888086488301e-08, 'epoch': 0.93} 93%|█████████▎| 11488/12313 [8:35:58<36:41, 2.67s/it] 93%|█████████▎| 11489/12313 [8:36:01<37:00, 2.69s/it] {'loss': 0.4025, 'grad_norm': 6.090121506693037, 'learning_rate': 5.849736881313767e-08, 'epoch': 0.93} 93%|█████████▎| 11489/12313 [8:36:01<37:00, 2.69s/it] 93%|█████████▎| 11490/12313 [8:36:03<35:34, 2.59s/it] {'loss': 0.4993, 'grad_norm': 8.516034082485882, 'learning_rate': 5.835602570045312e-08, 'epoch': 0.93} 93%|█████████▎| 11490/12313 [8:36:03<35:34, 2.59s/it] 93%|█████████▎| 11491/12313 [8:36:06<35:55, 2.62s/it] {'loss': 0.4205, 'grad_norm': 6.277551920267633, 'learning_rate': 5.8214851536609326e-08, 'epoch': 0.93} 93%|█████████▎| 11491/12313 [8:36:06<35:55, 2.62s/it] 93%|█████████▎| 11492/12313 [8:36:09<35:54, 2.62s/it] {'loss': 0.4213, 'grad_norm': 8.065629651818462, 'learning_rate': 5.807384633137459e-08, 'epoch': 0.93} 93%|█████████▎| 11492/12313 [8:36:09<35:54, 2.62s/it] 93%|█████████▎| 11493/12313 [8:36:12<37:59, 2.78s/it] {'loss': 0.4459, 'grad_norm': 4.95647438456384, 'learning_rate': 5.793301009450636e-08, 'epoch': 0.93} 93%|█████████▎| 11493/12313 [8:36:12<37:59, 2.78s/it] 93%|█████████▎| 11494/12313 [8:36:14<37:05, 2.72s/it] {'loss': 0.5453, 'grad_norm': 4.983399216729912, 'learning_rate': 5.779234283574936e-08, 'epoch': 0.93} 93%|█████████▎| 11494/12313 [8:36:14<37:05, 2.72s/it] 93%|█████████▎| 11495/12313 [8:36:17<36:58, 2.71s/it] {'loss': 0.453, 'grad_norm': 5.591248101931953, 'learning_rate': 5.765184456483664e-08, 'epoch': 0.93} 93%|█████████▎| 11495/12313 [8:36:17<36:58, 2.71s/it] 93%|█████████▎| 11496/12313 [8:36:20<39:20, 2.89s/it] {'loss': 0.4371, 'grad_norm': 5.18268156667367, 
'learning_rate': 5.7511515291490686e-08, 'epoch': 0.93} 93%|█████████▎| 11496/12313 [8:36:20<39:20, 2.89s/it] 93%|█████████▎| 11497/12313 [8:36:23<37:51, 2.78s/it] {'loss': 0.5132, 'grad_norm': 6.395159139697958, 'learning_rate': 5.737135502542124e-08, 'epoch': 0.93} 93%|█████████▎| 11497/12313 [8:36:23<37:51, 2.78s/it] 93%|█████████▎| 11498/12313 [8:36:25<37:32, 2.76s/it] {'loss': 0.4138, 'grad_norm': 5.026018785036258, 'learning_rate': 5.7231363776326096e-08, 'epoch': 0.93} 93%|█████████▎| 11498/12313 [8:36:25<37:32, 2.76s/it] 93%|█████████▎| 11499/12313 [8:36:28<38:07, 2.81s/it] {'loss': 0.5284, 'grad_norm': 4.941851075904659, 'learning_rate': 5.709154155389279e-08, 'epoch': 0.93} 93%|█████████▎| 11499/12313 [8:36:28<38:07, 2.81s/it] 93%|█████████▎| 11500/12313 [8:36:31<37:41, 2.78s/it] {'loss': 0.5964, 'grad_norm': 5.637616770839022, 'learning_rate': 5.6951888367795804e-08, 'epoch': 0.93} 93%|█████████▎| 11500/12313 [8:36:31<37:41, 2.78s/it] 93%|█████████▎| 11501/12313 [8:36:34<36:50, 2.72s/it] {'loss': 0.4451, 'grad_norm': 6.004225890183753, 'learning_rate': 5.681240422769879e-08, 'epoch': 0.93} 93%|█████████▎| 11501/12313 [8:36:34<36:50, 2.72s/it] 93%|█████████▎| 11502/12313 [8:36:36<36:08, 2.67s/it] {'loss': 0.3078, 'grad_norm': 4.723000628789615, 'learning_rate': 5.6673089143252646e-08, 'epoch': 0.93} 93%|█████████▎| 11502/12313 [8:36:36<36:08, 2.67s/it] 93%|█████████▎| 11503/12313 [8:36:39<35:33, 2.63s/it] {'loss': 0.6627, 'grad_norm': 6.077929375109591, 'learning_rate': 5.653394312409771e-08, 'epoch': 0.93} 93%|█████████▎| 11503/12313 [8:36:39<35:33, 2.63s/it] 93%|█████████▎| 11504/12313 [8:36:41<35:33, 2.64s/it] {'loss': 0.4792, 'grad_norm': 6.472316849971879, 'learning_rate': 5.639496617986184e-08, 'epoch': 0.93} 93%|█████████▎| 11504/12313 [8:36:41<35:33, 2.64s/it] 93%|█████████▎| 11505/12313 [8:36:44<35:23, 2.63s/it] {'loss': 0.438, 'grad_norm': 6.293958361792907, 'learning_rate': 5.625615832016179e-08, 'epoch': 0.93} 93%|█████████▎| 11505/12313 
[8:36:44<35:23, 2.63s/it] 93%|█████████▎| 11506/12313 [8:36:47<35:41, 2.65s/it] {'loss': 0.5255, 'grad_norm': 5.09268953575727, 'learning_rate': 5.6117519554602375e-08, 'epoch': 0.93} 93%|█████████▎| 11506/12313 [8:36:47<35:41, 2.65s/it] 93%|█████████▎| 11507/12313 [8:36:50<35:55, 2.67s/it] {'loss': 0.437, 'grad_norm': 5.054483428815471, 'learning_rate': 5.597904989277675e-08, 'epoch': 0.93} 93%|█████████▎| 11507/12313 [8:36:50<35:55, 2.67s/it] 93%|█████████▎| 11508/12313 [8:36:52<36:25, 2.72s/it] {'loss': 0.5096, 'grad_norm': 3.9858596078394943, 'learning_rate': 5.584074934426559e-08, 'epoch': 0.93} 93%|█████████▎| 11508/12313 [8:36:52<36:25, 2.72s/it] 93%|█████████▎| 11509/12313 [8:36:55<36:23, 2.72s/it] {'loss': 0.4268, 'grad_norm': 8.170161084504295, 'learning_rate': 5.570261791863957e-08, 'epoch': 0.93} 93%|█████████▎| 11509/12313 [8:36:55<36:23, 2.72s/it] 93%|█████████▎| 11510/12313 [8:36:57<34:58, 2.61s/it] {'loss': 0.7232, 'grad_norm': 5.199339732120409, 'learning_rate': 5.5564655625455766e-08, 'epoch': 0.93} 93%|█████████▎| 11510/12313 [8:36:57<34:58, 2.61s/it] 93%|█████████▎| 11511/12313 [8:37:00<34:19, 2.57s/it] {'loss': 0.4998, 'grad_norm': 4.643135079968665, 'learning_rate': 5.5426862474260986e-08, 'epoch': 0.93} 93%|█████████▎| 11511/12313 [8:37:00<34:19, 2.57s/it] 93%|█████████▎| 11512/12313 [8:37:03<34:37, 2.59s/it] {'loss': 0.4594, 'grad_norm': 4.163342887629166, 'learning_rate': 5.528923847458928e-08, 'epoch': 0.93} 93%|█████████▎| 11512/12313 [8:37:03<34:37, 2.59s/it] 94%|█████████▎| 11513/12313 [8:37:05<35:22, 2.65s/it] {'loss': 0.46, 'grad_norm': 5.970898841399132, 'learning_rate': 5.5151783635964126e-08, 'epoch': 0.94} 94%|█████████▎| 11513/12313 [8:37:05<35:22, 2.65s/it] 94%|█████████▎| 11514/12313 [8:37:08<36:11, 2.72s/it] {'loss': 0.4887, 'grad_norm': 5.478270806254457, 'learning_rate': 5.5014497967896266e-08, 'epoch': 0.94} 94%|█████████▎| 11514/12313 [8:37:08<36:11, 2.72s/it] 94%|█████████▎| 11515/12313 [8:37:11<36:23, 2.74s/it] {'loss': 
0.5598, 'grad_norm': 4.460190714960654, 'learning_rate': 5.4877381479885307e-08, 'epoch': 0.94} 94%|█████████▎| 11515/12313 [8:37:11<36:23, 2.74s/it] 94%|█████████▎| 11516/12313 [8:37:14<36:19, 2.73s/it] {'loss': 0.47, 'grad_norm': 4.1338050878154355, 'learning_rate': 5.4740434181418945e-08, 'epoch': 0.94} 94%|█████████▎| 11516/12313 [8:37:14<36:19, 2.73s/it] 94%|█████████▎| 11517/12313 [8:37:16<36:13, 2.73s/it] {'loss': 0.4158, 'grad_norm': 6.435871850940438, 'learning_rate': 5.460365608197293e-08, 'epoch': 0.94} 94%|█████████▎| 11517/12313 [8:37:16<36:13, 2.73s/it] 94%|█████████▎| 11518/12313 [8:37:19<36:02, 2.72s/it] {'loss': 0.3233, 'grad_norm': 7.643223731934275, 'learning_rate': 5.4467047191011924e-08, 'epoch': 0.94} 94%|█████████▎| 11518/12313 [8:37:19<36:02, 2.72s/it] 94%|█████████▎| 11519/12313 [8:37:22<35:10, 2.66s/it] {'loss': 0.5388, 'grad_norm': 5.4260804495525825, 'learning_rate': 5.4330607517988635e-08, 'epoch': 0.94} 94%|█████████▎| 11519/12313 [8:37:22<35:10, 2.66s/it] 94%|█████████▎| 11520/12313 [8:37:24<35:06, 2.66s/it] {'loss': 0.4487, 'grad_norm': 4.953162854169758, 'learning_rate': 5.419433707234356e-08, 'epoch': 0.94} 94%|█████████▎| 11520/12313 [8:37:24<35:06, 2.66s/it] 94%|█████████▎| 11521/12313 [8:37:27<34:44, 2.63s/it] {'loss': 0.5167, 'grad_norm': 3.6992673910297293, 'learning_rate': 5.4058235863506116e-08, 'epoch': 0.94} 94%|█████████▎| 11521/12313 [8:37:27<34:44, 2.63s/it] 94%|█████████▎| 11522/12313 [8:37:29<34:31, 2.62s/it] {'loss': 0.5104, 'grad_norm': 4.996367276417503, 'learning_rate': 5.392230390089404e-08, 'epoch': 0.94} 94%|█████████▎| 11522/12313 [8:37:29<34:31, 2.62s/it] 94%|█████████▎| 11523/12313 [8:37:32<34:20, 2.61s/it] {'loss': 0.5088, 'grad_norm': 4.62381128315659, 'learning_rate': 5.3786541193912854e-08, 'epoch': 0.94} 94%|█████████▎| 11523/12313 [8:37:32<34:20, 2.61s/it] 94%|█████████▎| 11524/12313 [8:37:35<34:46, 2.64s/it] {'loss': 0.3284, 'grad_norm': 7.587560626305793, 'learning_rate': 5.3650947751956174e-08, 
'epoch': 0.94} 94%|█████████▎| 11524/12313 [8:37:35<34:46, 2.64s/it] 94%|█████████▎| 11525/12313 [8:37:37<34:57, 2.66s/it] {'loss': 0.4261, 'grad_norm': 5.005872247413698, 'learning_rate': 5.351552358440704e-08, 'epoch': 0.94} 94%|█████████▎| 11525/12313 [8:37:37<34:57, 2.66s/it] 94%|█████████▎| 11526/12313 [8:37:40<35:24, 2.70s/it] {'loss': 0.4862, 'grad_norm': 5.695031699056669, 'learning_rate': 5.3380268700636006e-08, 'epoch': 0.94} 94%|█████████▎| 11526/12313 [8:37:40<35:24, 2.70s/it] 94%|█████████▎| 11527/12313 [8:37:43<35:39, 2.72s/it] {'loss': 0.5233, 'grad_norm': 5.066704771887348, 'learning_rate': 5.324518311000143e-08, 'epoch': 0.94} 94%|█████████▎| 11527/12313 [8:37:43<35:39, 2.72s/it] 94%|█████████▎| 11528/12313 [8:37:46<36:58, 2.83s/it] {'loss': 0.4053, 'grad_norm': 3.891689553812398, 'learning_rate': 5.311026682185139e-08, 'epoch': 0.94} 94%|█████████▎| 11528/12313 [8:37:46<36:58, 2.83s/it] 94%|█████████▎| 11529/12313 [8:37:49<36:25, 2.79s/it] {'loss': 0.5201, 'grad_norm': 4.149090685885997, 'learning_rate': 5.297551984552063e-08, 'epoch': 0.94} 94%|█████████▎| 11529/12313 [8:37:49<36:25, 2.79s/it] 94%|█████████▎| 11530/12313 [8:37:52<36:37, 2.81s/it] {'loss': 0.4647, 'grad_norm': 4.452851895114947, 'learning_rate': 5.2840942190333086e-08, 'epoch': 0.94} 94%|█████████▎| 11530/12313 [8:37:52<36:37, 2.81s/it] 94%|█████████▎| 11531/12313 [8:37:54<36:31, 2.80s/it] {'loss': 0.4444, 'grad_norm': 5.464100401282328, 'learning_rate': 5.270653386560104e-08, 'epoch': 0.94} 94%|█████████▎| 11531/12313 [8:37:54<36:31, 2.80s/it] 94%|█████████▎| 11532/12313 [8:37:57<36:18, 2.79s/it] {'loss': 0.4242, 'grad_norm': 7.585338070087896, 'learning_rate': 5.257229488062482e-08, 'epoch': 0.94} 94%|█████████▎| 11532/12313 [8:37:57<36:18, 2.79s/it] 94%|█████████▎| 11533/12313 [8:38:00<36:00, 2.77s/it] {'loss': 0.3792, 'grad_norm': 9.756341519461458, 'learning_rate': 5.243822524469283e-08, 'epoch': 0.94} 94%|█████████▎| 11533/12313 [8:38:00<36:00, 2.77s/it] 94%|█████████▎| 
11534/12313 [8:38:03<36:29, 2.81s/it] {'loss': 0.4986, 'grad_norm': 6.0305409951018225, 'learning_rate': 5.23043249670821e-08, 'epoch': 0.94} 94%|█████████▎| 11534/12313 [8:38:03<36:29, 2.81s/it] 94%|█████████▎| 11535/12313 [8:38:06<35:58, 2.77s/it] {'loss': 0.647, 'grad_norm': 8.933422659020032, 'learning_rate': 5.2170594057058264e-08, 'epoch': 0.94} 94%|█████████▎| 11535/12313 [8:38:06<35:58, 2.77s/it] 94%|█████████▎| 11536/12313 [8:38:08<35:50, 2.77s/it] {'loss': 0.4578, 'grad_norm': 7.945666217999835, 'learning_rate': 5.2037032523873654e-08, 'epoch': 0.94} 94%|█████████▎| 11536/12313 [8:38:08<35:50, 2.77s/it] 94%|█████████▎| 11537/12313 [8:38:11<35:42, 2.76s/it] {'loss': 0.4933, 'grad_norm': 8.210836981044634, 'learning_rate': 5.190364037677142e-08, 'epoch': 0.94} 94%|█████████▎| 11537/12313 [8:38:11<35:42, 2.76s/it] 94%|█████████▎| 11538/12313 [8:38:14<35:59, 2.79s/it] {'loss': 0.4541, 'grad_norm': 4.671044337470819, 'learning_rate': 5.1770417624980306e-08, 'epoch': 0.94} 94%|█████████▎| 11538/12313 [8:38:14<35:59, 2.79s/it] 94%|█████████▎| 11539/12313 [8:38:17<36:14, 2.81s/it] {'loss': 0.5181, 'grad_norm': 6.043521315334802, 'learning_rate': 5.1637364277719595e-08, 'epoch': 0.94} 94%|█████████▎| 11539/12313 [8:38:17<36:14, 2.81s/it] 94%|█████████▎| 11540/12313 [8:38:20<36:17, 2.82s/it] {'loss': 0.5324, 'grad_norm': 5.1011629271022185, 'learning_rate': 5.150448034419525e-08, 'epoch': 0.94} 94%|█████████▎| 11540/12313 [8:38:20<36:17, 2.82s/it] 94%|█████████▎| 11541/12313 [8:38:22<35:57, 2.79s/it] {'loss': 0.4803, 'grad_norm': 3.8201585871136734, 'learning_rate': 5.1371765833602703e-08, 'epoch': 0.94} 94%|█████████▎| 11541/12313 [8:38:22<35:57, 2.79s/it] 94%|█████████▎| 11542/12313 [8:38:25<35:30, 2.76s/it] {'loss': 0.3395, 'grad_norm': 5.758818308718375, 'learning_rate': 5.123922075512461e-08, 'epoch': 0.94} 94%|█████████▎| 11542/12313 [8:38:25<35:30, 2.76s/it] 94%|█████████▎| 11543/12313 [8:38:28<35:40, 2.78s/it] {'loss': 0.6348, 'grad_norm': 
3.435971230625552, 'learning_rate': 5.110684511793251e-08, 'epoch': 0.94} 94%|█████████▎| 11543/12313 [8:38:28<35:40, 2.78s/it] 94%|█████████▍| 11544/12313 [8:38:31<35:26, 2.77s/it] {'loss': 0.3566, 'grad_norm': 7.13731160639292, 'learning_rate': 5.0974638931186036e-08, 'epoch': 0.94} 94%|█████████▍| 11544/12313 [8:38:31<35:26, 2.77s/it] 94%|█████████▍| 11545/12313 [8:38:33<34:44, 2.71s/it] {'loss': 0.4923, 'grad_norm': 9.350076135939414, 'learning_rate': 5.084260220403342e-08, 'epoch': 0.94} 94%|█████████▍| 11545/12313 [8:38:33<34:44, 2.71s/it] 94%|█████████▍| 11546/12313 [8:38:36<35:15, 2.76s/it] {'loss': 0.4283, 'grad_norm': 4.901127986761642, 'learning_rate': 5.0710734945610686e-08, 'epoch': 0.94} 94%|█████████▍| 11546/12313 [8:38:36<35:15, 2.76s/it] 94%|█████████▍| 11547/12313 [8:38:39<35:34, 2.79s/it] {'loss': 0.491, 'grad_norm': 4.685757580589546, 'learning_rate': 5.057903716504248e-08, 'epoch': 0.94} 94%|█████████▍| 11547/12313 [8:38:39<35:34, 2.79s/it] 94%|█████████▍| 11548/12313 [8:38:42<35:17, 2.77s/it] {'loss': 0.2839, 'grad_norm': 5.048648910312251, 'learning_rate': 5.044750887144151e-08, 'epoch': 0.94} 94%|█████████▍| 11548/12313 [8:38:42<35:17, 2.77s/it] 94%|█████████▍| 11549/12313 [8:38:44<34:56, 2.74s/it] {'loss': 0.4318, 'grad_norm': 6.1324537629824345, 'learning_rate': 5.0316150073908555e-08, 'epoch': 0.94} 94%|█████████▍| 11549/12313 [8:38:44<34:56, 2.74s/it] 94%|█████████▍| 11550/12313 [8:38:47<34:54, 2.74s/it] {'loss': 0.3291, 'grad_norm': 5.300490311250102, 'learning_rate': 5.0184960781533844e-08, 'epoch': 0.94} 94%|█████████▍| 11550/12313 [8:38:47<34:54, 2.74s/it] 94%|█████████▍| 11551/12313 [8:38:50<36:54, 2.91s/it] {'loss': 0.6305, 'grad_norm': 4.481525918571603, 'learning_rate': 5.005394100339373e-08, 'epoch': 0.94} 94%|█████████▍| 11551/12313 [8:38:50<36:54, 2.91s/it] 94%|█████████▍| 11552/12313 [8:38:53<35:31, 2.80s/it] {'loss': 0.4441, 'grad_norm': 5.628772875047986, 'learning_rate': 4.992309074855484e-08, 'epoch': 0.94} 
94%|█████████▍| 11552/12313 [8:38:53<35:31, 2.80s/it] 94%|█████████▍| 11553/12313 [8:38:56<35:00, 2.76s/it] {'loss': 0.362, 'grad_norm': 5.631338094522123, 'learning_rate': 4.97924100260716e-08, 'epoch': 0.94} 94%|█████████▍| 11553/12313 [8:38:56<35:00, 2.76s/it] 94%|█████████▍| 11554/12313 [8:38:58<35:14, 2.79s/it] {'loss': 0.4233, 'grad_norm': 4.450334044578378, 'learning_rate': 4.966189884498596e-08, 'epoch': 0.94} 94%|█████████▍| 11554/12313 [8:38:58<35:14, 2.79s/it] 94%|█████████▍| 11555/12313 [8:39:01<35:19, 2.80s/it] {'loss': 0.4751, 'grad_norm': 4.584301414669546, 'learning_rate': 4.953155721432873e-08, 'epoch': 0.94} 94%|█████████▍| 11555/12313 [8:39:01<35:19, 2.80s/it] 94%|█████████▍| 11556/12313 [8:39:04<34:30, 2.74s/it] {'loss': 0.5722, 'grad_norm': 4.749314594480385, 'learning_rate': 4.940138514311854e-08, 'epoch': 0.94} 94%|█████████▍| 11556/12313 [8:39:04<34:30, 2.74s/it] 94%|█████████▍| 11557/12313 [8:39:06<33:47, 2.68s/it] {'loss': 0.4222, 'grad_norm': 4.863986365737658, 'learning_rate': 4.927138264036291e-08, 'epoch': 0.94} 94%|█████████▍| 11557/12313 [8:39:06<33:47, 2.68s/it] 94%|█████████▍| 11558/12313 [8:39:09<33:33, 2.67s/it] {'loss': 0.4626, 'grad_norm': 5.649739794167962, 'learning_rate': 4.9141549715057415e-08, 'epoch': 0.94} 94%|█████████▍| 11558/12313 [8:39:09<33:33, 2.67s/it] 94%|█████████▍| 11559/12313 [8:39:11<32:51, 2.62s/it] {'loss': 0.412, 'grad_norm': 3.4200650534492807, 'learning_rate': 4.90118863761857e-08, 'epoch': 0.94} 94%|█████████▍| 11559/12313 [8:39:11<32:51, 2.62s/it] 94%|█████████▍| 11560/12313 [8:39:14<32:53, 2.62s/it] {'loss': 0.5341, 'grad_norm': 8.38158690698774, 'learning_rate': 4.888239263271977e-08, 'epoch': 0.94} 94%|█████████▍| 11560/12313 [8:39:14<32:53, 2.62s/it] 94%|█████████▍| 11561/12313 [8:39:17<33:03, 2.64s/it] {'loss': 0.4586, 'grad_norm': 3.2373374116022893, 'learning_rate': 4.875306849361994e-08, 'epoch': 0.94} 94%|█████████▍| 11561/12313 [8:39:17<33:03, 2.64s/it] 94%|█████████▍| 11562/12313 
[8:39:19<33:10, 2.65s/it] {'loss': 0.3589, 'grad_norm': 4.301803597627936, 'learning_rate': 4.862391396783461e-08, 'epoch': 0.94} 94%|█████████▍| 11562/12313 [8:39:19<33:10, 2.65s/it] 94%|█████████▍| 11563/12313 [8:39:22<32:41, 2.62s/it] {'loss': 0.4699, 'grad_norm': 5.310812898753721, 'learning_rate': 4.849492906430081e-08, 'epoch': 0.94} 94%|█████████▍| 11563/12313 [8:39:22<32:41, 2.62s/it] 94%|█████████▍| 11564/12313 [8:39:25<33:03, 2.65s/it] {'loss': 0.5494, 'grad_norm': 5.309327329673865, 'learning_rate': 4.836611379194334e-08, 'epoch': 0.94} 94%|█████████▍| 11564/12313 [8:39:25<33:03, 2.65s/it] 94%|█████████▍| 11565/12313 [8:39:27<33:20, 2.67s/it] {'loss': 0.3454, 'grad_norm': 8.887102738638378, 'learning_rate': 4.8237468159675896e-08, 'epoch': 0.94} 94%|█████████▍| 11565/12313 [8:39:27<33:20, 2.67s/it] 94%|█████████▍| 11566/12313 [8:39:30<34:22, 2.76s/it] {'loss': 0.3838, 'grad_norm': 4.344793062039761, 'learning_rate': 4.810899217639997e-08, 'epoch': 0.94} 94%|█████████▍| 11566/12313 [8:39:30<34:22, 2.76s/it] 94%|█████████▍| 11567/12313 [8:39:33<33:28, 2.69s/it] {'loss': 0.463, 'grad_norm': 3.970983027129392, 'learning_rate': 4.798068585100513e-08, 'epoch': 0.94} 94%|█████████▍| 11567/12313 [8:39:33<33:28, 2.69s/it] 94%|█████████▍| 11568/12313 [8:39:36<35:10, 2.83s/it] {'loss': 0.4031, 'grad_norm': 6.57362417638814, 'learning_rate': 4.785254919236954e-08, 'epoch': 0.94} 94%|█████████▍| 11568/12313 [8:39:36<35:10, 2.83s/it] 94%|█████████▍| 11569/12313 [8:39:39<34:39, 2.80s/it] {'loss': 0.4996, 'grad_norm': 3.7805553967871552, 'learning_rate': 4.772458220936027e-08, 'epoch': 0.94} 94%|█████████▍| 11569/12313 [8:39:39<34:39, 2.80s/it] 94%|█████████▍| 11570/12313 [8:39:41<33:55, 2.74s/it] {'loss': 0.4602, 'grad_norm': 5.108413544198211, 'learning_rate': 4.7596784910830804e-08, 'epoch': 0.94} 94%|█████████▍| 11570/12313 [8:39:41<33:55, 2.74s/it] 94%|█████████▍| 11571/12313 [8:39:44<33:57, 2.75s/it] {'loss': 0.3966, 'grad_norm': 4.497648219050411, 
'learning_rate': 4.74691573056249e-08, 'epoch': 0.94} 94%|█████████▍| 11571/12313 [8:39:44<33:57, 2.75s/it] 94%|█████████▍| 11572/12313 [8:39:47<33:42, 2.73s/it] {'loss': 0.3698, 'grad_norm': 4.1059390155620115, 'learning_rate': 4.7341699402573546e-08, 'epoch': 0.94} 94%|█████████▍| 11572/12313 [8:39:47<33:42, 2.73s/it] 94%|█████████▍| 11573/12313 [8:39:50<33:49, 2.74s/it] {'loss': 0.4798, 'grad_norm': 4.091197560447974, 'learning_rate': 4.721441121049608e-08, 'epoch': 0.94} 94%|█████████▍| 11573/12313 [8:39:50<33:49, 2.74s/it] 94%|█████████▍| 11574/12313 [8:39:52<33:36, 2.73s/it] {'loss': 0.3846, 'grad_norm': 5.006114532781862, 'learning_rate': 4.7087292738200454e-08, 'epoch': 0.94} 94%|█████████▍| 11574/12313 [8:39:52<33:36, 2.73s/it] 94%|█████████▍| 11575/12313 [8:39:55<33:51, 2.75s/it] {'loss': 0.3729, 'grad_norm': 7.010800653937726, 'learning_rate': 4.696034399448185e-08, 'epoch': 0.94} 94%|█████████▍| 11575/12313 [8:39:55<33:51, 2.75s/it] 94%|█████████▍| 11576/12313 [8:39:58<33:26, 2.72s/it] {'loss': 0.4974, 'grad_norm': 5.687679281302264, 'learning_rate': 4.6833564988124914e-08, 'epoch': 0.94} 94%|█████████▍| 11576/12313 [8:39:58<33:26, 2.72s/it] 94%|█████████▍| 11577/12313 [8:40:00<32:26, 2.64s/it] {'loss': 0.4069, 'grad_norm': 5.740323370278037, 'learning_rate': 4.670695572790235e-08, 'epoch': 0.94} 94%|█████████▍| 11577/12313 [8:40:00<32:26, 2.64s/it] 94%|█████████▍| 11578/12313 [8:40:03<32:28, 2.65s/it] {'loss': 0.6067, 'grad_norm': 8.241228285955831, 'learning_rate': 4.658051622257437e-08, 'epoch': 0.94} 94%|█████████▍| 11578/12313 [8:40:03<32:28, 2.65s/it] 94%|█████████▍| 11579/12313 [8:40:06<33:00, 2.70s/it] {'loss': 0.5117, 'grad_norm': 4.495511178750625, 'learning_rate': 4.6454246480890084e-08, 'epoch': 0.94} 94%|█████████▍| 11579/12313 [8:40:06<33:00, 2.70s/it] 94%|█████████▍| 11580/12313 [8:40:08<33:09, 2.71s/it] {'loss': 0.4359, 'grad_norm': 6.551988717623826, 'learning_rate': 4.632814651158696e-08, 'epoch': 0.94} 94%|█████████▍| 11580/12313 
[8:40:08<33:09, 2.71s/it] 94%|█████████▍| 11581/12313 [8:40:11<33:08, 2.72s/it] {'loss': 0.4789, 'grad_norm': 9.683649036003146, 'learning_rate': 4.620221632338995e-08, 'epoch': 0.94} 94%|█████████▍| 11581/12313 [8:40:11<33:08, 2.72s/it] 94%|█████████▍| 11582/12313 [8:40:14<34:08, 2.80s/it] {'loss': 0.4102, 'grad_norm': 5.298725894305153, 'learning_rate': 4.607645592501347e-08, 'epoch': 0.94} 94%|█████████▍| 11582/12313 [8:40:14<34:08, 2.80s/it] 94%|█████████▍| 11583/12313 [8:40:17<34:04, 2.80s/it] {'loss': 0.5595, 'grad_norm': 5.2892762046601165, 'learning_rate': 4.5950865325158636e-08, 'epoch': 0.94} 94%|█████████▍| 11583/12313 [8:40:17<34:04, 2.80s/it] 94%|█████████▍| 11584/12313 [8:40:20<34:27, 2.84s/it] {'loss': 0.3798, 'grad_norm': 6.532711062097573, 'learning_rate': 4.582544453251597e-08, 'epoch': 0.94} 94%|█████████▍| 11584/12313 [8:40:20<34:27, 2.84s/it] 94%|█████████▍| 11585/12313 [8:40:22<33:27, 2.76s/it] {'loss': 0.444, 'grad_norm': 7.025731896892658, 'learning_rate': 4.57001935557641e-08, 'epoch': 0.94} 94%|█████████▍| 11585/12313 [8:40:22<33:27, 2.76s/it] 94%|█████████▍| 11586/12313 [8:40:26<36:17, 3.00s/it] {'loss': 0.4212, 'grad_norm': 4.380484134636201, 'learning_rate': 4.5575112403569985e-08, 'epoch': 0.94} 94%|█████████▍| 11586/12313 [8:40:26<36:17, 3.00s/it] 94%|█████████▍| 11587/12313 [8:40:29<34:54, 2.88s/it] {'loss': 0.428, 'grad_norm': 78.3339524286403, 'learning_rate': 4.545020108458781e-08, 'epoch': 0.94} 94%|█████████▍| 11587/12313 [8:40:29<34:54, 2.88s/it] 94%|█████████▍| 11588/12313 [8:40:31<34:38, 2.87s/it] {'loss': 0.374, 'grad_norm': 4.997755103806688, 'learning_rate': 4.5325459607461485e-08, 'epoch': 0.94} 94%|█████████▍| 11588/12313 [8:40:31<34:38, 2.87s/it] 94%|█████████▍| 11589/12313 [8:40:34<34:15, 2.84s/it] {'loss': 0.6705, 'grad_norm': 4.4071049437250505, 'learning_rate': 4.5200887980821897e-08, 'epoch': 0.94} 94%|█████████▍| 11589/12313 [8:40:34<34:15, 2.84s/it] 94%|█████████▍| 11590/12313 [8:40:37<33:46, 2.80s/it] {'loss': 
0.4301, 'grad_norm': 4.496642306863949, 'learning_rate': 4.5076486213289086e-08, 'epoch': 0.94} 94%|█████████▍| 11590/12313 [8:40:37<33:46, 2.80s/it] 94%|█████████▍| 11591/12313 [8:40:41<37:00, 3.08s/it] {'loss': 0.4866, 'grad_norm': 6.033449529408409, 'learning_rate': 4.495225431347089e-08, 'epoch': 0.94} 94%|█████████▍| 11591/12313 [8:40:41<37:00, 3.08s/it] 94%|█████████▍| 11592/12313 [8:40:43<34:49, 2.90s/it] {'loss': 0.6155, 'grad_norm': 8.981792164965075, 'learning_rate': 4.482819228996377e-08, 'epoch': 0.94} 94%|█████████▍| 11592/12313 [8:40:43<34:49, 2.90s/it] 94%|█████████▍| 11593/12313 [8:40:46<34:13, 2.85s/it] {'loss': 0.5014, 'grad_norm': 6.80461573911351, 'learning_rate': 4.470430015135197e-08, 'epoch': 0.94} 94%|█████████▍| 11593/12313 [8:40:46<34:13, 2.85s/it] 94%|█████████▍| 11594/12313 [8:40:49<33:41, 2.81s/it] {'loss': 0.4495, 'grad_norm': 3.2551970239032495, 'learning_rate': 4.458057790620779e-08, 'epoch': 0.94} 94%|█████████▍| 11594/12313 [8:40:49<33:41, 2.81s/it] 94%|█████████▍| 11595/12313 [8:40:51<33:19, 2.79s/it] {'loss': 0.4876, 'grad_norm': 6.288968375149101, 'learning_rate': 4.4457025563092724e-08, 'epoch': 0.94} 94%|█████████▍| 11595/12313 [8:40:51<33:19, 2.79s/it] 94%|█████████▍| 11596/12313 [8:40:54<32:35, 2.73s/it] {'loss': 0.6019, 'grad_norm': 5.114911602404954, 'learning_rate': 4.433364313055549e-08, 'epoch': 0.94} 94%|█████████▍| 11596/12313 [8:40:54<32:35, 2.73s/it] 94%|█████████▍| 11597/12313 [8:40:57<31:56, 2.68s/it] {'loss': 0.3655, 'grad_norm': 10.285824379728702, 'learning_rate': 4.42104306171337e-08, 'epoch': 0.94} 94%|█████████▍| 11597/12313 [8:40:57<31:56, 2.68s/it] 94%|█████████▍| 11598/12313 [8:40:59<32:44, 2.75s/it] {'loss': 0.5237, 'grad_norm': 4.250288445214354, 'learning_rate': 4.4087388031353316e-08, 'epoch': 0.94} 94%|█████████▍| 11598/12313 [8:40:59<32:44, 2.75s/it] 94%|█████████▍| 11599/12313 [8:41:02<32:41, 2.75s/it] {'loss': 0.5283, 'grad_norm': 4.607829438175469, 'learning_rate': 4.39645153817278e-08, 'epoch': 
0.94} 94%|█████████▍| 11599/12313 [8:41:02<32:41, 2.75s/it] 94%|█████████▍| 11600/12313 [8:41:05<32:06, 2.70s/it] {'loss': 0.4855, 'grad_norm': 7.915466583935277, 'learning_rate': 4.384181267675952e-08, 'epoch': 0.94} 94%|█████████▍| 11600/12313 [8:41:05<32:06, 2.70s/it] 94%|█████████▍| 11601/12313 [8:41:08<32:23, 2.73s/it] {'loss': 0.4598, 'grad_norm': 5.46136708291508, 'learning_rate': 4.3719279924938626e-08, 'epoch': 0.94} 94%|█████████▍| 11601/12313 [8:41:08<32:23, 2.73s/it] 94%|█████████▍| 11602/12313 [8:41:10<31:11, 2.63s/it] {'loss': 0.6132, 'grad_norm': 9.165200607430856, 'learning_rate': 4.35969171347439e-08, 'epoch': 0.94} 94%|█████████▍| 11602/12313 [8:41:10<31:11, 2.63s/it] 94%|█████████▍| 11603/12313 [8:41:13<31:25, 2.66s/it] {'loss': 0.5896, 'grad_norm': 5.415018652981148, 'learning_rate': 4.347472431464217e-08, 'epoch': 0.94} 94%|█████████▍| 11603/12313 [8:41:13<31:25, 2.66s/it] 94%|█████████▍| 11604/12313 [8:41:15<31:15, 2.65s/it] {'loss': 0.378, 'grad_norm': 7.296700648217176, 'learning_rate': 4.335270147308862e-08, 'epoch': 0.94} 94%|█████████▍| 11604/12313 [8:41:15<31:15, 2.65s/it] 94%|█████████▍| 11605/12313 [8:41:18<30:52, 2.62s/it] {'loss': 0.4858, 'grad_norm': 4.175947964206813, 'learning_rate': 4.32308486185265e-08, 'epoch': 0.94} 94%|█████████▍| 11605/12313 [8:41:18<30:52, 2.62s/it] 94%|█████████▍| 11606/12313 [8:41:21<32:46, 2.78s/it] {'loss': 0.3888, 'grad_norm': 5.139558164610286, 'learning_rate': 4.3109165759387115e-08, 'epoch': 0.94} 94%|█████████▍| 11606/12313 [8:41:21<32:46, 2.78s/it] 94%|█████████▍| 11607/12313 [8:41:24<33:34, 2.85s/it] {'loss': 0.5018, 'grad_norm': 4.109603857789413, 'learning_rate': 4.298765290409096e-08, 'epoch': 0.94} 94%|█████████▍| 11607/12313 [8:41:24<33:34, 2.85s/it] 94%|█████████▍| 11608/12313 [8:41:27<32:52, 2.80s/it] {'loss': 0.523, 'grad_norm': 5.23991591103504, 'learning_rate': 4.286631006104547e-08, 'epoch': 0.94} 94%|█████████▍| 11608/12313 [8:41:27<32:52, 2.80s/it] 94%|█████████▍| 11609/12313 
[8:41:30<32:59, 2.81s/it] {'loss': 0.4094, 'grad_norm': 5.661548234950453, 'learning_rate': 4.2745137238646984e-08, 'epoch': 0.94} 94%|█████████▍| 11609/12313 [8:41:30<32:59, 2.81s/it] 94%|█████████▍| 11610/12313 [8:41:32<32:59, 2.82s/it] {'loss': 0.6268, 'grad_norm': 5.499414318975432, 'learning_rate': 4.2624134445280186e-08, 'epoch': 0.94} 94%|█████████▍| 11610/12313 [8:41:32<32:59, 2.82s/it] 94%|█████████▍| 11611/12313 [8:41:35<33:32, 2.87s/it] {'loss': 0.548, 'grad_norm': 4.495827244130177, 'learning_rate': 4.25033016893181e-08, 'epoch': 0.94} 94%|█████████▍| 11611/12313 [8:41:35<33:32, 2.87s/it] 94%|█████████▍| 11612/12313 [8:41:38<32:42, 2.80s/it] {'loss': 0.7681, 'grad_norm': 3.8103963823755094, 'learning_rate': 4.238263897912126e-08, 'epoch': 0.94} 94%|█████████▍| 11612/12313 [8:41:38<32:42, 2.80s/it] 94%|█████████▍| 11613/12313 [8:41:40<31:17, 2.68s/it] {'loss': 0.3692, 'grad_norm': 9.357203074745977, 'learning_rate': 4.22621463230391e-08, 'epoch': 0.94} 94%|█████████▍| 11613/12313 [8:41:40<31:17, 2.68s/it] 94%|█████████▍| 11614/12313 [8:41:43<32:09, 2.76s/it] {'loss': 0.601, 'grad_norm': 3.6750136809793412, 'learning_rate': 4.214182372940884e-08, 'epoch': 0.94} 94%|█████████▍| 11614/12313 [8:41:43<32:09, 2.76s/it] 94%|█████████▍| 11615/12313 [8:41:46<32:01, 2.75s/it] {'loss': 0.4761, 'grad_norm': 7.385444469733557, 'learning_rate': 4.202167120655631e-08, 'epoch': 0.94} 94%|█████████▍| 11615/12313 [8:41:46<32:01, 2.75s/it] 94%|█████████▍| 11616/12313 [8:41:49<31:50, 2.74s/it] {'loss': 0.4371, 'grad_norm': 5.303774677219012, 'learning_rate': 4.190168876279571e-08, 'epoch': 0.94} 94%|█████████▍| 11616/12313 [8:41:49<31:50, 2.74s/it] 94%|█████████▍| 11617/12313 [8:41:51<30:50, 2.66s/it] {'loss': 0.4442, 'grad_norm': 4.763056521648861, 'learning_rate': 4.1781876406428725e-08, 'epoch': 0.94} 94%|█████████▍| 11617/12313 [8:41:51<30:50, 2.66s/it] 94%|█████████▍| 11618/12313 [8:41:54<30:55, 2.67s/it] {'loss': 0.5426, 'grad_norm': 5.484993623084503, 
'learning_rate': 4.1662234145746214e-08, 'epoch': 0.94} 94%|█████████▍| 11618/12313 [8:41:54<30:55, 2.67s/it] 94%|█████████▍| 11619/12313 [8:41:57<31:49, 2.75s/it] {'loss': 0.5846, 'grad_norm': 4.303398425305994, 'learning_rate': 4.154276198902629e-08, 'epoch': 0.94} 94%|█████████▍| 11619/12313 [8:41:57<31:49, 2.75s/it] 94%|█████████▍| 11620/12313 [8:41:59<31:02, 2.69s/it] {'loss': 0.6349, 'grad_norm': 5.295687518325545, 'learning_rate': 4.1423459944536224e-08, 'epoch': 0.94} 94%|█████████▍| 11620/12313 [8:41:59<31:02, 2.69s/it] 94%|█████████▍| 11621/12313 [8:42:03<32:18, 2.80s/it] {'loss': 0.4874, 'grad_norm': 4.737076827144997, 'learning_rate': 4.1304328020530804e-08, 'epoch': 0.94} 94%|█████████▍| 11621/12313 [8:42:03<32:18, 2.80s/it] 94%|█████████▍| 11622/12313 [8:42:05<31:43, 2.75s/it] {'loss': 0.3641, 'grad_norm': 15.165489379351301, 'learning_rate': 4.118536622525315e-08, 'epoch': 0.94} 94%|█████████▍| 11622/12313 [8:42:05<31:43, 2.75s/it] 94%|█████████▍| 11623/12313 [8:42:08<32:32, 2.83s/it] {'loss': 0.5441, 'grad_norm': 3.5716624018017877, 'learning_rate': 4.10665745669353e-08, 'epoch': 0.94} 94%|█████████▍| 11623/12313 [8:42:08<32:32, 2.83s/it] 94%|█████████▍| 11624/12313 [8:42:11<31:55, 2.78s/it] {'loss': 0.4997, 'grad_norm': 10.121245968679512, 'learning_rate': 4.094795305379679e-08, 'epoch': 0.94} 94%|█████████▍| 11624/12313 [8:42:11<31:55, 2.78s/it] 94%|█████████▍| 11625/12313 [8:42:13<30:54, 2.70s/it] {'loss': 0.4358, 'grad_norm': 7.249621116373978, 'learning_rate': 4.082950169404548e-08, 'epoch': 0.94} 94%|█████████▍| 11625/12313 [8:42:13<30:54, 2.70s/it] 94%|█████████▍| 11626/12313 [8:42:16<30:53, 2.70s/it] {'loss': 0.4766, 'grad_norm': 7.423045226596472, 'learning_rate': 4.071122049587789e-08, 'epoch': 0.94} 94%|█████████▍| 11626/12313 [8:42:16<30:53, 2.70s/it] 94%|█████████▍| 11627/12313 [8:42:19<31:17, 2.74s/it] {'loss': 0.3625, 'grad_norm': 7.43889332245654, 'learning_rate': 4.059310946747802e-08, 'epoch': 0.94} 94%|█████████▍| 11627/12313 
[8:42:19<31:17, 2.74s/it] 94%|█████████▍| 11628/12313 [8:42:22<31:15, 2.74s/it] {'loss': 0.4215, 'grad_norm': 5.999420084859953, 'learning_rate': 4.047516861701878e-08, 'epoch': 0.94} 94%|█████████▍| 11628/12313 [8:42:22<31:15, 2.74s/it] 94%|█████████▍| 11629/12313 [8:42:24<31:21, 2.75s/it] {'loss': 0.5308, 'grad_norm': 3.9495688216293363, 'learning_rate': 4.035739795266086e-08, 'epoch': 0.94} 94%|█████████▍| 11629/12313 [8:42:24<31:21, 2.75s/it] 94%|█████████▍| 11630/12313 [8:42:27<30:37, 2.69s/it] {'loss': 0.4483, 'grad_norm': 8.225837590539006, 'learning_rate': 4.0239797482553856e-08, 'epoch': 0.94} 94%|█████████▍| 11630/12313 [8:42:27<30:37, 2.69s/it] 94%|█████████▍| 11631/12313 [8:42:30<31:28, 2.77s/it] {'loss': 0.5678, 'grad_norm': 4.507895023066098, 'learning_rate': 4.012236721483487e-08, 'epoch': 0.94} 94%|█████████▍| 11631/12313 [8:42:30<31:28, 2.77s/it] 94%|█████████▍| 11632/12313 [8:42:33<31:08, 2.74s/it] {'loss': 0.5971, 'grad_norm': 3.795653600393925, 'learning_rate': 4.0005107157628786e-08, 'epoch': 0.94} 94%|█████████▍| 11632/12313 [8:42:33<31:08, 2.74s/it] 94%|█████████▍| 11633/12313 [8:42:36<31:47, 2.81s/it] {'loss': 0.4818, 'grad_norm': 5.829598757301327, 'learning_rate': 3.988801731905051e-08, 'epoch': 0.94} 94%|█████████▍| 11633/12313 [8:42:36<31:47, 2.81s/it] 94%|█████████▍| 11634/12313 [8:42:38<31:00, 2.74s/it] {'loss': 0.3942, 'grad_norm': 6.893765169454947, 'learning_rate': 3.9771097707201056e-08, 'epoch': 0.94} 94%|█████████▍| 11634/12313 [8:42:38<31:00, 2.74s/it] 94%|█████████▍| 11635/12313 [8:42:41<30:15, 2.68s/it] {'loss': 0.5054, 'grad_norm': 7.0587960070765075, 'learning_rate': 3.965434833017118e-08, 'epoch': 0.94} 94%|█████████▍| 11635/12313 [8:42:41<30:15, 2.68s/it] 95%|█████████▍| 11636/12313 [8:42:44<31:04, 2.75s/it] {'loss': 0.3643, 'grad_norm': 8.051189199148112, 'learning_rate': 3.9537769196039134e-08, 'epoch': 0.95} 95%|█████████▍| 11636/12313 [8:42:44<31:04, 2.75s/it] 95%|█████████▍| 11637/12313 [8:42:46<31:31, 2.80s/it] 
{'loss': 0.6727, 'grad_norm': 4.735681936699135, 'learning_rate': 3.9421360312871804e-08, 'epoch': 0.95} 95%|█████████▍| 11637/12313 [8:42:46<31:31, 2.80s/it] 95%|█████████▍| 11638/12313 [8:42:50<32:31, 2.89s/it] {'loss': 0.5668, 'grad_norm': 3.9449239206953925, 'learning_rate': 3.9305121688723855e-08, 'epoch': 0.95} 95%|█████████▍| 11638/12313 [8:42:50<32:31, 2.89s/it] 95%|█████████▍| 11639/12313 [8:42:52<31:14, 2.78s/it] {'loss': 0.6188, 'grad_norm': 6.946560704332956, 'learning_rate': 3.918905333163858e-08, 'epoch': 0.95} 95%|█████████▍| 11639/12313 [8:42:52<31:14, 2.78s/it] 95%|█████████▍| 11640/12313 [8:42:55<30:42, 2.74s/it] {'loss': 0.6601, 'grad_norm': 8.04163254796676, 'learning_rate': 3.9073155249647055e-08, 'epoch': 0.95} 95%|█████████▍| 11640/12313 [8:42:55<30:42, 2.74s/it] 95%|█████████▍| 11641/12313 [8:42:57<29:49, 2.66s/it] {'loss': 0.4597, 'grad_norm': 7.0432614921051675, 'learning_rate': 3.895742745076869e-08, 'epoch': 0.95} 95%|█████████▍| 11641/12313 [8:42:57<29:49, 2.66s/it] 95%|█████████▍| 11642/12313 [8:43:00<29:51, 2.67s/it] {'loss': 0.4694, 'grad_norm': 5.740385711050734, 'learning_rate': 3.8841869943011534e-08, 'epoch': 0.95} 95%|█████████▍| 11642/12313 [8:43:00<29:51, 2.67s/it] 95%|█████████▍| 11643/12313 [8:43:03<29:57, 2.68s/it] {'loss': 0.4556, 'grad_norm': 3.826367448009286, 'learning_rate': 3.872648273437168e-08, 'epoch': 0.95} 95%|█████████▍| 11643/12313 [8:43:03<29:57, 2.68s/it] 95%|█████████▍| 11644/12313 [8:43:05<29:56, 2.68s/it] {'loss': 0.3736, 'grad_norm': 3.153187096292459, 'learning_rate': 3.861126583283303e-08, 'epoch': 0.95} 95%|█████████▍| 11644/12313 [8:43:05<29:56, 2.68s/it] 95%|█████████▍| 11645/12313 [8:43:08<29:15, 2.63s/it] {'loss': 0.5045, 'grad_norm': 5.786888401113132, 'learning_rate': 3.849621924636809e-08, 'epoch': 0.95} 95%|█████████▍| 11645/12313 [8:43:08<29:15, 2.63s/it] 95%|█████████▍| 11646/12313 [8:43:10<29:14, 2.63s/it] {'loss': 0.3993, 'grad_norm': 5.70036944601648, 'learning_rate': 
3.838134298293744e-08, 'epoch': 0.95} 95%|█████████▍| 11646/12313 [8:43:10<29:14, 2.63s/it] 95%|█████████▍| 11647/12313 [8:43:13<30:03, 2.71s/it] {'loss': 0.6705, 'grad_norm': 5.14700753120722, 'learning_rate': 3.8266637050489716e-08, 'epoch': 0.95} 95%|█████████▍| 11647/12313 [8:43:13<30:03, 2.71s/it] 95%|█████████▍| 11648/12313 [8:43:16<30:54, 2.79s/it] {'loss': 0.5472, 'grad_norm': 5.028518558037041, 'learning_rate': 3.815210145696219e-08, 'epoch': 0.95} 95%|█████████▍| 11648/12313 [8:43:16<30:54, 2.79s/it] 95%|█████████▍| 11649/12313 [8:43:19<30:28, 2.75s/it] {'loss': 0.4755, 'grad_norm': 5.08541011233092, 'learning_rate': 3.803773621028045e-08, 'epoch': 0.95} 95%|█████████▍| 11649/12313 [8:43:19<30:28, 2.75s/it] 95%|█████████▍| 11650/12313 [8:43:21<29:35, 2.68s/it] {'loss': 0.4256, 'grad_norm': 5.674196216201576, 'learning_rate': 3.792354131835735e-08, 'epoch': 0.95} 95%|█████████▍| 11650/12313 [8:43:21<29:35, 2.68s/it] 95%|█████████▍| 11651/12313 [8:43:24<29:46, 2.70s/it] {'loss': 0.5258, 'grad_norm': 5.863265288069775, 'learning_rate': 3.780951678909489e-08, 'epoch': 0.95} 95%|█████████▍| 11651/12313 [8:43:24<29:46, 2.70s/it] 95%|█████████▍| 11652/12313 [8:43:27<29:54, 2.71s/it] {'loss': 0.534, 'grad_norm': 3.5058013583786476, 'learning_rate': 3.769566263038288e-08, 'epoch': 0.95} 95%|█████████▍| 11652/12313 [8:43:27<29:54, 2.71s/it] 95%|█████████▍| 11653/12313 [8:43:30<29:22, 2.67s/it] {'loss': 0.5344, 'grad_norm': 12.342326134411538, 'learning_rate': 3.7581978850099456e-08, 'epoch': 0.95} 95%|█████████▍| 11653/12313 [8:43:30<29:22, 2.67s/it] 95%|█████████▍| 11654/12313 [8:43:32<28:41, 2.61s/it] {'loss': 0.4811, 'grad_norm': 4.819965937928789, 'learning_rate': 3.7468465456110825e-08, 'epoch': 0.95} 95%|█████████▍| 11654/12313 [8:43:32<28:41, 2.61s/it] 95%|█████████▍| 11655/12313 [8:43:35<29:00, 2.65s/it] {'loss': 0.3502, 'grad_norm': 7.405750445686644, 'learning_rate': 3.735512245627182e-08, 'epoch': 0.95} 95%|█████████▍| 11655/12313 [8:43:35<29:00, 
2.65s/it] 95%|█████████▍| 11656/12313 [8:43:37<28:51, 2.63s/it] {'loss': 0.4258, 'grad_norm': 9.446771351873101, 'learning_rate': 3.7241949858424777e-08, 'epoch': 0.95} 95%|█████████▍| 11656/12313 [8:43:37<28:51, 2.63s/it] 95%|█████████▍| 11657/12313 [8:43:40<29:02, 2.66s/it] {'loss': 0.5589, 'grad_norm': 6.936102522174853, 'learning_rate': 3.712894767040093e-08, 'epoch': 0.95} 95%|█████████▍| 11657/12313 [8:43:40<29:02, 2.66s/it] 95%|█████████▍| 11658/12313 [8:43:42<28:13, 2.58s/it] {'loss': 0.4372, 'grad_norm': 4.033630186749134, 'learning_rate': 3.7016115900019575e-08, 'epoch': 0.95} 95%|█████████▍| 11658/12313 [8:43:42<28:13, 2.58s/it] 95%|█████████▍| 11659/12313 [8:43:45<28:51, 2.65s/it] {'loss': 0.5026, 'grad_norm': 5.941857393690612, 'learning_rate': 3.690345455508754e-08, 'epoch': 0.95} 95%|█████████▍| 11659/12313 [8:43:45<28:51, 2.65s/it] 95%|█████████▍| 11660/12313 [8:43:48<28:49, 2.65s/it] {'loss': 0.4604, 'grad_norm': 9.14091774177606, 'learning_rate': 3.679096364340079e-08, 'epoch': 0.95} 95%|█████████▍| 11660/12313 [8:43:48<28:49, 2.65s/it] 95%|█████████▍| 11661/12313 [8:43:50<28:11, 2.59s/it] {'loss': 0.4393, 'grad_norm': 4.223097671499599, 'learning_rate': 3.6678643172742836e-08, 'epoch': 0.95} 95%|█████████▍| 11661/12313 [8:43:50<28:11, 2.59s/it] 95%|█████████▍| 11662/12313 [8:43:53<28:51, 2.66s/it] {'loss': 0.5762, 'grad_norm': 3.9468831332746643, 'learning_rate': 3.656649315088606e-08, 'epoch': 0.95} 95%|█████████▍| 11662/12313 [8:43:53<28:51, 2.66s/it] 95%|█████████▍| 11663/12313 [8:43:56<28:14, 2.61s/it] {'loss': 0.4932, 'grad_norm': 4.743190373097494, 'learning_rate': 3.6454513585590376e-08, 'epoch': 0.95} 95%|█████████▍| 11663/12313 [8:43:56<28:14, 2.61s/it] 95%|█████████▍| 11664/12313 [8:43:58<28:15, 2.61s/it] {'loss': 0.5384, 'grad_norm': 3.896032850566264, 'learning_rate': 3.634270448460403e-08, 'epoch': 0.95} 95%|█████████▍| 11664/12313 [8:43:58<28:15, 2.61s/it] 95%|█████████▍| 11665/12313 [8:44:01<28:30, 2.64s/it] {'loss': 0.3909, 
'grad_norm': 4.452680361174725, 'learning_rate': 3.623106585566388e-08, 'epoch': 0.95} 95%|█████████▍| 11665/12313 [8:44:01<28:30, 2.64s/it] 95%|█████████▍| 11666/12313 [8:44:04<28:43, 2.66s/it] {'loss': 0.4937, 'grad_norm': 6.087481905533953, 'learning_rate': 3.611959770649487e-08, 'epoch': 0.95} 95%|█████████▍| 11666/12313 [8:44:04<28:43, 2.66s/it] 95%|█████████▍| 11667/12313 [8:44:07<30:18, 2.82s/it] {'loss': 0.5005, 'grad_norm': 4.040614567901036, 'learning_rate': 3.600830004480943e-08, 'epoch': 0.95} 95%|█████████▍| 11667/12313 [8:44:07<30:18, 2.82s/it] 95%|█████████▍| 11668/12313 [8:44:10<30:13, 2.81s/it] {'loss': 0.4575, 'grad_norm': 4.44420665256172, 'learning_rate': 3.589717287830946e-08, 'epoch': 0.95} 95%|█████████▍| 11668/12313 [8:44:10<30:13, 2.81s/it] 95%|█████████▍| 11669/12313 [8:44:13<30:09, 2.81s/it] {'loss': 0.5212, 'grad_norm': 5.616498527932605, 'learning_rate': 3.578621621468381e-08, 'epoch': 0.95} 95%|█████████▍| 11669/12313 [8:44:13<30:09, 2.81s/it] 95%|█████████▍| 11670/12313 [8:44:15<28:57, 2.70s/it] {'loss': 0.4217, 'grad_norm': 5.201502802582349, 'learning_rate': 3.567543006161051e-08, 'epoch': 0.95} 95%|█████████▍| 11670/12313 [8:44:15<28:57, 2.70s/it] 95%|█████████▍| 11671/12313 [8:44:18<28:46, 2.69s/it] {'loss': 0.4972, 'grad_norm': 16.378504345168306, 'learning_rate': 3.556481442675508e-08, 'epoch': 0.95} 95%|█████████▍| 11671/12313 [8:44:18<28:46, 2.69s/it] 95%|█████████▍| 11672/12313 [8:44:20<28:37, 2.68s/it] {'loss': 0.4814, 'grad_norm': 5.083869322555315, 'learning_rate': 3.5454369317771686e-08, 'epoch': 0.95} 95%|█████████▍| 11672/12313 [8:44:20<28:37, 2.68s/it] 95%|█████████▍| 11673/12313 [8:44:23<27:50, 2.61s/it] {'loss': 0.4003, 'grad_norm': 7.335908845803885, 'learning_rate': 3.534409474230255e-08, 'epoch': 0.95} 95%|█████████▍| 11673/12313 [8:44:23<27:50, 2.61s/it] 95%|█████████▍| 11674/12313 [8:44:26<29:04, 2.73s/it] {'loss': 0.4699, 'grad_norm': 10.162528148573347, 'learning_rate': 3.523399070797795e-08, 'epoch': 0.95} 
95%|█████████▍| 11674/12313 [8:44:26<29:04, 2.73s/it] 95%|█████████▍| 11675/12313 [8:44:28<28:16, 2.66s/it] {'loss': 0.4235, 'grad_norm': 4.64890275823902, 'learning_rate': 3.512405722241652e-08, 'epoch': 0.95} 95%|█████████▍| 11675/12313 [8:44:28<28:16, 2.66s/it] 95%|█████████▍| 11676/12313 [8:44:31<28:08, 2.65s/it] {'loss': 0.5031, 'grad_norm': 5.016364672410445, 'learning_rate': 3.501429429322522e-08, 'epoch': 0.95} 95%|█████████▍| 11676/12313 [8:44:31<28:08, 2.65s/it] 95%|█████████▍| 11677/12313 [8:44:34<28:13, 2.66s/it] {'loss': 0.4388, 'grad_norm': 5.858582608768708, 'learning_rate': 3.4904701927999385e-08, 'epoch': 0.95} 95%|█████████▍| 11677/12313 [8:44:34<28:13, 2.66s/it] 95%|█████████▍| 11678/12313 [8:44:36<27:51, 2.63s/it] {'loss': 0.4828, 'grad_norm': 4.81391584316998, 'learning_rate': 3.479528013432154e-08, 'epoch': 0.95} 95%|█████████▍| 11678/12313 [8:44:36<27:51, 2.63s/it] 95%|█████████▍| 11679/12313 [8:44:39<28:05, 2.66s/it] {'loss': 0.5023, 'grad_norm': 8.006015757453834, 'learning_rate': 3.468602891976314e-08, 'epoch': 0.95} 95%|█████████▍| 11679/12313 [8:44:39<28:05, 2.66s/it] 95%|█████████▍| 11680/12313 [8:44:41<27:14, 2.58s/it] {'loss': 0.4342, 'grad_norm': 13.29141782498199, 'learning_rate': 3.457694829188452e-08, 'epoch': 0.95} 95%|█████████▍| 11680/12313 [8:44:41<27:14, 2.58s/it] 95%|█████████▍| 11681/12313 [8:44:44<27:45, 2.63s/it] {'loss': 0.3647, 'grad_norm': 10.20237535145513, 'learning_rate': 3.446803825823269e-08, 'epoch': 0.95} 95%|█████████▍| 11681/12313 [8:44:44<27:45, 2.63s/it] 95%|█████████▍| 11682/12313 [8:44:47<27:59, 2.66s/it] {'loss': 0.3275, 'grad_norm': 5.890040192499068, 'learning_rate': 3.435929882634415e-08, 'epoch': 0.95} 95%|█████████▍| 11682/12313 [8:44:47<27:59, 2.66s/it] 95%|█████████▍| 11683/12313 [8:44:49<28:03, 2.67s/it] {'loss': 0.4327, 'grad_norm': 10.83456434309621, 'learning_rate': 3.425073000374257e-08, 'epoch': 0.95} 95%|█████████▍| 11683/12313 [8:44:49<28:03, 2.67s/it] 95%|█████████▍| 11684/12313 
[8:44:52<28:19, 2.70s/it] {'loss': 0.5753, 'grad_norm': 4.338209138202084, 'learning_rate': 3.4142331797940855e-08, 'epoch': 0.95} 95%|█████████▍| 11684/12313 [8:44:52<28:19, 2.70s/it] 95%|█████████▍| 11685/12313 [8:44:55<28:01, 2.68s/it] {'loss': 0.5673, 'grad_norm': 4.739057059307357, 'learning_rate': 3.4034104216439655e-08, 'epoch': 0.95} 95%|█████████▍| 11685/12313 [8:44:55<28:01, 2.68s/it] 95%|█████████▍| 11686/12313 [8:44:57<27:19, 2.62s/it] {'loss': 0.3894, 'grad_norm': 7.495421522605058, 'learning_rate': 3.3926047266727155e-08, 'epoch': 0.95} 95%|█████████▍| 11686/12313 [8:44:57<27:19, 2.62s/it] 95%|█████████▍| 11687/12313 [8:45:00<28:06, 2.69s/it] {'loss': 0.5159, 'grad_norm': 7.2165806929599166, 'learning_rate': 3.381816095628071e-08, 'epoch': 0.95} 95%|█████████▍| 11687/12313 [8:45:00<28:06, 2.69s/it] 95%|█████████▍| 11688/12313 [8:45:03<27:57, 2.68s/it] {'loss': 0.389, 'grad_norm': 3.9257600126354624, 'learning_rate': 3.371044529256573e-08, 'epoch': 0.95} 95%|█████████▍| 11688/12313 [8:45:03<27:57, 2.68s/it] 95%|█████████▍| 11689/12313 [8:45:05<27:52, 2.68s/it] {'loss': 0.4364, 'grad_norm': 9.018221626405468, 'learning_rate': 3.360290028303487e-08, 'epoch': 0.95} 95%|█████████▍| 11689/12313 [8:45:05<27:52, 2.68s/it] 95%|█████████▍| 11690/12313 [8:45:08<27:52, 2.68s/it] {'loss': 0.5857, 'grad_norm': 4.246553322453479, 'learning_rate': 3.34955259351305e-08, 'epoch': 0.95} 95%|█████████▍| 11690/12313 [8:45:08<27:52, 2.68s/it] 95%|█████████▍| 11691/12313 [8:45:11<27:53, 2.69s/it] {'loss': 0.4385, 'grad_norm': 8.521074043393677, 'learning_rate': 3.3388322256281694e-08, 'epoch': 0.95} 95%|█████████▍| 11691/12313 [8:45:11<27:53, 2.69s/it] 95%|█████████▍| 11692/12313 [8:45:14<27:54, 2.70s/it] {'loss': 0.5179, 'grad_norm': 8.037563989138466, 'learning_rate': 3.328128925390667e-08, 'epoch': 0.95} 95%|█████████▍| 11692/12313 [8:45:14<27:54, 2.70s/it] 95%|█████████▍| 11693/12313 [8:45:16<27:30, 2.66s/it] {'loss': 0.5983, 'grad_norm': 5.337969212336468, 
'learning_rate': 3.317442693541145e-08, 'epoch': 0.95} 95%|█████████▍| 11693/12313 [8:45:16<27:30, 2.66s/it] 95%|█████████▍| 11694/12313 [8:45:19<28:35, 2.77s/it] {'loss': 0.5231, 'grad_norm': 5.550986004035013, 'learning_rate': 3.306773530819041e-08, 'epoch': 0.95} 95%|█████████▍| 11694/12313 [8:45:19<28:35, 2.77s/it] 95%|█████████▍| 11695/12313 [8:45:22<29:18, 2.85s/it] {'loss': 0.6063, 'grad_norm': 5.0987699618930975, 'learning_rate': 3.296121437962624e-08, 'epoch': 0.95} 95%|█████████▍| 11695/12313 [8:45:22<29:18, 2.85s/it] 95%|█████████▍| 11696/12313 [8:45:25<28:51, 2.81s/it] {'loss': 0.3196, 'grad_norm': 6.3006147509879105, 'learning_rate': 3.2854864157089164e-08, 'epoch': 0.95} 95%|█████████▍| 11696/12313 [8:45:25<28:51, 2.81s/it] 95%|█████████▍| 11697/12313 [8:45:28<28:51, 2.81s/it] {'loss': 0.5137, 'grad_norm': 4.720216946251891, 'learning_rate': 3.2748684647938564e-08, 'epoch': 0.95} 95%|█████████▍| 11697/12313 [8:45:28<28:51, 2.81s/it] 95%|█████████▌| 11698/12313 [8:45:30<27:30, 2.68s/it] {'loss': 0.5927, 'grad_norm': 17.425444481965904, 'learning_rate': 3.264267585952108e-08, 'epoch': 0.95} 95%|█████████▌| 11698/12313 [8:45:30<27:30, 2.68s/it] 95%|█████████▌| 11699/12313 [8:45:33<27:22, 2.67s/it] {'loss': 0.4547, 'grad_norm': 6.961626143865487, 'learning_rate': 3.253683779917194e-08, 'epoch': 0.95} 95%|█████████▌| 11699/12313 [8:45:33<27:22, 2.67s/it] 95%|█████████▌| 11700/12313 [8:45:35<26:33, 2.60s/it] {'loss': 0.5546, 'grad_norm': 7.6904422344104955, 'learning_rate': 3.243117047421501e-08, 'epoch': 0.95} 95%|█████████▌| 11700/12313 [8:45:35<26:33, 2.60s/it] 95%|█████████▌| 11701/12313 [8:45:38<26:59, 2.65s/it] {'loss': 0.641, 'grad_norm': 5.910333731484439, 'learning_rate': 3.2325673891961394e-08, 'epoch': 0.95} 95%|█████████▌| 11701/12313 [8:45:38<26:59, 2.65s/it] 95%|█████████▌| 11702/12313 [8:45:41<26:42, 2.62s/it] {'loss': 0.4257, 'grad_norm': 7.209494401473691, 'learning_rate': 3.222034805971136e-08, 'epoch': 0.95} 95%|█████████▌| 11702/12313 
[8:45:41<26:42, 2.62s/it] 95%|█████████▌| 11703/12313 [8:45:43<27:29, 2.70s/it] {'loss': 0.3847, 'grad_norm': 5.915799539490019, 'learning_rate': 3.2115192984752684e-08, 'epoch': 0.95} 95%|█████████▌| 11703/12313 [8:45:43<27:29, 2.70s/it] 95%|█████████▌| 11704/12313 [8:45:47<28:41, 2.83s/it] {'loss': 0.6126, 'grad_norm': 8.657367196517342, 'learning_rate': 3.2010208674361774e-08, 'epoch': 0.95} 95%|█████████▌| 11704/12313 [8:45:47<28:41, 2.83s/it] 95%|█████████▌| 11705/12313 [8:45:49<27:47, 2.74s/it] {'loss': 0.4298, 'grad_norm': 5.775245158747669, 'learning_rate': 3.190539513580226e-08, 'epoch': 0.95} 95%|█████████▌| 11705/12313 [8:45:49<27:47, 2.74s/it] 95%|█████████▌| 11706/12313 [8:45:52<27:23, 2.71s/it] {'loss': 0.5534, 'grad_norm': 5.310693978477437, 'learning_rate': 3.1800752376327515e-08, 'epoch': 0.95} 95%|█████████▌| 11706/12313 [8:45:52<27:23, 2.71s/it] 95%|█████████▌| 11707/12313 [8:45:54<26:58, 2.67s/it] {'loss': 0.3459, 'grad_norm': 4.560752491636321, 'learning_rate': 3.169628040317785e-08, 'epoch': 0.95} 95%|█████████▌| 11707/12313 [8:45:54<26:58, 2.67s/it] 95%|█████████▌| 11708/12313 [8:45:57<26:52, 2.66s/it] {'loss': 0.4466, 'grad_norm': 3.5975956386159296, 'learning_rate': 3.15919792235822e-08, 'epoch': 0.95} 95%|█████████▌| 11708/12313 [8:45:57<26:52, 2.66s/it] 95%|█████████▌| 11709/12313 [8:46:00<26:57, 2.68s/it] {'loss': 0.5566, 'grad_norm': 4.190834372560731, 'learning_rate': 3.1487848844757865e-08, 'epoch': 0.95} 95%|█████████▌| 11709/12313 [8:46:00<26:57, 2.68s/it] 95%|█████████▌| 11710/12313 [8:46:02<26:50, 2.67s/it] {'loss': 0.5138, 'grad_norm': 4.688075724324062, 'learning_rate': 3.138388927391017e-08, 'epoch': 0.95} 95%|█████████▌| 11710/12313 [8:46:02<26:50, 2.67s/it] 95%|█████████▌| 11711/12313 [8:46:05<26:11, 2.61s/it] {'loss': 0.4949, 'grad_norm': 6.558872976862139, 'learning_rate': 3.1280100518231994e-08, 'epoch': 0.95} 95%|█████████▌| 11711/12313 [8:46:05<26:11, 2.61s/it] 95%|█████████▌| 11712/12313 [8:46:08<26:41, 2.66s/it] 
{'loss': 0.6129, 'grad_norm': 6.1291704390663515, 'learning_rate': 3.1176482584905356e-08, 'epoch': 0.95} 95%|█████████▌| 11712/12313 [8:46:08<26:41, 2.66s/it] 95%|█████████▌| 11713/12313 [8:46:10<26:35, 2.66s/it] {'loss': 0.5307, 'grad_norm': 4.609364781861959, 'learning_rate': 3.107303548110008e-08, 'epoch': 0.95} 95%|█████████▌| 11713/12313 [8:46:10<26:35, 2.66s/it] 95%|█████████▌| 11714/12313 [8:46:13<27:27, 2.75s/it] {'loss': 0.4039, 'grad_norm': 4.543991780518672, 'learning_rate': 3.0969759213974324e-08, 'epoch': 0.95} 95%|█████████▌| 11714/12313 [8:46:13<27:27, 2.75s/it] 95%|█████████▌| 11715/12313 [8:46:16<27:36, 2.77s/it] {'loss': 0.4611, 'grad_norm': 5.065242928777515, 'learning_rate': 3.086665379067405e-08, 'epoch': 0.95} 95%|█████████▌| 11715/12313 [8:46:16<27:36, 2.77s/it] 95%|█████████▌| 11716/12313 [8:46:19<27:19, 2.75s/it] {'loss': 0.4837, 'grad_norm': 7.643230789019235, 'learning_rate': 3.0763719218333545e-08, 'epoch': 0.95} 95%|█████████▌| 11716/12313 [8:46:19<27:19, 2.75s/it] 95%|█████████▌| 11717/12313 [8:46:21<27:03, 2.72s/it] {'loss': 0.4451, 'grad_norm': 3.926885842127398, 'learning_rate': 3.066095550407544e-08, 'epoch': 0.95} 95%|█████████▌| 11717/12313 [8:46:21<27:03, 2.72s/it] 95%|█████████▌| 11718/12313 [8:46:24<26:21, 2.66s/it] {'loss': 0.5284, 'grad_norm': 5.510721045832869, 'learning_rate': 3.0558362655010443e-08, 'epoch': 0.95} 95%|█████████▌| 11718/12313 [8:46:24<26:21, 2.66s/it] 95%|█████████▌| 11719/12313 [8:46:26<25:53, 2.62s/it] {'loss': 0.4785, 'grad_norm': 4.815199357131891, 'learning_rate': 3.045594067823704e-08, 'epoch': 0.95} 95%|█████████▌| 11719/12313 [8:46:26<25:53, 2.62s/it] 95%|█████████▌| 11720/12313 [8:46:29<26:01, 2.63s/it] {'loss': 0.5275, 'grad_norm': 9.394045320573086, 'learning_rate': 3.0353689580843174e-08, 'epoch': 0.95} 95%|█████████▌| 11720/12313 [8:46:29<26:01, 2.63s/it] 95%|█████████▌| 11721/12313 [8:46:32<26:13, 2.66s/it] {'loss': 0.4034, 'grad_norm': 7.043341703763583, 'learning_rate': 
3.025160936990318e-08, 'epoch': 0.95} 95%|█████████▌| 11721/12313 [8:46:32<26:13, 2.66s/it] 95%|█████████▌| 11722/12313 [8:46:34<26:18, 2.67s/it] {'loss': 0.4658, 'grad_norm': 5.105234133065711, 'learning_rate': 3.0149700052481135e-08, 'epoch': 0.95} 95%|█████████▌| 11722/12313 [8:46:34<26:18, 2.67s/it] 95%|█████████▌| 11723/12313 [8:46:37<25:45, 2.62s/it] {'loss': 0.4, 'grad_norm': 20.843773029070668, 'learning_rate': 3.004796163562834e-08, 'epoch': 0.95} 95%|█████████▌| 11723/12313 [8:46:37<25:45, 2.62s/it] 95%|█████████▌| 11724/12313 [8:46:40<26:09, 2.66s/it] {'loss': 0.4094, 'grad_norm': 4.834619084091482, 'learning_rate': 2.994639412638445e-08, 'epoch': 0.95} 95%|█████████▌| 11724/12313 [8:46:40<26:09, 2.66s/it] 95%|█████████▌| 11725/12313 [8:46:43<26:23, 2.69s/it] {'loss': 0.4347, 'grad_norm': 5.958564630244604, 'learning_rate': 2.984499753177772e-08, 'epoch': 0.95} 95%|█████████▌| 11725/12313 [8:46:43<26:23, 2.69s/it] 95%|█████████▌| 11726/12313 [8:46:45<26:01, 2.66s/it] {'loss': 0.3909, 'grad_norm': 7.525314088951139, 'learning_rate': 2.9743771858823657e-08, 'epoch': 0.95} 95%|█████████▌| 11726/12313 [8:46:45<26:01, 2.66s/it] 95%|█████████▌| 11727/12313 [8:46:48<25:43, 2.63s/it] {'loss': 0.4202, 'grad_norm': 4.660331555963379, 'learning_rate': 2.9642717114527208e-08, 'epoch': 0.95} 95%|█████████▌| 11727/12313 [8:46:48<25:43, 2.63s/it] 95%|█████████▌| 11728/12313 [8:46:50<25:48, 2.65s/it] {'loss': 0.3993, 'grad_norm': 4.986110534638319, 'learning_rate': 2.9541833305880287e-08, 'epoch': 0.95} 95%|█████████▌| 11728/12313 [8:46:50<25:48, 2.65s/it] 95%|█████████▌| 11729/12313 [8:46:53<25:31, 2.62s/it] {'loss': 0.5475, 'grad_norm': 3.464614961858637, 'learning_rate': 2.9441120439864246e-08, 'epoch': 0.95} 95%|█████████▌| 11729/12313 [8:46:53<25:31, 2.62s/it] 95%|█████████▌| 11730/12313 [8:46:56<25:35, 2.63s/it] {'loss': 0.4057, 'grad_norm': 21.551152924357776, 'learning_rate': 2.9340578523447127e-08, 'epoch': 0.95} 95%|█████████▌| 11730/12313 [8:46:56<25:35, 
2.63s/it] 95%|█████████▌| 11731/12313 [8:46:58<25:21, 2.61s/it] {'loss': 0.5294, 'grad_norm': 3.6579610094520283, 'learning_rate': 2.9240207563586142e-08, 'epoch': 0.95} 95%|█████████▌| 11731/12313 [8:46:58<25:21, 2.61s/it] 95%|█████████▌| 11732/12313 [8:47:01<25:30, 2.63s/it] {'loss': 0.4105, 'grad_norm': 5.768773438976465, 'learning_rate': 2.914000756722657e-08, 'epoch': 0.95} 95%|█████████▌| 11732/12313 [8:47:01<25:30, 2.63s/it] 95%|█████████▌| 11733/12313 [8:47:04<25:46, 2.67s/it] {'loss': 0.5214, 'grad_norm': 6.3584167885272285, 'learning_rate': 2.903997854130147e-08, 'epoch': 0.95} 95%|█████████▌| 11733/12313 [8:47:04<25:46, 2.67s/it] 95%|█████████▌| 11734/12313 [8:47:06<25:00, 2.59s/it] {'loss': 0.4954, 'grad_norm': 7.692904252140693, 'learning_rate': 2.8940120492732537e-08, 'epoch': 0.95} 95%|█████████▌| 11734/12313 [8:47:06<25:00, 2.59s/it] 95%|█████████▌| 11735/12313 [8:47:09<25:15, 2.62s/it] {'loss': 0.7178, 'grad_norm': 5.655701924850993, 'learning_rate': 2.8840433428429514e-08, 'epoch': 0.95} 95%|█████████▌| 11735/12313 [8:47:09<25:15, 2.62s/it] 95%|█████████▌| 11736/12313 [8:47:12<26:20, 2.74s/it] {'loss': 0.5701, 'grad_norm': 4.970188795620924, 'learning_rate': 2.8740917355290222e-08, 'epoch': 0.95} 95%|█████████▌| 11736/12313 [8:47:12<26:20, 2.74s/it] 95%|█████████▌| 11737/12313 [8:47:14<26:14, 2.73s/it] {'loss': 0.4525, 'grad_norm': 5.290922267712118, 'learning_rate': 2.864157228019998e-08, 'epoch': 0.95} 95%|█████████▌| 11737/12313 [8:47:14<26:14, 2.73s/it] 95%|█████████▌| 11738/12313 [8:47:17<25:55, 2.71s/it] {'loss': 0.4818, 'grad_norm': 4.627627408383265, 'learning_rate': 2.854239821003385e-08, 'epoch': 0.95} 95%|█████████▌| 11738/12313 [8:47:17<25:55, 2.71s/it] 95%|█████████▌| 11739/12313 [8:47:20<26:47, 2.80s/it] {'loss': 0.5673, 'grad_norm': 7.8256164051009325, 'learning_rate': 2.8443395151653562e-08, 'epoch': 0.95} 95%|█████████▌| 11739/12313 [8:47:20<26:47, 2.80s/it] 95%|█████████▌| 11740/12313 [8:47:23<26:31, 2.78s/it] {'loss': 0.3818, 
'grad_norm': 6.249505047993975, 'learning_rate': 2.834456311190975e-08, 'epoch': 0.95} 95%|█████████▌| 11740/12313 [8:47:23<26:31, 2.78s/it] 95%|█████████▌| 11741/12313 [8:47:25<26:11, 2.75s/it] {'loss': 0.6104, 'grad_norm': 6.005374198159657, 'learning_rate': 2.8245902097641388e-08, 'epoch': 0.95} 95%|█████████▌| 11741/12313 [8:47:25<26:11, 2.75s/it] 95%|█████████▌| 11742/12313 [8:47:28<26:13, 2.76s/it] {'loss': 0.4915, 'grad_norm': 5.110767892723842, 'learning_rate': 2.8147412115674955e-08, 'epoch': 0.95} 95%|█████████▌| 11742/12313 [8:47:28<26:13, 2.76s/it] 95%|█████████▌| 11743/12313 [8:47:31<26:09, 2.75s/it] {'loss': 0.4256, 'grad_norm': 4.5731007153331955, 'learning_rate': 2.8049093172825282e-08, 'epoch': 0.95} 95%|█████████▌| 11743/12313 [8:47:31<26:09, 2.75s/it] 95%|█████████▌| 11744/12313 [8:47:34<25:35, 2.70s/it] {'loss': 0.4245, 'grad_norm': 10.106582363128767, 'learning_rate': 2.795094527589609e-08, 'epoch': 0.95} 95%|█████████▌| 11744/12313 [8:47:34<25:35, 2.70s/it] 95%|█████████▌| 11745/12313 [8:47:36<25:33, 2.70s/it] {'loss': 0.4379, 'grad_norm': 4.273795431273181, 'learning_rate': 2.7852968431678064e-08, 'epoch': 0.95} 95%|█████████▌| 11745/12313 [8:47:36<25:33, 2.70s/it] 95%|█████████▌| 11746/12313 [8:47:39<26:05, 2.76s/it] {'loss': 0.6883, 'grad_norm': 4.307573087745445, 'learning_rate': 2.7755162646950773e-08, 'epoch': 0.95} 95%|█████████▌| 11746/12313 [8:47:39<26:05, 2.76s/it] 95%|█████████▌| 11747/12313 [8:47:42<25:46, 2.73s/it] {'loss': 0.503, 'grad_norm': 5.5704933532695105, 'learning_rate': 2.7657527928482418e-08, 'epoch': 0.95} 95%|█████████▌| 11747/12313 [8:47:42<25:46, 2.73s/it] 95%|█████████▌| 11748/12313 [8:47:45<25:37, 2.72s/it] {'loss': 0.5112, 'grad_norm': 5.927129889565711, 'learning_rate': 2.756006428302843e-08, 'epoch': 0.95} 95%|█████████▌| 11748/12313 [8:47:45<25:37, 2.72s/it] 95%|█████████▌| 11749/12313 [8:47:47<25:32, 2.72s/it] {'loss': 0.4949, 'grad_norm': 7.739005076724821, 'learning_rate': 2.746277171733258e-08, 'epoch': 
0.95} 95%|█████████▌| 11749/12313 [8:47:47<25:32, 2.72s/it] 95%|█████████▌| 11750/12313 [8:47:50<25:31, 2.72s/it] {'loss': 0.4114, 'grad_norm': 4.857345101026582, 'learning_rate': 2.736565023812754e-08, 'epoch': 0.95} 95%|█████████▌| 11750/12313 [8:47:50<25:31, 2.72s/it] 95%|█████████▌| 11751/12313 [8:47:52<24:45, 2.64s/it] {'loss': 0.4542, 'grad_norm': 5.671784265799197, 'learning_rate': 2.726869985213293e-08, 'epoch': 0.95} 95%|█████████▌| 11751/12313 [8:47:52<24:45, 2.64s/it] 95%|█████████▌| 11752/12313 [8:47:55<24:33, 2.63s/it] {'loss': 0.4281, 'grad_norm': 4.542817164476761, 'learning_rate': 2.717192056605783e-08, 'epoch': 0.95} 95%|█████████▌| 11752/12313 [8:47:55<24:33, 2.63s/it] 95%|█████████▌| 11753/12313 [8:47:58<24:53, 2.67s/it] {'loss': 0.5609, 'grad_norm': 4.657810531667513, 'learning_rate': 2.7075312386598274e-08, 'epoch': 0.95} 95%|█████████▌| 11753/12313 [8:47:58<24:53, 2.67s/it] 95%|█████████▌| 11754/12313 [8:48:00<24:53, 2.67s/it] {'loss': 0.4775, 'grad_norm': 4.663673614763122, 'learning_rate': 2.697887532043947e-08, 'epoch': 0.95} 95%|█████████▌| 11754/12313 [8:48:00<24:53, 2.67s/it] 95%|█████████▌| 11755/12313 [8:48:03<24:57, 2.68s/it] {'loss': 0.5095, 'grad_norm': 5.2321773718184765, 'learning_rate': 2.688260937425413e-08, 'epoch': 0.95} 95%|█████████▌| 11755/12313 [8:48:03<24:57, 2.68s/it] 95%|█████████▌| 11756/12313 [8:48:06<24:46, 2.67s/it] {'loss': 0.4052, 'grad_norm': 4.5112988407221835, 'learning_rate': 2.67865145547036e-08, 'epoch': 0.95} 95%|█████████▌| 11756/12313 [8:48:06<24:46, 2.67s/it] 95%|█████████▌| 11757/12313 [8:48:08<24:36, 2.65s/it] {'loss': 0.5502, 'grad_norm': 4.994296884071479, 'learning_rate': 2.6690590868436728e-08, 'epoch': 0.95} 95%|█████████▌| 11757/12313 [8:48:08<24:36, 2.65s/it] 95%|█████████▌| 11758/12313 [8:48:11<25:27, 2.75s/it] {'loss': 0.5292, 'grad_norm': 5.907644955879376, 'learning_rate': 2.6594838322091255e-08, 'epoch': 0.95} 95%|█████████▌| 11758/12313 [8:48:11<25:27, 2.75s/it] 96%|█████████▌| 11759/12313 
[8:48:14<25:18, 2.74s/it] {'loss': 0.532, 'grad_norm': 4.76086278575601, 'learning_rate': 2.6499256922292715e-08, 'epoch': 0.96} 96%|█████████▌| 11759/12313 [8:48:14<25:18, 2.74s/it] 96%|█████████▌| 11760/12313 [8:48:17<24:56, 2.71s/it] {'loss': 0.5241, 'grad_norm': 10.420115891569926, 'learning_rate': 2.640384667565471e-08, 'epoch': 0.96} 96%|█████████▌| 11760/12313 [8:48:17<24:56, 2.71s/it] 96%|█████████▌| 11761/12313 [8:48:19<24:13, 2.63s/it] {'loss': 0.3722, 'grad_norm': 8.326204967482434, 'learning_rate': 2.6308607588779177e-08, 'epoch': 0.96} 96%|█████████▌| 11761/12313 [8:48:19<24:13, 2.63s/it] 96%|█████████▌| 11762/12313 [8:48:22<24:28, 2.66s/it] {'loss': 0.4494, 'grad_norm': 4.358771745047045, 'learning_rate': 2.6213539668256126e-08, 'epoch': 0.96} 96%|█████████▌| 11762/12313 [8:48:22<24:28, 2.66s/it] 96%|█████████▌| 11763/12313 [8:48:25<24:05, 2.63s/it] {'loss': 0.4823, 'grad_norm': 3.6696692614668676, 'learning_rate': 2.6118642920663906e-08, 'epoch': 0.96} 96%|█████████▌| 11763/12313 [8:48:25<24:05, 2.63s/it] 96%|█████████▌| 11764/12313 [8:48:27<23:29, 2.57s/it] {'loss': 0.4218, 'grad_norm': 3.432418349481183, 'learning_rate': 2.6023917352568652e-08, 'epoch': 0.96} 96%|█████████▌| 11764/12313 [8:48:27<23:29, 2.57s/it] 96%|█████████▌| 11765/12313 [8:48:30<24:10, 2.65s/it] {'loss': 0.5265, 'grad_norm': 5.4984450992011515, 'learning_rate': 2.592936297052512e-08, 'epoch': 0.96} 96%|█████████▌| 11765/12313 [8:48:30<24:10, 2.65s/it] 96%|█████████▌| 11766/12313 [8:48:32<24:07, 2.65s/it] {'loss': 0.3908, 'grad_norm': 4.899194056779067, 'learning_rate': 2.5834979781075854e-08, 'epoch': 0.96} 96%|█████████▌| 11766/12313 [8:48:32<24:07, 2.65s/it] 96%|█████████▌| 11767/12313 [8:48:36<25:23, 2.79s/it] {'loss': 0.3868, 'grad_norm': 3.7039379030025317, 'learning_rate': 2.5740767790751463e-08, 'epoch': 0.96} 96%|█████████▌| 11767/12313 [8:48:36<25:23, 2.79s/it] 96%|█████████▌| 11768/12313 [8:48:38<25:09, 2.77s/it] {'loss': 0.4753, 'grad_norm': 7.5200178859973725, 
'learning_rate': 2.5646727006071182e-08, 'epoch': 0.96} 96%|█████████▌| 11768/12313 [8:48:38<25:09, 2.77s/it] 96%|█████████▌| 11769/12313 [8:48:41<24:51, 2.74s/it] {'loss': 0.5694, 'grad_norm': 5.533797272476705, 'learning_rate': 2.55528574335423e-08, 'epoch': 0.96} 96%|█████████▌| 11769/12313 [8:48:41<24:51, 2.74s/it] 96%|█████████▌| 11770/12313 [8:48:44<24:54, 2.75s/it] {'loss': 0.4299, 'grad_norm': 4.804791486887786, 'learning_rate': 2.5459159079659625e-08, 'epoch': 0.96} 96%|█████████▌| 11770/12313 [8:48:44<24:54, 2.75s/it] 96%|█████████▌| 11771/12313 [8:48:46<24:46, 2.74s/it] {'loss': 0.2743, 'grad_norm': 5.494381293218291, 'learning_rate': 2.5365631950906856e-08, 'epoch': 0.96} 96%|█████████▌| 11771/12313 [8:48:46<24:46, 2.74s/it] 96%|█████████▌| 11772/12313 [8:48:49<24:37, 2.73s/it] {'loss': 0.5166, 'grad_norm': 5.394100187529947, 'learning_rate': 2.5272276053755207e-08, 'epoch': 0.96} 96%|█████████▌| 11772/12313 [8:48:49<24:37, 2.73s/it] 96%|█████████▌| 11773/12313 [8:48:52<24:49, 2.76s/it] {'loss': 0.3647, 'grad_norm': 3.979565293315754, 'learning_rate': 2.5179091394665346e-08, 'epoch': 0.96} 96%|█████████▌| 11773/12313 [8:48:52<24:49, 2.76s/it] 96%|█████████▌| 11774/12313 [8:48:55<25:08, 2.80s/it] {'loss': 0.4242, 'grad_norm': 5.9411168688277245, 'learning_rate': 2.5086077980084057e-08, 'epoch': 0.96} 96%|█████████▌| 11774/12313 [8:48:55<25:08, 2.80s/it] 96%|█████████▌| 11775/12313 [8:48:58<24:45, 2.76s/it] {'loss': 0.4753, 'grad_norm': 6.4302270705636975, 'learning_rate': 2.4993235816448136e-08, 'epoch': 0.96} 96%|█████████▌| 11775/12313 [8:48:58<24:45, 2.76s/it] 96%|█████████▌| 11776/12313 [8:49:00<24:39, 2.75s/it] {'loss': 0.4724, 'grad_norm': 4.533132653222595, 'learning_rate': 2.4900564910181334e-08, 'epoch': 0.96} 96%|█████████▌| 11776/12313 [8:49:00<24:39, 2.75s/it] 96%|█████████▌| 11777/12313 [8:49:03<24:14, 2.71s/it] {'loss': 0.3712, 'grad_norm': 6.793870761327284, 'learning_rate': 2.4808065267696303e-08, 'epoch': 0.96} 96%|█████████▌| 
11777/12313 [8:49:03<24:14, 2.71s/it] 96%|█████████▌| 11778/12313 [8:49:06<25:15, 2.83s/it] {'loss': 0.4916, 'grad_norm': 6.118222175954092, 'learning_rate': 2.4715736895393195e-08, 'epoch': 0.96} 96%|█████████▌| 11778/12313 [8:49:06<25:15, 2.83s/it] 96%|█████████▌| 11779/12313 [8:49:08<24:20, 2.73s/it] {'loss': 0.4406, 'grad_norm': 6.201932383609763, 'learning_rate': 2.462357979966107e-08, 'epoch': 0.96} 96%|█████████▌| 11779/12313 [8:49:08<24:20, 2.73s/it] 96%|█████████▌| 11780/12313 [8:49:11<23:48, 2.68s/it] {'loss': 0.47, 'grad_norm': 5.730634163072851, 'learning_rate': 2.453159398687649e-08, 'epoch': 0.96} 96%|█████████▌| 11780/12313 [8:49:11<23:48, 2.68s/it] 96%|█████████▌| 11781/12313 [8:49:14<24:09, 2.72s/it] {'loss': 0.5144, 'grad_norm': 5.240490008736156, 'learning_rate': 2.443977946340409e-08, 'epoch': 0.96} 96%|█████████▌| 11781/12313 [8:49:14<24:09, 2.72s/it] 96%|█████████▌| 11782/12313 [8:49:17<25:05, 2.84s/it] {'loss': 0.4603, 'grad_norm': 2.6161093483379783, 'learning_rate': 2.4348136235597398e-08, 'epoch': 0.96} 96%|█████████▌| 11782/12313 [8:49:17<25:05, 2.84s/it] 96%|█████████▌| 11783/12313 [8:49:20<24:36, 2.79s/it] {'loss': 0.5706, 'grad_norm': 4.15867662957049, 'learning_rate': 2.425666430979773e-08, 'epoch': 0.96} 96%|█████████▌| 11783/12313 [8:49:20<24:36, 2.79s/it] 96%|█████████▌| 11784/12313 [8:49:22<23:39, 2.68s/it] {'loss': 0.477, 'grad_norm': 9.450975853907856, 'learning_rate': 2.416536369233391e-08, 'epoch': 0.96} 96%|█████████▌| 11784/12313 [8:49:22<23:39, 2.68s/it] 96%|█████████▌| 11785/12313 [8:49:24<22:48, 2.59s/it] {'loss': 0.502, 'grad_norm': 7.347938995622321, 'learning_rate': 2.4074234389523665e-08, 'epoch': 0.96} 96%|█████████▌| 11785/12313 [8:49:24<22:48, 2.59s/it] 96%|█████████▌| 11786/12313 [8:49:27<22:16, 2.54s/it] {'loss': 0.4742, 'grad_norm': 4.253055407962608, 'learning_rate': 2.3983276407672784e-08, 'epoch': 0.96} 96%|█████████▌| 11786/12313 [8:49:27<22:16, 2.54s/it] 96%|█████████▌| 11787/12313 [8:49:30<22:32, 2.57s/it] 
{'loss': 0.6164, 'grad_norm': 3.0805970942274135, 'learning_rate': 2.389248975307512e-08, 'epoch': 0.96} 96%|█████████▌| 11787/12313 [8:49:30<22:32, 2.57s/it] 96%|█████████▌| 11788/12313 [8:49:32<22:06, 2.53s/it] {'loss': 0.5219, 'grad_norm': 5.749655956883042, 'learning_rate': 2.3801874432012594e-08, 'epoch': 0.96} 96%|█████████▌| 11788/12313 [8:49:32<22:06, 2.53s/it] 96%|█████████▌| 11789/12313 [8:49:34<22:00, 2.52s/it] {'loss': 0.5919, 'grad_norm': 4.8340592635662345, 'learning_rate': 2.371143045075519e-08, 'epoch': 0.96} 96%|█████████▌| 11789/12313 [8:49:34<22:00, 2.52s/it] 96%|█████████▌| 11790/12313 [8:49:37<21:36, 2.48s/it] {'loss': 0.6845, 'grad_norm': 3.8977820934124066, 'learning_rate': 2.3621157815561237e-08, 'epoch': 0.96} 96%|█████████▌| 11790/12313 [8:49:37<21:36, 2.48s/it] 96%|█████████▌| 11791/12313 [8:49:39<21:14, 2.44s/it] {'loss': 0.5546, 'grad_norm': 9.678166602329553, 'learning_rate': 2.3531056532677122e-08, 'epoch': 0.96} 96%|█████████▌| 11791/12313 [8:49:39<21:14, 2.44s/it] 96%|█████████▌| 11792/12313 [8:49:42<22:00, 2.53s/it] {'loss': 0.455, 'grad_norm': 7.305637570872725, 'learning_rate': 2.3441126608337304e-08, 'epoch': 0.96} 96%|█████████▌| 11792/12313 [8:49:42<22:00, 2.53s/it] 96%|█████████▌| 11793/12313 [8:49:44<21:44, 2.51s/it] {'loss': 0.5323, 'grad_norm': 3.8733070193102384, 'learning_rate': 2.335136804876459e-08, 'epoch': 0.96} 96%|█████████▌| 11793/12313 [8:49:44<21:44, 2.51s/it] 96%|█████████▌| 11794/12313 [8:49:47<22:05, 2.55s/it] {'loss': 0.4325, 'grad_norm': 3.9403704900752636, 'learning_rate': 2.3261780860169558e-08, 'epoch': 0.96} 96%|█████████▌| 11794/12313 [8:49:47<22:05, 2.55s/it] 96%|█████████▌| 11795/12313 [8:49:50<22:03, 2.56s/it] {'loss': 0.4517, 'grad_norm': 5.6576542673173575, 'learning_rate': 2.31723650487517e-08, 'epoch': 0.96} 96%|█████████▌| 11795/12313 [8:49:50<22:03, 2.56s/it] 96%|█████████▌| 11796/12313 [8:49:52<22:45, 2.64s/it] {'loss': 0.3581, 'grad_norm': 6.1828309207281755, 'learning_rate': 
2.3083120620697453e-08, 'epoch': 0.96} 96%|█████████▌| 11796/12313 [8:49:52<22:45, 2.64s/it] 96%|█████████▌| 11797/12313 [8:49:55<22:30, 2.62s/it] {'loss': 0.3434, 'grad_norm': 4.945073828835356, 'learning_rate': 2.2994047582182433e-08, 'epoch': 0.96} 96%|█████████▌| 11797/12313 [8:49:55<22:30, 2.62s/it] 96%|█████████▌| 11798/12313 [8:49:58<22:56, 2.67s/it] {'loss': 0.4171, 'grad_norm': 6.8876285655205285, 'learning_rate': 2.2905145939369765e-08, 'epoch': 0.96} 96%|█████████▌| 11798/12313 [8:49:58<22:56, 2.67s/it] 96%|█████████▌| 11799/12313 [8:50:01<23:02, 2.69s/it] {'loss': 0.518, 'grad_norm': 5.814280241290536, 'learning_rate': 2.2816415698411475e-08, 'epoch': 0.96} 96%|█████████▌| 11799/12313 [8:50:01<23:02, 2.69s/it] 96%|█████████▌| 11800/12313 [8:50:03<23:09, 2.71s/it] {'loss': 0.4784, 'grad_norm': 6.165510069966149, 'learning_rate': 2.272785686544682e-08, 'epoch': 0.96} 96%|█████████▌| 11800/12313 [8:50:03<23:09, 2.71s/it] 96%|█████████▌| 11801/12313 [8:50:06<22:51, 2.68s/it] {'loss': 0.4269, 'grad_norm': 5.755892898880744, 'learning_rate': 2.263946944660367e-08, 'epoch': 0.96} 96%|█████████▌| 11801/12313 [8:50:06<22:51, 2.68s/it] 96%|█████████▌| 11802/12313 [8:50:08<22:33, 2.65s/it] {'loss': 0.3444, 'grad_norm': 10.987936416240586, 'learning_rate': 2.2551253447997968e-08, 'epoch': 0.96} 96%|█████████▌| 11802/12313 [8:50:08<22:33, 2.65s/it] 96%|█████████▌| 11803/12313 [8:50:11<22:43, 2.67s/it] {'loss': 0.5635, 'grad_norm': 15.074930100573221, 'learning_rate': 2.2463208875733723e-08, 'epoch': 0.96} 96%|█████████▌| 11803/12313 [8:50:11<22:43, 2.67s/it] 96%|█████████▌| 11804/12313 [8:50:14<22:50, 2.69s/it] {'loss': 0.5389, 'grad_norm': 4.1773061107164935, 'learning_rate': 2.237533573590328e-08, 'epoch': 0.96} 96%|█████████▌| 11804/12313 [8:50:14<22:50, 2.69s/it] 96%|█████████▌| 11805/12313 [8:50:17<23:42, 2.80s/it] {'loss': 0.6207, 'grad_norm': 4.63238528075123, 'learning_rate': 2.228763403458706e-08, 'epoch': 0.96} 96%|█████████▌| 11805/12313 [8:50:17<23:42, 
2.80s/it] 96%|█████████▌| 11806/12313 [8:50:20<23:00, 2.72s/it] {'loss': 0.5594, 'grad_norm': 5.9729583458173945, 'learning_rate': 2.2200103777853255e-08, 'epoch': 0.96} 96%|█████████▌| 11806/12313 [8:50:20<23:00, 2.72s/it] 96%|█████████▌| 11807/12313 [8:50:22<23:00, 2.73s/it] {'loss': 0.58, 'grad_norm': 9.831624315329341, 'learning_rate': 2.211274497175897e-08, 'epoch': 0.96} 96%|█████████▌| 11807/12313 [8:50:22<23:00, 2.73s/it] 96%|█████████▌| 11808/12313 [8:50:25<23:23, 2.78s/it] {'loss': 0.5508, 'grad_norm': 6.030561079423977, 'learning_rate': 2.2025557622348537e-08, 'epoch': 0.96} 96%|█████████▌| 11808/12313 [8:50:25<23:23, 2.78s/it] 96%|█████████▌| 11809/12313 [8:50:28<22:59, 2.74s/it] {'loss': 0.5854, 'grad_norm': 5.732610023291474, 'learning_rate': 2.1938541735655183e-08, 'epoch': 0.96} 96%|█████████▌| 11809/12313 [8:50:28<22:59, 2.74s/it] 96%|█████████▌| 11810/12313 [8:50:30<22:23, 2.67s/it] {'loss': 0.3753, 'grad_norm': 16.34752832829806, 'learning_rate': 2.1851697317699373e-08, 'epoch': 0.96} 96%|█████████▌| 11810/12313 [8:50:30<22:23, 2.67s/it] 96%|█████████▌| 11811/12313 [8:50:33<22:22, 2.67s/it] {'loss': 0.4773, 'grad_norm': 8.516533326876681, 'learning_rate': 2.1765024374491018e-08, 'epoch': 0.96} 96%|█████████▌| 11811/12313 [8:50:33<22:22, 2.67s/it] 96%|█████████▌| 11812/12313 [8:50:36<22:10, 2.65s/it] {'loss': 0.4447, 'grad_norm': 4.534008961064627, 'learning_rate': 2.1678522912026988e-08, 'epoch': 0.96} 96%|█████████▌| 11812/12313 [8:50:36<22:10, 2.65s/it] 96%|█████████▌| 11813/12313 [8:50:38<22:24, 2.69s/it] {'loss': 0.4867, 'grad_norm': 5.048729203766718, 'learning_rate': 2.1592192936292777e-08, 'epoch': 0.96} 96%|█████████▌| 11813/12313 [8:50:38<22:24, 2.69s/it] 96%|█████████▌| 11814/12313 [8:50:41<22:26, 2.70s/it] {'loss': 0.4589, 'grad_norm': 5.401071455659973, 'learning_rate': 2.1506034453262214e-08, 'epoch': 0.96} 96%|█████████▌| 11814/12313 [8:50:41<22:26, 2.70s/it] 96%|█████████▌| 11815/12313 [8:50:44<22:58, 2.77s/it] {'loss': 0.4399, 
'grad_norm': 4.550782933767591, 'learning_rate': 2.142004746889692e-08, 'epoch': 0.96} 96%|█████████▌| 11815/12313 [8:50:44<22:58, 2.77s/it] 96%|█████████▌| 11816/12313 [8:50:47<22:23, 2.70s/it] {'loss': 0.5894, 'grad_norm': 7.657034015101817, 'learning_rate': 2.1334231989146304e-08, 'epoch': 0.96} 96%|█████████▌| 11816/12313 [8:50:47<22:23, 2.70s/it] 96%|█████████▌| 11817/12313 [8:50:49<22:33, 2.73s/it] {'loss': 0.5836, 'grad_norm': 4.68873295706115, 'learning_rate': 2.124858801994867e-08, 'epoch': 0.96} 96%|█████████▌| 11817/12313 [8:50:49<22:33, 2.73s/it] 96%|█████████▌| 11818/12313 [8:50:52<22:28, 2.72s/it] {'loss': 0.6573, 'grad_norm': 5.932018957670042, 'learning_rate': 2.1163115567230386e-08, 'epoch': 0.96} 96%|█████████▌| 11818/12313 [8:50:52<22:28, 2.72s/it] 96%|█████████▌| 11819/12313 [8:50:55<21:40, 2.63s/it] {'loss': 0.5133, 'grad_norm': 6.610287249201689, 'learning_rate': 2.1077814636905337e-08, 'epoch': 0.96} 96%|█████████▌| 11819/12313 [8:50:55<21:40, 2.63s/it] 96%|█████████▌| 11820/12313 [8:50:57<21:10, 2.58s/it] {'loss': 0.406, 'grad_norm': 4.576678178324454, 'learning_rate': 2.099268523487602e-08, 'epoch': 0.96} 96%|█████████▌| 11820/12313 [8:50:57<21:10, 2.58s/it] 96%|█████████▌| 11821/12313 [8:51:00<21:12, 2.59s/it] {'loss': 0.5465, 'grad_norm': 5.480905722183299, 'learning_rate': 2.0907727367033005e-08, 'epoch': 0.96} 96%|█████████▌| 11821/12313 [8:51:00<21:12, 2.59s/it] 96%|█████████▌| 11822/12313 [8:51:02<20:52, 2.55s/it] {'loss': 0.5529, 'grad_norm': 10.7296131406781, 'learning_rate': 2.0822941039254642e-08, 'epoch': 0.96} 96%|█████████▌| 11822/12313 [8:51:02<20:52, 2.55s/it] 96%|█████████▌| 11823/12313 [8:51:05<21:15, 2.60s/it] {'loss': 0.4012, 'grad_norm': 4.499843737247379, 'learning_rate': 2.0738326257407904e-08, 'epoch': 0.96} 96%|█████████▌| 11823/12313 [8:51:05<21:15, 2.60s/it] 96%|█████████▌| 11824/12313 [8:51:07<21:27, 2.63s/it] {'loss': 0.4078, 'grad_norm': 5.585081499968746, 'learning_rate': 2.0653883027347832e-08, 'epoch': 0.96} 
96%|█████████▌| 11824/12313 [8:51:07<21:27, 2.63s/it] 96%|█████████▌| 11825/12313 [8:51:10<21:00, 2.58s/it] {'loss': 0.4095, 'grad_norm': 5.781778186861071, 'learning_rate': 2.056961135491725e-08, 'epoch': 0.96} 96%|█████████▌| 11825/12313 [8:51:10<21:00, 2.58s/it] 96%|█████████▌| 11826/12313 [8:51:12<20:50, 2.57s/it] {'loss': 0.5062, 'grad_norm': 5.2006226427597415, 'learning_rate': 2.048551124594733e-08, 'epoch': 0.96} 96%|█████████▌| 11826/12313 [8:51:12<20:50, 2.57s/it] 96%|█████████▌| 11827/12313 [8:51:15<21:10, 2.61s/it] {'loss': 0.4405, 'grad_norm': 9.000238161766775, 'learning_rate': 2.0401582706257304e-08, 'epoch': 0.96} 96%|█████████▌| 11827/12313 [8:51:15<21:10, 2.61s/it] 96%|█████████▌| 11828/12313 [8:51:18<21:06, 2.61s/it] {'loss': 0.5036, 'grad_norm': 5.406010740569412, 'learning_rate': 2.031782574165475e-08, 'epoch': 0.96} 96%|█████████▌| 11828/12313 [8:51:18<21:06, 2.61s/it] 96%|█████████▌| 11829/12313 [8:51:20<21:13, 2.63s/it] {'loss': 0.4662, 'grad_norm': 5.004999912991991, 'learning_rate': 2.0234240357935032e-08, 'epoch': 0.96} 96%|█████████▌| 11829/12313 [8:51:20<21:13, 2.63s/it] 96%|█████████▌| 11830/12313 [8:51:23<21:38, 2.69s/it] {'loss': 0.496, 'grad_norm': 4.837946844168023, 'learning_rate': 2.015082656088213e-08, 'epoch': 0.96} 96%|█████████▌| 11830/12313 [8:51:23<21:38, 2.69s/it] 96%|█████████▌| 11831/12313 [8:51:26<21:39, 2.70s/it] {'loss': 0.375, 'grad_norm': 6.627370947037891, 'learning_rate': 2.0067584356267545e-08, 'epoch': 0.96} 96%|█████████▌| 11831/12313 [8:51:26<21:39, 2.70s/it] 96%|█████████▌| 11832/12313 [8:51:29<21:36, 2.69s/it] {'loss': 0.5067, 'grad_norm': 10.674671755581318, 'learning_rate': 1.998451374985111e-08, 'epoch': 0.96} 96%|█████████▌| 11832/12313 [8:51:29<21:36, 2.69s/it] 96%|█████████▌| 11833/12313 [8:51:32<21:53, 2.74s/it] {'loss': 0.3741, 'grad_norm': 17.13229424192217, 'learning_rate': 1.9901614747381004e-08, 'epoch': 0.96} 96%|█████████▌| 11833/12313 [8:51:32<21:53, 2.74s/it] 96%|█████████▌| 11834/12313 
[8:51:34<21:33, 2.70s/it] {'loss': 0.5692, 'grad_norm': 6.635134277566872, 'learning_rate': 1.981888735459375e-08, 'epoch': 0.96} 96%|█████████▌| 11834/12313 [8:51:34<21:33, 2.70s/it] 96%|█████████▌| 11835/12313 [8:51:37<22:12, 2.79s/it] {'loss': 0.5985, 'grad_norm': 3.8371011025366055, 'learning_rate': 1.973633157721283e-08, 'epoch': 0.96} 96%|█████████▌| 11835/12313 [8:51:37<22:12, 2.79s/it] 96%|█████████▌| 11836/12313 [8:51:40<22:07, 2.78s/it] {'loss': 0.492, 'grad_norm': 5.13228830947514, 'learning_rate': 1.9653947420951448e-08, 'epoch': 0.96} 96%|█████████▌| 11836/12313 [8:51:40<22:07, 2.78s/it] 96%|█████████▌| 11837/12313 [8:51:43<22:16, 2.81s/it] {'loss': 0.4515, 'grad_norm': 6.733892581752105, 'learning_rate': 1.9571734891509763e-08, 'epoch': 0.96} 96%|█████████▌| 11837/12313 [8:51:43<22:16, 2.81s/it] 96%|█████████▌| 11838/12313 [8:51:45<21:19, 2.69s/it] {'loss': 0.6163, 'grad_norm': 5.22909116303276, 'learning_rate': 1.9489693994576563e-08, 'epoch': 0.96} 96%|█████████▌| 11838/12313 [8:51:45<21:19, 2.69s/it] 96%|█████████▌| 11839/12313 [8:51:48<21:33, 2.73s/it] {'loss': 0.4538, 'grad_norm': 4.302408834520267, 'learning_rate': 1.9407824735828696e-08, 'epoch': 0.96} 96%|█████████▌| 11839/12313 [8:51:48<21:33, 2.73s/it] 96%|█████████▌| 11840/12313 [8:51:51<21:28, 2.72s/it] {'loss': 0.4024, 'grad_norm': 5.043172467524965, 'learning_rate': 1.932612712093107e-08, 'epoch': 0.96} 96%|█████████▌| 11840/12313 [8:51:51<21:28, 2.72s/it] 96%|█████████▌| 11841/12313 [8:51:54<21:37, 2.75s/it] {'loss': 0.4981, 'grad_norm': 8.816715558441784, 'learning_rate': 1.9244601155536392e-08, 'epoch': 0.96} 96%|█████████▌| 11841/12313 [8:51:54<21:37, 2.75s/it] 96%|█████████▌| 11842/12313 [8:51:56<21:21, 2.72s/it] {'loss': 0.4698, 'grad_norm': 4.8079654103051475, 'learning_rate': 1.9163246845286253e-08, 'epoch': 0.96} 96%|█████████▌| 11842/12313 [8:51:56<21:21, 2.72s/it] 96%|█████████▌| 11843/12313 [8:51:59<21:40, 2.77s/it] {'loss': 0.5798, 'grad_norm': 6.163595049432749, 
'learning_rate': 1.908206419580977e-08, 'epoch': 0.96} 96%|█████████▌| 11843/12313 [8:51:59<21:40, 2.77s/it] 96%|█████████▌| 11844/12313 [8:52:02<21:30, 2.75s/it] {'loss': 0.5856, 'grad_norm': 4.866986542895558, 'learning_rate': 1.9001053212724387e-08, 'epoch': 0.96} 96%|█████████▌| 11844/12313 [8:52:02<21:30, 2.75s/it] 96%|█████████▌| 11845/12313 [8:52:04<21:14, 2.72s/it] {'loss': 0.4907, 'grad_norm': 5.782563129780745, 'learning_rate': 1.892021390163562e-08, 'epoch': 0.96} 96%|█████████▌| 11845/12313 [8:52:04<21:14, 2.72s/it] 96%|█████████▌| 11846/12313 [8:52:08<23:57, 3.08s/it] {'loss': 0.4423, 'grad_norm': 5.712857730829221, 'learning_rate': 1.8839546268137054e-08, 'epoch': 0.96} 96%|█████████▌| 11846/12313 [8:52:08<23:57, 3.08s/it] 96%|█████████▌| 11847/12313 [8:52:11<22:53, 2.95s/it] {'loss': 0.4499, 'grad_norm': 4.414806524066914, 'learning_rate': 1.8759050317810612e-08, 'epoch': 0.96} 96%|█████████▌| 11847/12313 [8:52:11<22:53, 2.95s/it] 96%|█████████▌| 11848/12313 [8:52:14<22:00, 2.84s/it] {'loss': 0.4297, 'grad_norm': 4.949387208955272, 'learning_rate': 1.8678726056226004e-08, 'epoch': 0.96} 96%|█████████▌| 11848/12313 [8:52:14<22:00, 2.84s/it] 96%|█████████▌| 11849/12313 [8:52:16<21:39, 2.80s/it] {'loss': 0.5549, 'grad_norm': 4.396280286867585, 'learning_rate': 1.8598573488941285e-08, 'epoch': 0.96} 96%|█████████▌| 11849/12313 [8:52:16<21:39, 2.80s/it] 96%|█████████▌| 11850/12313 [8:52:19<21:39, 2.81s/it] {'loss': 0.5692, 'grad_norm': 6.050149072377, 'learning_rate': 1.8518592621502852e-08, 'epoch': 0.96} 96%|█████████▌| 11850/12313 [8:52:19<21:39, 2.81s/it] 96%|█████████▌| 11851/12313 [8:52:22<20:55, 2.72s/it] {'loss': 0.5378, 'grad_norm': 5.0620158081409246, 'learning_rate': 1.8438783459444608e-08, 'epoch': 0.96} 96%|█████████▌| 11851/12313 [8:52:22<20:55, 2.72s/it] 96%|█████████▋| 11852/12313 [8:52:24<20:14, 2.64s/it] {'loss': 0.5249, 'grad_norm': 3.929260770758667, 'learning_rate': 1.8359146008289087e-08, 'epoch': 0.96} 96%|█████████▋| 11852/12313 
[8:52:24<20:14, 2.64s/it] 96%|█████████▋| 11853/12313 [8:52:27<20:05, 2.62s/it] {'loss': 0.3583, 'grad_norm': 6.751642731964542, 'learning_rate': 1.8279680273546874e-08, 'epoch': 0.96} 96%|█████████▋| 11853/12313 [8:52:27<20:05, 2.62s/it] 96%|█████████▋| 11854/12313 [8:52:29<20:06, 2.63s/it] {'loss': 0.5411, 'grad_norm': 7.566915713105148, 'learning_rate': 1.8200386260716352e-08, 'epoch': 0.96} 96%|█████████▋| 11854/12313 [8:52:29<20:06, 2.63s/it] 96%|█████████▋| 11855/12313 [8:52:32<20:10, 2.64s/it] {'loss': 0.5512, 'grad_norm': 4.154405022428572, 'learning_rate': 1.812126397528452e-08, 'epoch': 0.96} 96%|█████████▋| 11855/12313 [8:52:32<20:10, 2.64s/it] 96%|█████████▋| 11856/12313 [8:52:35<20:02, 2.63s/it] {'loss': 0.5403, 'grad_norm': 7.42761465533202, 'learning_rate': 1.804231342272589e-08, 'epoch': 0.96} 96%|█████████▋| 11856/12313 [8:52:35<20:02, 2.63s/it] 96%|█████████▋| 11857/12313 [8:52:37<19:41, 2.59s/it] {'loss': 0.6331, 'grad_norm': 7.694593764622218, 'learning_rate': 1.796353460850331e-08, 'epoch': 0.96} 96%|█████████▋| 11857/12313 [8:52:37<19:41, 2.59s/it] 96%|█████████▋| 11858/12313 [8:52:40<19:24, 2.56s/it] {'loss': 0.4567, 'grad_norm': 16.086802457999514, 'learning_rate': 1.7884927538068532e-08, 'epoch': 0.96} 96%|█████████▋| 11858/12313 [8:52:40<19:24, 2.56s/it] 96%|█████████▋| 11859/12313 [8:52:42<19:15, 2.55s/it] {'loss': 0.5277, 'grad_norm': 6.812124405495696, 'learning_rate': 1.7806492216860537e-08, 'epoch': 0.96} 96%|█████████▋| 11859/12313 [8:52:42<19:15, 2.55s/it] 96%|█████████▋| 11860/12313 [8:52:45<19:39, 2.60s/it] {'loss': 0.4104, 'grad_norm': 6.335913061374134, 'learning_rate': 1.77282286503061e-08, 'epoch': 0.96} 96%|█████████▋| 11860/12313 [8:52:45<19:39, 2.60s/it] 96%|█████████▋| 11861/12313 [8:52:48<20:36, 2.73s/it] {'loss': 0.5395, 'grad_norm': 5.290742984147669, 'learning_rate': 1.7650136843821163e-08, 'epoch': 0.96} 96%|█████████▋| 11861/12313 [8:52:48<20:36, 2.73s/it] 96%|█████████▋| 11862/12313 [8:52:51<20:23, 2.71s/it] 
{'loss': 0.5077, 'grad_norm': 7.397405822219676, 'learning_rate': 1.7572216802808907e-08, 'epoch': 0.96} 96%|█████████▋| 11862/12313 [8:52:51<20:23, 2.71s/it] 96%|█████████▋| 11863/12313 [8:52:53<20:34, 2.74s/it] {'loss': 0.447, 'grad_norm': 5.393822284388817, 'learning_rate': 1.74944685326614e-08, 'epoch': 0.96} 96%|█████████▋| 11863/12313 [8:52:53<20:34, 2.74s/it] 96%|█████████▋| 11864/12313 [8:52:56<20:24, 2.73s/it] {'loss': 0.5923, 'grad_norm': 4.001728592267749, 'learning_rate': 1.741689203875796e-08, 'epoch': 0.96} 96%|█████████▋| 11864/12313 [8:52:56<20:24, 2.73s/it] 96%|█████████▋| 11865/12313 [8:52:59<20:11, 2.70s/it] {'loss': 0.5556, 'grad_norm': 6.112171929702209, 'learning_rate': 1.7339487326466787e-08, 'epoch': 0.96} 96%|█████████▋| 11865/12313 [8:52:59<20:11, 2.70s/it] 96%|█████████▋| 11866/12313 [8:53:01<20:16, 2.72s/it] {'loss': 0.5483, 'grad_norm': 4.7616632757082575, 'learning_rate': 1.7262254401143873e-08, 'epoch': 0.96} 96%|█████████▋| 11866/12313 [8:53:01<20:16, 2.72s/it] 96%|█████████▋| 11867/12313 [8:53:04<20:12, 2.72s/it] {'loss': 0.4426, 'grad_norm': 5.833975112122491, 'learning_rate': 1.7185193268133282e-08, 'epoch': 0.96} 96%|█████████▋| 11867/12313 [8:53:04<20:12, 2.72s/it] 96%|█████████▋| 11868/12313 [8:53:07<19:53, 2.68s/it] {'loss': 0.4634, 'grad_norm': 3.698744760296142, 'learning_rate': 1.7108303932767135e-08, 'epoch': 0.96} 96%|█████████▋| 11868/12313 [8:53:07<19:53, 2.68s/it] 96%|█████████▋| 11869/12313 [8:53:09<19:39, 2.66s/it] {'loss': 0.4089, 'grad_norm': 6.198330496132483, 'learning_rate': 1.7031586400365895e-08, 'epoch': 0.96} 96%|█████████▋| 11869/12313 [8:53:09<19:39, 2.66s/it] 96%|█████████▋| 11870/12313 [8:53:12<19:38, 2.66s/it] {'loss': 0.571, 'grad_norm': 4.92658425213048, 'learning_rate': 1.695504067623782e-08, 'epoch': 0.96} 96%|█████████▋| 11870/12313 [8:53:12<19:38, 2.66s/it] 96%|█████████▋| 11871/12313 [8:53:14<19:06, 2.59s/it] {'loss': 0.4378, 'grad_norm': 4.570520680219551, 'learning_rate': 
1.6878666765679507e-08, 'epoch': 0.96} 96%|█████████▋| 11871/12313 [8:53:14<19:06, 2.59s/it] 96%|█████████▋| 11872/12313 [8:53:17<19:09, 2.61s/it] {'loss': 0.4161, 'grad_norm': 4.130327563075009, 'learning_rate': 1.6802464673975893e-08, 'epoch': 0.96} 96%|█████████▋| 11872/12313 [8:53:17<19:09, 2.61s/it] 96%|█████████▋| 11873/12313 [8:53:20<19:27, 2.65s/it] {'loss': 0.4437, 'grad_norm': 5.2911892010001775, 'learning_rate': 1.6726434406399704e-08, 'epoch': 0.96} 96%|█████████▋| 11873/12313 [8:53:20<19:27, 2.65s/it] 96%|█████████▋| 11874/12313 [8:53:23<19:35, 2.68s/it] {'loss': 0.3474, 'grad_norm': 7.92651230061654, 'learning_rate': 1.6650575968211458e-08, 'epoch': 0.96} 96%|█████████▋| 11874/12313 [8:53:23<19:35, 2.68s/it] 96%|█████████▋| 11875/12313 [8:53:25<19:19, 2.65s/it] {'loss': 0.4664, 'grad_norm': 4.955382254039267, 'learning_rate': 1.6574889364660564e-08, 'epoch': 0.96} 96%|█████████▋| 11875/12313 [8:53:25<19:19, 2.65s/it] 96%|█████████▋| 11876/12313 [8:53:28<19:22, 2.66s/it] {'loss': 0.4326, 'grad_norm': 5.462298377112138, 'learning_rate': 1.6499374600983943e-08, 'epoch': 0.96} 96%|█████████▋| 11876/12313 [8:53:28<19:22, 2.66s/it] 96%|█████████▋| 11877/12313 [8:53:30<18:48, 2.59s/it] {'loss': 0.5458, 'grad_norm': 3.395717603402873, 'learning_rate': 1.642403168240686e-08, 'epoch': 0.96} 96%|█████████▋| 11877/12313 [8:53:30<18:48, 2.59s/it] 96%|█████████▋| 11878/12313 [8:53:33<19:01, 2.62s/it] {'loss': 0.5629, 'grad_norm': 6.599864821668275, 'learning_rate': 1.6348860614142646e-08, 'epoch': 0.96} 96%|█████████▋| 11878/12313 [8:53:33<19:01, 2.62s/it] 96%|█████████▋| 11879/12313 [8:53:36<19:43, 2.73s/it] {'loss': 0.511, 'grad_norm': 3.790645168309616, 'learning_rate': 1.62738614013927e-08, 'epoch': 0.96} 96%|█████████▋| 11879/12313 [8:53:36<19:43, 2.73s/it] 96%|█████████▋| 11880/12313 [8:53:39<20:51, 2.89s/it] {'loss': 0.3197, 'grad_norm': 2.6841332503368918, 'learning_rate': 1.6199034049346474e-08, 'epoch': 0.96} 96%|█████████▋| 11880/12313 [8:53:39<20:51, 
2.89s/it] 96%|█████████▋| 11881/12313 [8:53:42<20:09, 2.80s/it] {'loss': 0.4072, 'grad_norm': 8.916212568990264, 'learning_rate': 1.6124378563182053e-08, 'epoch': 0.96} 96%|█████████▋| 11881/12313 [8:53:42<20:09, 2.80s/it] 96%|█████████▋| 11882/12313 [8:53:44<19:25, 2.70s/it] {'loss': 0.5502, 'grad_norm': 6.5493673229576945, 'learning_rate': 1.6049894948064748e-08, 'epoch': 0.96} 96%|█████████▋| 11882/12313 [8:53:44<19:25, 2.70s/it] 97%|█████████▋| 11883/12313 [8:53:47<19:42, 2.75s/it] {'loss': 0.5055, 'grad_norm': 3.681237380060799, 'learning_rate': 1.597558320914849e-08, 'epoch': 0.97} 97%|█████████▋| 11883/12313 [8:53:47<19:42, 2.75s/it] 97%|█████████▋| 11884/12313 [8:53:50<19:02, 2.66s/it] {'loss': 0.4587, 'grad_norm': 5.868912226870429, 'learning_rate': 1.5901443351575563e-08, 'epoch': 0.97} 97%|█████████▋| 11884/12313 [8:53:50<19:02, 2.66s/it] 97%|█████████▋| 11885/12313 [8:53:53<20:04, 2.81s/it] {'loss': 0.4016, 'grad_norm': 3.216888608216173, 'learning_rate': 1.5827475380475744e-08, 'epoch': 0.97} 97%|█████████▋| 11885/12313 [8:53:53<20:04, 2.81s/it] 97%|█████████▋| 11886/12313 [8:53:55<19:29, 2.74s/it] {'loss': 0.6319, 'grad_norm': 6.658602355248441, 'learning_rate': 1.575367930096716e-08, 'epoch': 0.97} 97%|█████████▋| 11886/12313 [8:53:55<19:29, 2.74s/it] 97%|█████████▋| 11887/12313 [8:53:58<19:20, 2.72s/it] {'loss': 0.4729, 'grad_norm': 7.62706341624403, 'learning_rate': 1.5680055118156566e-08, 'epoch': 0.97} 97%|█████████▋| 11887/12313 [8:53:58<19:20, 2.72s/it] 97%|█████████▋| 11888/12313 [8:54:01<19:07, 2.70s/it] {'loss': 0.3682, 'grad_norm': 5.573341730209802, 'learning_rate': 1.5606602837137942e-08, 'epoch': 0.97} 97%|█████████▋| 11888/12313 [8:54:01<19:07, 2.70s/it] 97%|█████████▋| 11889/12313 [8:54:03<18:54, 2.68s/it] {'loss': 0.3449, 'grad_norm': 9.836559250337336, 'learning_rate': 1.5533322462993884e-08, 'epoch': 0.97} 97%|█████████▋| 11889/12313 [8:54:03<18:54, 2.68s/it] 97%|█████████▋| 11890/12313 [8:54:06<19:53, 2.82s/it] {'loss': 0.5415, 
'grad_norm': 5.722705721907483, 'learning_rate': 1.546021400079506e-08, 'epoch': 0.97} 97%|█████████▋| 11890/12313 [8:54:06<19:53, 2.82s/it] 97%|█████████▋| 11891/12313 [8:54:09<19:16, 2.74s/it] {'loss': 0.5975, 'grad_norm': 13.038255642017731, 'learning_rate': 1.538727745560048e-08, 'epoch': 0.97} 97%|█████████▋| 11891/12313 [8:54:09<19:16, 2.74s/it] 97%|█████████▋| 11892/12313 [8:54:12<18:50, 2.68s/it] {'loss': 0.3905, 'grad_norm': 4.658075474008751, 'learning_rate': 1.5314512832456385e-08, 'epoch': 0.97} 97%|█████████▋| 11892/12313 [8:54:12<18:50, 2.68s/it] 97%|█████████▋| 11893/12313 [8:54:14<18:48, 2.69s/it] {'loss': 0.6306, 'grad_norm': 3.609827209694475, 'learning_rate': 1.5241920136397913e-08, 'epoch': 0.97} 97%|█████████▋| 11893/12313 [8:54:14<18:48, 2.69s/it] 97%|█████████▋| 11894/12313 [8:54:17<18:45, 2.69s/it] {'loss': 0.6758, 'grad_norm': 4.924247089415893, 'learning_rate': 1.516949937244827e-08, 'epoch': 0.97} 97%|█████████▋| 11894/12313 [8:54:17<18:45, 2.69s/it] 97%|█████████▋| 11895/12313 [8:54:20<18:58, 2.72s/it] {'loss': 0.5907, 'grad_norm': 7.844480475751513, 'learning_rate': 1.5097250545618447e-08, 'epoch': 0.97} 97%|█████████▋| 11895/12313 [8:54:20<18:58, 2.72s/it] 97%|█████████▋| 11896/12313 [8:54:22<18:47, 2.70s/it] {'loss': 0.3675, 'grad_norm': 7.5845268596408655, 'learning_rate': 1.5025173660907776e-08, 'epoch': 0.97} 97%|█████████▋| 11896/12313 [8:54:22<18:47, 2.70s/it] 97%|█████████▋| 11897/12313 [8:54:25<18:34, 2.68s/it] {'loss': 0.5117, 'grad_norm': 6.1316891281247425, 'learning_rate': 1.495326872330366e-08, 'epoch': 0.97} 97%|█████████▋| 11897/12313 [8:54:25<18:34, 2.68s/it] 97%|█████████▋| 11898/12313 [8:54:28<18:15, 2.64s/it] {'loss': 0.2842, 'grad_norm': 13.323521108000474, 'learning_rate': 1.4881535737781282e-08, 'epoch': 0.97} 97%|█████████▋| 11898/12313 [8:54:28<18:15, 2.64s/it] 97%|█████████▋| 11899/12313 [8:54:30<18:15, 2.65s/it] {'loss': 0.4965, 'grad_norm': 6.657166105699875, 'learning_rate': 1.4809974709304176e-08, 'epoch': 
0.97} 97%|█████████▋| 11899/12313 [8:54:30<18:15, 2.65s/it] 97%|█████████▋| 11900/12313 [8:54:33<17:50, 2.59s/it] {'loss': 0.5393, 'grad_norm': 4.572946832553962, 'learning_rate': 1.4738585642824488e-08, 'epoch': 0.97} 97%|█████████▋| 11900/12313 [8:54:33<17:50, 2.59s/it] 97%|█████████▋| 11901/12313 [8:54:35<17:45, 2.59s/it] {'loss': 0.3911, 'grad_norm': 5.179202728578802, 'learning_rate': 1.4667368543281324e-08, 'epoch': 0.97} 97%|█████████▋| 11901/12313 [8:54:35<17:45, 2.59s/it] 97%|█████████▋| 11902/12313 [8:54:38<18:13, 2.66s/it] {'loss': 0.5457, 'grad_norm': 3.397470952294324, 'learning_rate': 1.4596323415602965e-08, 'epoch': 0.97} 97%|█████████▋| 11902/12313 [8:54:38<18:13, 2.66s/it] 97%|█████████▋| 11903/12313 [8:54:41<18:21, 2.69s/it] {'loss': 0.495, 'grad_norm': 5.014658354565728, 'learning_rate': 1.4525450264705198e-08, 'epoch': 0.97} 97%|█████████▋| 11903/12313 [8:54:41<18:21, 2.69s/it] 97%|█████████▋| 11904/12313 [8:54:44<18:31, 2.72s/it] {'loss': 0.386, 'grad_norm': 4.876907477649273, 'learning_rate': 1.4454749095491883e-08, 'epoch': 0.97} 97%|█████████▋| 11904/12313 [8:54:44<18:31, 2.72s/it] 97%|█████████▋| 11905/12313 [8:54:46<18:26, 2.71s/it] {'loss': 0.5017, 'grad_norm': 6.641652702622383, 'learning_rate': 1.438421991285549e-08, 'epoch': 0.97} 97%|█████████▋| 11905/12313 [8:54:46<18:26, 2.71s/it] 97%|█████████▋| 11906/12313 [8:54:49<18:15, 2.69s/it] {'loss': 0.5132, 'grad_norm': 6.852642511727459, 'learning_rate': 1.4313862721676285e-08, 'epoch': 0.97} 97%|█████████▋| 11906/12313 [8:54:49<18:15, 2.69s/it] 97%|█████████▋| 11907/12313 [8:54:51<17:48, 2.63s/it] {'loss': 0.5419, 'grad_norm': 4.757471303632329, 'learning_rate': 1.4243677526822319e-08, 'epoch': 0.97} 97%|█████████▋| 11907/12313 [8:54:51<17:48, 2.63s/it] 97%|█████████▋| 11908/12313 [8:54:54<17:47, 2.63s/it] {'loss': 0.4691, 'grad_norm': 7.349401986982847, 'learning_rate': 1.4173664333149983e-08, 'epoch': 0.97} 97%|█████████▋| 11908/12313 [8:54:54<17:47, 2.63s/it] 97%|█████████▋| 
11909/12313 [8:54:57<18:08, 2.70s/it] {'loss': 0.396, 'grad_norm': 4.683264212562743, 'learning_rate': 1.4103823145504292e-08, 'epoch': 0.97} 97%|█████████▋| 11909/12313 [8:54:57<18:08, 2.70s/it] 97%|█████████▋| 11910/12313 [8:54:59<17:38, 2.63s/it] {'loss': 0.5567, 'grad_norm': 8.004000185912112, 'learning_rate': 1.4034153968717768e-08, 'epoch': 0.97} 97%|█████████▋| 11910/12313 [8:54:59<17:38, 2.63s/it] 97%|█████████▋| 11911/12313 [8:55:02<17:14, 2.57s/it] {'loss': 0.4341, 'grad_norm': 9.131569052331146, 'learning_rate': 1.3964656807610721e-08, 'epoch': 0.97} 97%|█████████▋| 11911/12313 [8:55:02<17:14, 2.57s/it] 97%|█████████▋| 11912/12313 [8:55:05<17:21, 2.60s/it] {'loss': 0.5043, 'grad_norm': 4.584784368488165, 'learning_rate': 1.3895331666992361e-08, 'epoch': 0.97} 97%|█████████▋| 11912/12313 [8:55:05<17:21, 2.60s/it] 97%|█████████▋| 11913/12313 [8:55:07<17:26, 2.62s/it] {'loss': 0.4195, 'grad_norm': 5.0665939824517165, 'learning_rate': 1.3826178551659686e-08, 'epoch': 0.97} 97%|█████████▋| 11913/12313 [8:55:07<17:26, 2.62s/it] 97%|█████████▋| 11914/12313 [8:55:10<17:57, 2.70s/it] {'loss': 0.4746, 'grad_norm': 4.105167424178751, 'learning_rate': 1.37571974663972e-08, 'epoch': 0.97} 97%|█████████▋| 11914/12313 [8:55:10<17:57, 2.70s/it] 97%|█████████▋| 11915/12313 [8:55:13<17:48, 2.69s/it] {'loss': 0.5362, 'grad_norm': 7.389787805986678, 'learning_rate': 1.3688388415978581e-08, 'epoch': 0.97} 97%|█████████▋| 11915/12313 [8:55:13<17:48, 2.69s/it] 97%|█████████▋| 11916/12313 [8:55:15<17:40, 2.67s/it] {'loss': 0.5635, 'grad_norm': 5.680461317779846, 'learning_rate': 1.361975140516475e-08, 'epoch': 0.97} 97%|█████████▋| 11916/12313 [8:55:15<17:40, 2.67s/it] 97%|█████████▋| 11917/12313 [8:55:18<18:16, 2.77s/it] {'loss': 0.517, 'grad_norm': 3.8294537161607476, 'learning_rate': 1.3551286438705513e-08, 'epoch': 0.97} 97%|█████████▋| 11917/12313 [8:55:18<18:16, 2.77s/it] 97%|█████████▋| 11918/12313 [8:55:21<18:20, 2.79s/it] {'loss': 0.4311, 'grad_norm': 
5.093597645059693, 'learning_rate': 1.3482993521337362e-08, 'epoch': 0.97} 97%|█████████▋| 11918/12313 [8:55:21<18:20, 2.79s/it] 97%|█████████▋| 11919/12313 [8:55:24<18:32, 2.82s/it] {'loss': 0.5585, 'grad_norm': 4.15007656176377, 'learning_rate': 1.3414872657786793e-08, 'epoch': 0.97} 97%|█████████▋| 11919/12313 [8:55:24<18:32, 2.82s/it] 97%|█████████▋| 11920/12313 [8:55:27<18:17, 2.79s/it] {'loss': 0.4277, 'grad_norm': 5.618091315100742, 'learning_rate': 1.3346923852766702e-08, 'epoch': 0.97} 97%|█████████▋| 11920/12313 [8:55:27<18:17, 2.79s/it] 97%|█████████▋| 11921/12313 [8:55:29<18:00, 2.76s/it] {'loss': 0.4445, 'grad_norm': 3.77740666889434, 'learning_rate': 1.3279147110979163e-08, 'epoch': 0.97} 97%|█████████▋| 11921/12313 [8:55:30<18:00, 2.76s/it] 97%|█████████▋| 11922/12313 [8:55:32<17:39, 2.71s/it] {'loss': 0.4987, 'grad_norm': 14.059774837034583, 'learning_rate': 1.3211542437113755e-08, 'epoch': 0.97} 97%|█████████▋| 11922/12313 [8:55:32<17:39, 2.71s/it] 97%|█████████▋| 11923/12313 [8:55:35<17:34, 2.70s/it] {'loss': 0.4612, 'grad_norm': 5.451347013670637, 'learning_rate': 1.3144109835848685e-08, 'epoch': 0.97} 97%|█████████▋| 11923/12313 [8:55:35<17:34, 2.70s/it] 97%|█████████▋| 11924/12313 [8:55:37<17:16, 2.66s/it] {'loss': 0.5038, 'grad_norm': 9.206132420636997, 'learning_rate': 1.3076849311849382e-08, 'epoch': 0.97} 97%|█████████▋| 11924/12313 [8:55:37<17:16, 2.66s/it] 97%|█████████▋| 11925/12313 [8:55:40<17:38, 2.73s/it] {'loss': 0.4655, 'grad_norm': 11.483754762302455, 'learning_rate': 1.300976086977046e-08, 'epoch': 0.97} 97%|█████████▋| 11925/12313 [8:55:40<17:38, 2.73s/it] 97%|█████████▋| 11926/12313 [8:55:43<17:17, 2.68s/it] {'loss': 0.5264, 'grad_norm': 6.743288245848484, 'learning_rate': 1.2942844514254038e-08, 'epoch': 0.97} 97%|█████████▋| 11926/12313 [8:55:43<17:17, 2.68s/it] 97%|█████████▋| 11927/12313 [8:55:46<17:32, 2.73s/it] {'loss': 0.4499, 'grad_norm': 6.053479065590689, 'learning_rate': 1.2876100249930024e-08, 'epoch': 0.97} 
97%|█████████▋| 11927/12313 [8:55:46<17:32, 2.73s/it] 97%|█████████▋| 11928/12313 [8:55:49<17:46, 2.77s/it] {'loss': 0.536, 'grad_norm': 3.6207902961878853, 'learning_rate': 1.2809528081416667e-08, 'epoch': 0.97} 97%|█████████▋| 11928/12313 [8:55:49<17:46, 2.77s/it] 97%|█████████▋| 11929/12313 [8:55:51<17:35, 2.75s/it] {'loss': 0.3062, 'grad_norm': 6.866156397584753, 'learning_rate': 1.2743128013321115e-08, 'epoch': 0.97} 97%|█████████▋| 11929/12313 [8:55:51<17:35, 2.75s/it] 97%|█████████▋| 11930/12313 [8:55:54<17:02, 2.67s/it] {'loss': 0.4935, 'grad_norm': 5.924869562979613, 'learning_rate': 1.2676900050237472e-08, 'epoch': 0.97} 97%|█████████▋| 11930/12313 [8:55:54<17:02, 2.67s/it] 97%|█████████▋| 11931/12313 [8:55:57<18:17, 2.87s/it] {'loss': 0.3896, 'grad_norm': 5.999357098508331, 'learning_rate': 1.2610844196748184e-08, 'epoch': 0.97} 97%|█████████▋| 11931/12313 [8:55:57<18:17, 2.87s/it] 97%|█████████▋| 11932/12313 [8:56:00<17:50, 2.81s/it] {'loss': 0.4804, 'grad_norm': 24.091678245998175, 'learning_rate': 1.2544960457424316e-08, 'epoch': 0.97} 97%|█████████▋| 11932/12313 [8:56:00<17:50, 2.81s/it] 97%|█████████▋| 11933/12313 [8:56:02<17:13, 2.72s/it] {'loss': 0.4291, 'grad_norm': 4.839750186365689, 'learning_rate': 1.2479248836824165e-08, 'epoch': 0.97} 97%|█████████▋| 11933/12313 [8:56:02<17:13, 2.72s/it] 97%|█████████▋| 11934/12313 [8:56:05<17:07, 2.71s/it] {'loss': 0.3908, 'grad_norm': 5.821546388018869, 'learning_rate': 1.2413709339495205e-08, 'epoch': 0.97} 97%|█████████▋| 11934/12313 [8:56:05<17:07, 2.71s/it] 97%|█████████▋| 11935/12313 [8:56:08<17:06, 2.72s/it] {'loss': 0.461, 'grad_norm': 4.807004678084896, 'learning_rate': 1.2348341969972143e-08, 'epoch': 0.97} 97%|█████████▋| 11935/12313 [8:56:08<17:06, 2.72s/it] 97%|█████████▋| 11936/12313 [8:56:10<17:12, 2.74s/it] {'loss': 0.498, 'grad_norm': 11.290614300170317, 'learning_rate': 1.2283146732778306e-08, 'epoch': 0.97} 97%|█████████▋| 11936/12313 [8:56:10<17:12, 2.74s/it] 97%|█████████▋| 11937/12313 
[8:56:13<17:22, 2.77s/it] {'loss': 0.5445, 'grad_norm': 4.900177210666115, 'learning_rate': 1.2218123632424527e-08, 'epoch': 0.97} 97%|█████████▋| 11937/12313 [8:56:13<17:22, 2.77s/it] 97%|█████████▋| 11938/12313 [8:56:16<16:57, 2.71s/it] {'loss': 0.4344, 'grad_norm': 4.9843232783971665, 'learning_rate': 1.2153272673409989e-08, 'epoch': 0.97} 97%|█████████▋| 11938/12313 [8:56:16<16:57, 2.71s/it] 97%|█████████▋| 11939/12313 [8:56:19<16:55, 2.71s/it] {'loss': 0.523, 'grad_norm': 9.329861274666198, 'learning_rate': 1.2088593860222487e-08, 'epoch': 0.97} 97%|█████████▋| 11939/12313 [8:56:19<16:55, 2.71s/it] 97%|█████████▋| 11940/12313 [8:56:21<16:37, 2.67s/it] {'loss': 0.4087, 'grad_norm': 5.949707334731747, 'learning_rate': 1.2024087197337053e-08, 'epoch': 0.97} 97%|█████████▋| 11940/12313 [8:56:21<16:37, 2.67s/it] 97%|█████████▋| 11941/12313 [8:56:24<17:06, 2.76s/it] {'loss': 0.4933, 'grad_norm': 3.2312999306605934, 'learning_rate': 1.1959752689217342e-08, 'epoch': 0.97} 97%|█████████▋| 11941/12313 [8:56:24<17:06, 2.76s/it] 97%|█████████▋| 11942/12313 [8:56:27<16:41, 2.70s/it] {'loss': 0.5973, 'grad_norm': 4.579568524123334, 'learning_rate': 1.1895590340315343e-08, 'epoch': 0.97} 97%|█████████▋| 11942/12313 [8:56:27<16:41, 2.70s/it] 97%|█████████▋| 11943/12313 [8:56:29<16:45, 2.72s/it] {'loss': 0.5191, 'grad_norm': 4.04721463907932, 'learning_rate': 1.183160015507001e-08, 'epoch': 0.97} 97%|█████████▋| 11943/12313 [8:56:29<16:45, 2.72s/it] 97%|█████████▋| 11944/12313 [8:56:32<16:45, 2.72s/it] {'loss': 0.4671, 'grad_norm': 3.4507628169869564, 'learning_rate': 1.1767782137909467e-08, 'epoch': 0.97} 97%|█████████▋| 11944/12313 [8:56:32<16:45, 2.72s/it] 97%|█████████▋| 11945/12313 [8:56:35<16:22, 2.67s/it] {'loss': 0.5631, 'grad_norm': 4.199682308907727, 'learning_rate': 1.17041362932499e-08, 'epoch': 0.97} 97%|█████████▋| 11945/12313 [8:56:35<16:22, 2.67s/it] 97%|█████████▋| 11946/12313 [8:56:37<15:56, 2.61s/it] {'loss': 0.5799, 'grad_norm': 4.406904013569409, 
'learning_rate': 1.1640662625494737e-08, 'epoch': 0.97} 97%|█████████▋| 11946/12313 [8:56:37<15:56, 2.61s/it] 97%|█████████▋| 11947/12313 [8:56:40<16:02, 2.63s/it] {'loss': 0.5053, 'grad_norm': 6.977075217955481, 'learning_rate': 1.1577361139036292e-08, 'epoch': 0.97} 97%|█████████▋| 11947/12313 [8:56:40<16:02, 2.63s/it] 97%|█████████▋| 11948/12313 [8:56:43<16:17, 2.68s/it] {'loss': 0.4786, 'grad_norm': 5.4502036515579295, 'learning_rate': 1.1514231838254674e-08, 'epoch': 0.97} 97%|█████████▋| 11948/12313 [8:56:43<16:17, 2.68s/it] 97%|█████████▋| 11949/12313 [8:56:45<16:02, 2.64s/it] {'loss': 0.4975, 'grad_norm': 11.48385762088185, 'learning_rate': 1.1451274727518058e-08, 'epoch': 0.97} 97%|█████████▋| 11949/12313 [8:56:45<16:02, 2.64s/it] 97%|█████████▋| 11950/12313 [8:56:48<15:53, 2.63s/it] {'loss': 0.5192, 'grad_norm': 4.167108685353742, 'learning_rate': 1.1388489811182957e-08, 'epoch': 0.97} 97%|█████████▋| 11950/12313 [8:56:48<15:53, 2.63s/it] 97%|█████████▋| 11951/12313 [8:56:51<16:32, 2.74s/it] {'loss': 0.5431, 'grad_norm': 5.2318458897159115, 'learning_rate': 1.1325877093593396e-08, 'epoch': 0.97} 97%|█████████▋| 11951/12313 [8:56:51<16:32, 2.74s/it] 97%|█████████▋| 11952/12313 [8:56:54<16:27, 2.74s/it] {'loss': 0.6257, 'grad_norm': 6.6175704623771505, 'learning_rate': 1.1263436579082022e-08, 'epoch': 0.97} 97%|█████████▋| 11952/12313 [8:56:54<16:27, 2.74s/it] 97%|█████████▋| 11953/12313 [8:56:56<16:32, 2.76s/it] {'loss': 0.5008, 'grad_norm': 3.4493770443599465, 'learning_rate': 1.1201168271969266e-08, 'epoch': 0.97} 97%|█████████▋| 11953/12313 [8:56:56<16:32, 2.76s/it] 97%|█████████▋| 11954/12313 [8:56:59<16:26, 2.75s/it] {'loss': 0.6227, 'grad_norm': 9.823049761761046, 'learning_rate': 1.1139072176564181e-08, 'epoch': 0.97} 97%|█████████▋| 11954/12313 [8:56:59<16:26, 2.75s/it] 97%|█████████▋| 11955/12313 [8:57:02<16:40, 2.79s/it] {'loss': 0.4436, 'grad_norm': 12.037056297654933, 'learning_rate': 1.1077148297163053e-08, 'epoch': 0.97} 97%|█████████▋| 
11955/12313 [8:57:02<16:40, 2.79s/it] 97%|█████████▋| 11956/12313 [8:57:05<17:00, 2.86s/it] {'loss': 0.44, 'grad_norm': 47.76636023695995, 'learning_rate': 1.101539663805079e-08, 'epoch': 0.97} 97%|█████████▋| 11956/12313 [8:57:05<17:00, 2.86s/it] 97%|█████████▋| 11957/12313 [8:57:08<16:34, 2.79s/it] {'loss': 0.4165, 'grad_norm': 5.334949326289944, 'learning_rate': 1.0953817203500084e-08, 'epoch': 0.97} 97%|█████████▋| 11957/12313 [8:57:08<16:34, 2.79s/it] 97%|█████████▋| 11958/12313 [8:57:10<16:17, 2.75s/it] {'loss': 0.4979, 'grad_norm': 9.47687334833281, 'learning_rate': 1.0892409997772524e-08, 'epoch': 0.97} 97%|█████████▋| 11958/12313 [8:57:10<16:17, 2.75s/it] 97%|█████████▋| 11959/12313 [8:57:13<16:10, 2.74s/it] {'loss': 0.4791, 'grad_norm': 5.061927562596902, 'learning_rate': 1.0831175025116658e-08, 'epoch': 0.97} 97%|█████████▋| 11959/12313 [8:57:13<16:10, 2.74s/it] 97%|█████████▋| 11960/12313 [8:57:16<16:00, 2.72s/it] {'loss': 0.5829, 'grad_norm': 8.5596022285608, 'learning_rate': 1.0770112289769653e-08, 'epoch': 0.97} 97%|█████████▋| 11960/12313 [8:57:16<16:00, 2.72s/it] 97%|█████████▋| 11961/12313 [8:57:18<15:22, 2.62s/it] {'loss': 0.5874, 'grad_norm': 4.508772275279344, 'learning_rate': 1.0709221795956738e-08, 'epoch': 0.97} 97%|█████████▋| 11961/12313 [8:57:18<15:22, 2.62s/it] 97%|█████████▋| 11962/12313 [8:57:21<15:54, 2.72s/it] {'loss': 0.4767, 'grad_norm': 4.701424170250187, 'learning_rate': 1.0648503547891487e-08, 'epoch': 0.97} 97%|█████████▋| 11962/12313 [8:57:21<15:54, 2.72s/it] 97%|█████████▋| 11963/12313 [8:57:24<15:37, 2.68s/it] {'loss': 0.5583, 'grad_norm': 3.965969498170774, 'learning_rate': 1.0587957549774986e-08, 'epoch': 0.97} 97%|█████████▋| 11963/12313 [8:57:24<15:37, 2.68s/it] 97%|█████████▋| 11964/12313 [8:57:27<16:24, 2.82s/it] {'loss': 0.5368, 'grad_norm': 7.964781868944689, 'learning_rate': 1.052758380579666e-08, 'epoch': 0.97} 97%|█████████▋| 11964/12313 [8:57:27<16:24, 2.82s/it] 97%|█████████▋| 11965/12313 [8:57:29<16:07, 
2.78s/it] {'loss': 0.4113, 'grad_norm': 5.249845918760695, 'learning_rate': 1.0467382320134279e-08, 'epoch': 0.97} 97%|█████████▋| 11965/12313 [8:57:29<16:07, 2.78s/it] 97%|█████████▋| 11966/12313 [8:57:32<15:31, 2.69s/it] {'loss': 0.3504, 'grad_norm': 5.825530015816844, 'learning_rate': 1.0407353096953398e-08, 'epoch': 0.97} 97%|█████████▋| 11966/12313 [8:57:32<15:31, 2.69s/it] 97%|█████████▋| 11967/12313 [8:57:35<15:24, 2.67s/it] {'loss': 0.3699, 'grad_norm': 9.06281122515128, 'learning_rate': 1.034749614040792e-08, 'epoch': 0.97} 97%|█████████▋| 11967/12313 [8:57:35<15:24, 2.67s/it] 97%|█████████▋| 11968/12313 [8:57:37<15:33, 2.71s/it] {'loss': 0.4268, 'grad_norm': 7.498622251328044, 'learning_rate': 1.0287811454639252e-08, 'epoch': 0.97} 97%|█████████▋| 11968/12313 [8:57:37<15:33, 2.71s/it] 97%|█████████▋| 11969/12313 [8:57:40<15:25, 2.69s/it] {'loss': 0.4959, 'grad_norm': 6.263774598272872, 'learning_rate': 1.0228299043777146e-08, 'epoch': 0.97} 97%|█████████▋| 11969/12313 [8:57:40<15:25, 2.69s/it] 97%|█████████▋| 11970/12313 [8:57:43<15:48, 2.77s/it] {'loss': 0.5099, 'grad_norm': 5.061985215032839, 'learning_rate': 1.0168958911939975e-08, 'epoch': 0.97} 97%|█████████▋| 11970/12313 [8:57:43<15:48, 2.77s/it] 97%|█████████▋| 11971/12313 [8:57:46<15:50, 2.78s/it] {'loss': 0.3745, 'grad_norm': 8.537598436187121, 'learning_rate': 1.0109791063233898e-08, 'epoch': 0.97} 97%|█████████▋| 11971/12313 [8:57:46<15:50, 2.78s/it] 97%|█████████▋| 11972/12313 [8:57:48<15:47, 2.78s/it] {'loss': 0.462, 'grad_norm': 3.710316240876283, 'learning_rate': 1.0050795501752309e-08, 'epoch': 0.97} 97%|█████████▋| 11972/12313 [8:57:48<15:47, 2.78s/it] 97%|█████████▋| 11973/12313 [8:57:51<15:26, 2.73s/it] {'loss': 0.568, 'grad_norm': 4.819345845034754, 'learning_rate': 9.991972231577774e-09, 'epoch': 0.97} 97%|█████████▋| 11973/12313 [8:57:51<15:26, 2.73s/it] 97%|█████████▋| 11974/12313 [8:57:54<15:38, 2.77s/it] {'loss': 0.4745, 'grad_norm': 7.126395810841205, 'learning_rate': 
9.933321256780925e-09, 'epoch': 0.97} 97%|█████████▋| 11974/12313 [8:57:54<15:38, 2.77s/it] 97%|█████████▋| 11975/12313 [8:57:57<15:42, 2.79s/it] {'loss': 0.3644, 'grad_norm': 4.57445118125883, 'learning_rate': 9.874842581419631e-09, 'epoch': 0.97} 97%|█████████▋| 11975/12313 [8:57:57<15:42, 2.79s/it] 97%|█████████▋| 11976/12313 [8:58:00<15:44, 2.80s/it] {'loss': 0.3933, 'grad_norm': 8.751285985895777, 'learning_rate': 9.816536209540373e-09, 'epoch': 0.97} 97%|█████████▋| 11976/12313 [8:58:00<15:44, 2.80s/it] 97%|█████████▋| 11977/12313 [8:58:02<15:01, 2.68s/it] {'loss': 0.499, 'grad_norm': 6.666232456269325, 'learning_rate': 9.758402145177703e-09, 'epoch': 0.97} 97%|█████████▋| 11977/12313 [8:58:02<15:01, 2.68s/it] 97%|█████████▋| 11978/12313 [8:58:05<14:43, 2.64s/it] {'loss': 0.4075, 'grad_norm': 7.576271556752231, 'learning_rate': 9.70044039235396e-09, 'epoch': 0.97} 97%|█████████▋| 11978/12313 [8:58:05<14:43, 2.64s/it] 97%|█████████▋| 11979/12313 [8:58:08<15:11, 2.73s/it] {'loss': 0.3907, 'grad_norm': 4.487672059560348, 'learning_rate': 9.642650955080379e-09, 'epoch': 0.97} 97%|█████████▋| 11979/12313 [8:58:08<15:11, 2.73s/it] 97%|█████████▋| 11980/12313 [8:58:10<15:26, 2.78s/it] {'loss': 0.5327, 'grad_norm': 3.6526471218089864, 'learning_rate': 9.585033837355151e-09, 'epoch': 0.97} 97%|█████████▋| 11980/12313 [8:58:10<15:26, 2.78s/it] 97%|█████████▋| 11981/12313 [8:58:13<15:38, 2.83s/it] {'loss': 0.5155, 'grad_norm': 4.83213034119808, 'learning_rate': 9.527589043165086e-09, 'epoch': 0.97} 97%|█████████▋| 11981/12313 [8:58:13<15:38, 2.83s/it] 97%|█████████▋| 11982/12313 [8:58:16<15:25, 2.80s/it] {'loss': 0.4436, 'grad_norm': 4.370525558131413, 'learning_rate': 9.470316576485616e-09, 'epoch': 0.97} 97%|█████████▋| 11982/12313 [8:58:16<15:25, 2.80s/it] 97%|█████████▋| 11983/12313 [8:58:19<15:08, 2.75s/it] {'loss': 0.4835, 'grad_norm': 6.8962180265204065, 'learning_rate': 9.41321644127885e-09, 'epoch': 0.97} 97%|█████████▋| 11983/12313 [8:58:19<15:08, 2.75s/it] 
97%|█████████▋| 11984/12313 [8:58:21<14:41, 2.68s/it] {'loss': 0.3822, 'grad_norm': 5.928084396311675, 'learning_rate': 9.356288641496624e-09, 'epoch': 0.97} 97%|█████████▋| 11984/12313 [8:58:21<14:41, 2.68s/it] 97%|█████████▋| 11985/12313 [8:58:24<14:49, 2.71s/it] {'loss': 0.36, 'grad_norm': 5.61471404924446, 'learning_rate': 9.299533181077458e-09, 'epoch': 0.97} 97%|█████████▋| 11985/12313 [8:58:24<14:49, 2.71s/it] 97%|█████████▋| 11986/12313 [8:58:27<14:42, 2.70s/it] {'loss': 0.4844, 'grad_norm': 4.928900628567097, 'learning_rate': 9.242950063948763e-09, 'epoch': 0.97} 97%|█████████▋| 11986/12313 [8:58:27<14:42, 2.70s/it] 97%|█████████▋| 11987/12313 [8:58:29<14:36, 2.69s/it] {'loss': 0.6315, 'grad_norm': 6.227465912824045, 'learning_rate': 9.18653929402602e-09, 'epoch': 0.97} 97%|█████████▋| 11987/12313 [8:58:29<14:36, 2.69s/it] 97%|█████████▋| 11988/12313 [8:58:32<14:31, 2.68s/it] {'loss': 0.5286, 'grad_norm': 5.256061365992016, 'learning_rate': 9.13030087521194e-09, 'epoch': 0.97} 97%|█████████▋| 11988/12313 [8:58:32<14:31, 2.68s/it] 97%|█████████▋| 11989/12313 [8:58:35<14:16, 2.64s/it] {'loss': 0.4854, 'grad_norm': 5.4504099987714465, 'learning_rate': 9.074234811398408e-09, 'epoch': 0.97} 97%|█████████▋| 11989/12313 [8:58:35<14:16, 2.64s/it] 97%|█████████▋| 11990/12313 [8:58:37<14:32, 2.70s/it] {'loss': 0.4206, 'grad_norm': 6.539725065060604, 'learning_rate': 9.018341106464823e-09, 'epoch': 0.97} 97%|█████████▋| 11990/12313 [8:58:37<14:32, 2.70s/it] 97%|█████████▋| 11991/12313 [8:58:40<14:25, 2.69s/it] {'loss': 0.4808, 'grad_norm': 4.947085140988941, 'learning_rate': 8.962619764278923e-09, 'epoch': 0.97} 97%|█████████▋| 11991/12313 [8:58:40<14:25, 2.69s/it] 97%|█████████▋| 11992/12313 [8:58:43<14:14, 2.66s/it] {'loss': 0.3113, 'grad_norm': 6.855737833780214, 'learning_rate': 8.907070788695681e-09, 'epoch': 0.97} 97%|█████████▋| 11992/12313 [8:58:43<14:14, 2.66s/it] 97%|█████████▋| 11993/12313 [8:58:46<14:45, 2.77s/it] {'loss': 0.3221, 'grad_norm': 
5.986387589137889, 'learning_rate': 8.851694183559523e-09, 'epoch': 0.97} 97%|█████████▋| 11993/12313 [8:58:46<14:45, 2.77s/it] 97%|█████████▋| 11994/12313 [8:58:48<14:28, 2.72s/it] {'loss': 0.5371, 'grad_norm': 6.396980239435106, 'learning_rate': 8.796489952701825e-09, 'epoch': 0.97} 97%|█████████▋| 11994/12313 [8:58:48<14:28, 2.72s/it] 97%|█████████▋| 11995/12313 [8:58:51<14:14, 2.69s/it] {'loss': 0.5594, 'grad_norm': 4.469906653866564, 'learning_rate': 8.741458099942313e-09, 'epoch': 0.97} 97%|█████████▋| 11995/12313 [8:58:51<14:14, 2.69s/it] 97%|█████████▋| 11996/12313 [8:58:53<13:57, 2.64s/it] {'loss': 0.382, 'grad_norm': 6.102062048541683, 'learning_rate': 8.686598629089326e-09, 'epoch': 0.97} 97%|█████████▋| 11996/12313 [8:58:53<13:57, 2.64s/it] 97%|█████████▋| 11997/12313 [8:58:56<14:11, 2.69s/it] {'loss': 0.4869, 'grad_norm': 4.275407491240544, 'learning_rate': 8.63191154393872e-09, 'epoch': 0.97} 97%|█████████▋| 11997/12313 [8:58:56<14:11, 2.69s/it] 97%|█████████▋| 11998/12313 [8:58:59<13:50, 2.64s/it] {'loss': 0.4735, 'grad_norm': 6.27848767223886, 'learning_rate': 8.577396848274134e-09, 'epoch': 0.97} 97%|█████████▋| 11998/12313 [8:58:59<13:50, 2.64s/it] 97%|█████████▋| 11999/12313 [8:59:01<13:48, 2.64s/it] {'loss': 0.5145, 'grad_norm': 6.216106726624828, 'learning_rate': 8.523054545868381e-09, 'epoch': 0.97} 97%|█████████▋| 11999/12313 [8:59:01<13:48, 2.64s/it] 97%|█████████▋| 12000/12313 [8:59:04<13:51, 2.66s/it] {'loss': 0.4235, 'grad_norm': 6.83477643803915, 'learning_rate': 8.468884640480956e-09, 'epoch': 0.97} 97%|█████████▋| 12000/12313 [8:59:04<13:51, 2.66s/it] 97%|█████████▋| 12001/12313 [8:59:07<13:43, 2.64s/it] {'loss': 0.3258, 'grad_norm': 12.60083591448807, 'learning_rate': 8.414887135860528e-09, 'epoch': 0.97} 97%|█████████▋| 12001/12313 [8:59:07<13:43, 2.64s/it] 97%|█████████▋| 12002/12313 [8:59:10<14:18, 2.76s/it] {'loss': 0.5243, 'grad_norm': 4.645778183923307, 'learning_rate': 8.36106203574355e-09, 'epoch': 0.97} 97%|█████████▋| 
12002/12313 [8:59:10<14:18, 2.76s/it] 97%|█████████▋| 12003/12313 [8:59:12<13:46, 2.67s/it] {'loss': 0.599, 'grad_norm': 6.124524623697457, 'learning_rate': 8.307409343854267e-09, 'epoch': 0.97} 97%|█████████▋| 12003/12313 [8:59:12<13:46, 2.67s/it] 97%|█████████▋| 12004/12313 [8:59:15<13:47, 2.68s/it] {'loss': 0.382, 'grad_norm': 21.473022616832324, 'learning_rate': 8.253929063904986e-09, 'epoch': 0.97} 97%|█████████▋| 12004/12313 [8:59:15<13:47, 2.68s/it] 97%|█████████▋| 12005/12313 [8:59:18<13:54, 2.71s/it] {'loss': 0.6825, 'grad_norm': 4.661852786954021, 'learning_rate': 8.200621199596359e-09, 'epoch': 0.97} 97%|█████████▋| 12005/12313 [8:59:18<13:54, 2.71s/it] 98%|█████████▊| 12006/12313 [8:59:21<14:27, 2.83s/it] {'loss': 0.4224, 'grad_norm': 3.538632513896874, 'learning_rate': 8.147485754617379e-09, 'epoch': 0.98} 98%|█████████▊| 12006/12313 [8:59:21<14:27, 2.83s/it] 98%|█████████▊| 12007/12313 [8:59:24<15:08, 2.97s/it] {'loss': 0.5328, 'grad_norm': 4.889222428461826, 'learning_rate': 8.094522732644272e-09, 'epoch': 0.98} 98%|█████████▊| 12007/12313 [8:59:24<15:08, 2.97s/it] 98%|█████████▊| 12008/12313 [8:59:27<14:38, 2.88s/it] {'loss': 0.3872, 'grad_norm': 8.835696874530043, 'learning_rate': 8.041732137341885e-09, 'epoch': 0.98} 98%|█████████▊| 12008/12313 [8:59:27<14:38, 2.88s/it] 98%|█████████▊| 12009/12313 [8:59:29<13:48, 2.73s/it] {'loss': 0.6145, 'grad_norm': 5.49476829680145, 'learning_rate': 7.989113972363406e-09, 'epoch': 0.98} 98%|█████████▊| 12009/12313 [8:59:29<13:48, 2.73s/it] 98%|█████████▊| 12010/12313 [8:59:32<13:35, 2.69s/it] {'loss': 0.5043, 'grad_norm': 5.131999174481017, 'learning_rate': 7.936668241349255e-09, 'epoch': 0.98} 98%|█████████▊| 12010/12313 [8:59:32<13:35, 2.69s/it] 98%|█████████▊| 12011/12313 [8:59:35<13:56, 2.77s/it] {'loss': 0.5749, 'grad_norm': 4.572903995548805, 'learning_rate': 7.884394947928476e-09, 'epoch': 0.98} 98%|█████████▊| 12011/12313 [8:59:35<13:56, 2.77s/it] 98%|█████████▊| 12012/12313 [8:59:37<13:29, 2.69s/it] 
{'loss': 0.4026, 'grad_norm': 4.3477540547148825, 'learning_rate': 7.832294095718452e-09, 'epoch': 0.98} 98%|█████████▊| 12012/12313 [8:59:37<13:29, 2.69s/it] 98%|█████████▊| 12013/12313 [8:59:40<13:06, 2.62s/it] {'loss': 0.3078, 'grad_norm': 7.454961091377597, 'learning_rate': 7.780365688323798e-09, 'epoch': 0.98} 98%|█████████▊| 12013/12313 [8:59:40<13:06, 2.62s/it] 98%|█████████▊| 12014/12313 [8:59:42<13:12, 2.65s/it] {'loss': 0.5571, 'grad_norm': 4.170795190072248, 'learning_rate': 7.72860972933831e-09, 'epoch': 0.98} 98%|█████████▊| 12014/12313 [8:59:42<13:12, 2.65s/it] 98%|█████████▊| 12015/12313 [8:59:45<13:13, 2.66s/it] {'loss': 0.3938, 'grad_norm': 7.707305765402371, 'learning_rate': 7.677026222342454e-09, 'epoch': 0.98} 98%|█████████▊| 12015/12313 [8:59:45<13:13, 2.66s/it] 98%|█████████▊| 12016/12313 [8:59:48<13:32, 2.73s/it] {'loss': 0.6371, 'grad_norm': 4.686956759781069, 'learning_rate': 7.625615170906153e-09, 'epoch': 0.98} 98%|█████████▊| 12016/12313 [8:59:48<13:32, 2.73s/it] 98%|█████████▊| 12017/12313 [8:59:51<13:37, 2.76s/it] {'loss': 0.4409, 'grad_norm': 6.526544319588936, 'learning_rate': 7.57437657858684e-09, 'epoch': 0.98} 98%|█████████▊| 12017/12313 [8:59:51<13:37, 2.76s/it] 98%|█████████▊| 12018/12313 [8:59:54<13:35, 2.77s/it] {'loss': 0.2862, 'grad_norm': 5.749185856694462, 'learning_rate': 7.523310448929178e-09, 'epoch': 0.98} 98%|█████████▊| 12018/12313 [8:59:54<13:35, 2.77s/it] 98%|█████████▊| 12019/12313 [8:59:56<13:22, 2.73s/it] {'loss': 0.5205, 'grad_norm': 6.413191599955744, 'learning_rate': 7.472416785467563e-09, 'epoch': 0.98} 98%|█████████▊| 12019/12313 [8:59:56<13:22, 2.73s/it] 98%|█████████▊| 12020/12313 [8:59:59<13:25, 2.75s/it] {'loss': 0.4963, 'grad_norm': 4.702110106071757, 'learning_rate': 7.421695591723066e-09, 'epoch': 0.98} 98%|█████████▊| 12020/12313 [8:59:59<13:25, 2.75s/it] 98%|█████████▊| 12021/12313 [9:00:02<13:09, 2.70s/it] {'loss': 0.5284, 'grad_norm': 4.552204118507779, 'learning_rate': 7.371146871205381e-09, 
'epoch': 0.98} 98%|█████████▊| 12021/12313 [9:00:02<13:09, 2.70s/it] 98%|█████████▊| 12022/12313 [9:00:04<12:57, 2.67s/it] {'loss': 0.5295, 'grad_norm': 5.639485771969086, 'learning_rate': 7.320770627412543e-09, 'epoch': 0.98} 98%|█████████▊| 12022/12313 [9:00:04<12:57, 2.67s/it] 98%|█████████▊| 12023/12313 [9:00:07<13:08, 2.72s/it] {'loss': 0.4764, 'grad_norm': 6.8637890745131145, 'learning_rate': 7.27056686382982e-09, 'epoch': 0.98} 98%|█████████▊| 12023/12313 [9:00:07<13:08, 2.72s/it] 98%|█████████▊| 12024/12313 [9:00:10<12:58, 2.69s/it] {'loss': 0.4443, 'grad_norm': 5.248727371974205, 'learning_rate': 7.220535583931099e-09, 'epoch': 0.98} 98%|█████████▊| 12024/12313 [9:00:10<12:58, 2.69s/it] 98%|█████████▊| 12025/12313 [9:00:12<12:57, 2.70s/it] {'loss': 0.5761, 'grad_norm': 5.140443422767862, 'learning_rate': 7.17067679117861e-09, 'epoch': 0.98} 98%|█████████▊| 12025/12313 [9:00:12<12:57, 2.70s/it] 98%|█████████▊| 12026/12313 [9:00:15<12:56, 2.70s/it] {'loss': 0.4652, 'grad_norm': 4.603187702423827, 'learning_rate': 7.120990489022373e-09, 'epoch': 0.98} 98%|█████████▊| 12026/12313 [9:00:15<12:56, 2.70s/it] 98%|█████████▊| 12027/12313 [9:00:18<12:54, 2.71s/it] {'loss': 0.3577, 'grad_norm': 5.4771501562832245, 'learning_rate': 7.071476680900191e-09, 'epoch': 0.98} 98%|█████████▊| 12027/12313 [9:00:18<12:54, 2.71s/it] 98%|█████████▊| 12028/12313 [9:00:20<12:33, 2.64s/it] {'loss': 0.5396, 'grad_norm': 9.457901018585092, 'learning_rate': 7.022135370237937e-09, 'epoch': 0.98} 98%|█████████▊| 12028/12313 [9:00:20<12:33, 2.64s/it] 98%|█████████▊| 12029/12313 [9:00:23<12:16, 2.59s/it] {'loss': 0.4375, 'grad_norm': 6.7496814308010995, 'learning_rate': 6.972966560450101e-09, 'epoch': 0.98} 98%|█████████▊| 12029/12313 [9:00:23<12:16, 2.59s/it] 98%|█████████▊| 12030/12313 [9:00:26<12:32, 2.66s/it] {'loss': 0.463, 'grad_norm': 9.618733980000252, 'learning_rate': 6.923970254938961e-09, 'epoch': 0.98} 98%|█████████▊| 12030/12313 [9:00:26<12:32, 2.66s/it] 98%|█████████▊| 
12031/12313 [9:00:28<12:34, 2.68s/it] {'loss': 0.466, 'grad_norm': 3.6437791716015155, 'learning_rate': 6.875146457094583e-09, 'epoch': 0.98} 98%|█████████▊| 12031/12313 [9:00:28<12:34, 2.68s/it] 98%|█████████▊| 12032/12313 [9:00:31<12:23, 2.65s/it] {'loss': 0.3838, 'grad_norm': 4.087276926389463, 'learning_rate': 6.8264951702951e-09, 'epoch': 0.98} 98%|█████████▊| 12032/12313 [9:00:31<12:23, 2.65s/it] 98%|█████████▊| 12033/12313 [9:00:34<12:41, 2.72s/it] {'loss': 0.3517, 'grad_norm': 7.648508375870743, 'learning_rate': 6.778016397907539e-09, 'epoch': 0.98} 98%|█████████▊| 12033/12313 [9:00:34<12:41, 2.72s/it] 98%|█████████▊| 12034/12313 [9:00:36<12:25, 2.67s/it] {'loss': 0.3296, 'grad_norm': 6.478817726316786, 'learning_rate': 6.729710143286161e-09, 'epoch': 0.98} 98%|█████████▊| 12034/12313 [9:00:36<12:25, 2.67s/it] 98%|█████████▊| 12035/12313 [9:00:39<12:16, 2.65s/it] {'loss': 0.5188, 'grad_norm': 4.785457296804816, 'learning_rate': 6.681576409773016e-09, 'epoch': 0.98} 98%|█████████▊| 12035/12313 [9:00:39<12:16, 2.65s/it] 98%|█████████▊| 12036/12313 [9:00:41<12:01, 2.60s/it] {'loss': 0.6136, 'grad_norm': 7.296274375209841, 'learning_rate': 6.633615200699328e-09, 'epoch': 0.98} 98%|█████████▊| 12036/12313 [9:00:41<12:01, 2.60s/it] 98%|█████████▊| 12037/12313 [9:00:44<12:16, 2.67s/it] {'loss': 0.514, 'grad_norm': 4.160049221454699, 'learning_rate': 6.5858265193835536e-09, 'epoch': 0.98} 98%|█████████▊| 12037/12313 [9:00:44<12:16, 2.67s/it] 98%|█████████▊| 12038/12313 [9:00:47<12:03, 2.63s/it] {'loss': 0.4866, 'grad_norm': 5.830756185483621, 'learning_rate': 6.538210369132214e-09, 'epoch': 0.98} 98%|█████████▊| 12038/12313 [9:00:47<12:03, 2.63s/it] 98%|█████████▊| 12039/12313 [9:00:50<12:08, 2.66s/it] {'loss': 0.6008, 'grad_norm': 5.407452911485265, 'learning_rate': 6.490766753240174e-09, 'epoch': 0.98} 98%|█████████▊| 12039/12313 [9:00:50<12:08, 2.66s/it] 98%|█████████▊| 12040/12313 [9:00:52<12:00, 2.64s/it] {'loss': 0.5819, 'grad_norm': 5.873928827822523, 
'learning_rate': 6.443495674990641e-09, 'epoch': 0.98} 98%|█████████▊| 12040/12313 [9:00:52<12:00, 2.64s/it] 98%|█████████▊| 12041/12313 [9:00:55<12:27, 2.75s/it] {'loss': 0.3921, 'grad_norm': 6.00288359521902, 'learning_rate': 6.396397137654054e-09, 'epoch': 0.98} 98%|█████████▊| 12041/12313 [9:00:55<12:27, 2.75s/it] 98%|█████████▊| 12042/12313 [9:00:58<12:15, 2.72s/it] {'loss': 0.4012, 'grad_norm': 7.297473039045752, 'learning_rate': 6.3494711444897495e-09, 'epoch': 0.98} 98%|█████████▊| 12042/12313 [9:00:58<12:15, 2.72s/it] 98%|█████████▊| 12043/12313 [9:01:00<11:53, 2.64s/it] {'loss': 0.4805, 'grad_norm': 4.082939335655915, 'learning_rate': 6.302717698744298e-09, 'epoch': 0.98} 98%|█████████▊| 12043/12313 [9:01:00<11:53, 2.64s/it] 98%|█████████▊| 12044/12313 [9:01:03<12:06, 2.70s/it] {'loss': 0.5639, 'grad_norm': 9.09294805829235, 'learning_rate': 6.2561368036531676e-09, 'epoch': 0.98} 98%|█████████▊| 12044/12313 [9:01:03<12:06, 2.70s/it] 98%|█████████▊| 12045/12313 [9:01:06<12:04, 2.70s/it] {'loss': 0.5002, 'grad_norm': 6.290896132053336, 'learning_rate': 6.209728462439613e-09, 'epoch': 0.98} 98%|█████████▊| 12045/12313 [9:01:06<12:04, 2.70s/it] 98%|█████████▊| 12046/12313 [9:01:08<11:58, 2.69s/it] {'loss': 0.4165, 'grad_norm': 8.382100561674394, 'learning_rate': 6.1634926783143975e-09, 'epoch': 0.98} 98%|█████████▊| 12046/12313 [9:01:08<11:58, 2.69s/it] 98%|█████████▊| 12047/12313 [9:01:11<11:50, 2.67s/it] {'loss': 0.3735, 'grad_norm': 5.891844640379527, 'learning_rate': 6.117429454477186e-09, 'epoch': 0.98} 98%|█████████▊| 12047/12313 [9:01:11<11:50, 2.67s/it] 98%|█████████▊| 12048/12313 [9:01:14<12:10, 2.76s/it] {'loss': 0.5358, 'grad_norm': 5.2181840686707766, 'learning_rate': 6.071538794115151e-09, 'epoch': 0.98} 98%|█████████▊| 12048/12313 [9:01:14<12:10, 2.76s/it] 98%|█████████▊| 12049/12313 [9:01:17<11:57, 2.72s/it] {'loss': 0.5949, 'grad_norm': 5.170788538583498, 'learning_rate': 6.025820700403529e-09, 'epoch': 0.98} 98%|█████████▊| 12049/12313 
[9:01:17<11:57, 2.72s/it] 98%|█████████▊| 12050/12313 [9:01:19<11:39, 2.66s/it] {'loss': 0.553, 'grad_norm': 9.342590625760383, 'learning_rate': 5.9802751765061785e-09, 'epoch': 0.98} 98%|█████████▊| 12050/12313 [9:01:19<11:39, 2.66s/it] 98%|█████████▊| 12051/12313 [9:01:22<12:03, 2.76s/it] {'loss': 0.5531, 'grad_norm': 6.856775571985887, 'learning_rate': 5.9349022255741905e-09, 'epoch': 0.98} 98%|█████████▊| 12051/12313 [9:01:22<12:03, 2.76s/it] 98%|█████████▊| 12052/12313 [9:01:25<11:56, 2.75s/it] {'loss': 0.5136, 'grad_norm': 4.222463014742318, 'learning_rate': 5.889701850747276e-09, 'epoch': 0.98} 98%|█████████▊| 12052/12313 [9:01:25<11:56, 2.75s/it] 98%|█████████▊| 12053/12313 [9:01:28<11:58, 2.76s/it] {'loss': 0.4999, 'grad_norm': 6.316498496538767, 'learning_rate': 5.844674055153487e-09, 'epoch': 0.98} 98%|█████████▊| 12053/12313 [9:01:28<11:58, 2.76s/it] 98%|█████████▊| 12054/12313 [9:01:30<11:37, 2.69s/it] {'loss': 0.4719, 'grad_norm': 5.951917290095282, 'learning_rate': 5.799818841907556e-09, 'epoch': 0.98} 98%|█████████▊| 12054/12313 [9:01:30<11:37, 2.69s/it] 98%|█████████▊| 12055/12313 [9:01:33<11:58, 2.78s/it] {'loss': 0.3861, 'grad_norm': 10.835272741624479, 'learning_rate': 5.7551362141142205e-09, 'epoch': 0.98} 98%|█████████▊| 12055/12313 [9:01:33<11:58, 2.78s/it] 98%|█████████▊| 12056/12313 [9:01:36<11:36, 2.71s/it] {'loss': 0.5043, 'grad_norm': 6.226477036302456, 'learning_rate': 5.71062617486462e-09, 'epoch': 0.98} 98%|█████████▊| 12056/12313 [9:01:36<11:36, 2.71s/it] 98%|█████████▊| 12057/12313 [9:01:38<11:25, 2.68s/it] {'loss': 0.4041, 'grad_norm': 9.360240176470688, 'learning_rate': 5.666288727239066e-09, 'epoch': 0.98} 98%|█████████▊| 12057/12313 [9:01:38<11:25, 2.68s/it] 98%|█████████▊| 12058/12313 [9:01:41<11:15, 2.65s/it] {'loss': 0.3628, 'grad_norm': 8.608220193471967, 'learning_rate': 5.622123874305108e-09, 'epoch': 0.98} 98%|█████████▊| 12058/12313 [9:01:41<11:15, 2.65s/it] 98%|█████████▊| 12059/12313 [9:01:44<11:16, 2.67s/it] {'loss': 
0.3776, 'grad_norm': 5.828438095145518, 'learning_rate': 5.578131619118909e-09, 'epoch': 0.98} 98%|█████████▊| 12059/12313 [9:01:44<11:16, 2.67s/it] 98%|█████████▊| 12060/12313 [9:01:46<10:55, 2.59s/it] {'loss': 0.4952, 'grad_norm': 4.52241346057629, 'learning_rate': 5.534311964724426e-09, 'epoch': 0.98} 98%|█████████▊| 12060/12313 [9:01:46<10:55, 2.59s/it] 98%|█████████▊| 12061/12313 [9:01:49<11:00, 2.62s/it] {'loss': 0.4629, 'grad_norm': 3.8196975497917736, 'learning_rate': 5.490664914153676e-09, 'epoch': 0.98} 98%|█████████▊| 12061/12313 [9:01:49<11:00, 2.62s/it] 98%|█████████▊| 12062/12313 [9:01:52<11:11, 2.67s/it] {'loss': 0.4722, 'grad_norm': 4.37711626897485, 'learning_rate': 5.447190470427022e-09, 'epoch': 0.98} 98%|█████████▊| 12062/12313 [9:01:52<11:11, 2.67s/it] 98%|█████████▊| 12063/12313 [9:01:54<10:50, 2.60s/it] {'loss': 0.5534, 'grad_norm': 3.977099684985591, 'learning_rate': 5.4038886365523346e-09, 'epoch': 0.98} 98%|█████████▊| 12063/12313 [9:01:54<10:50, 2.60s/it] 98%|█████████▊| 12064/12313 [9:01:57<10:51, 2.62s/it] {'loss': 0.4216, 'grad_norm': 3.64657798618061, 'learning_rate': 5.360759415526385e-09, 'epoch': 0.98} 98%|█████████▊| 12064/12313 [9:01:57<10:51, 2.62s/it] 98%|█████████▊| 12065/12313 [9:01:59<10:48, 2.62s/it] {'loss': 0.4299, 'grad_norm': 5.625694324125192, 'learning_rate': 5.3178028103331725e-09, 'epoch': 0.98} 98%|█████████▊| 12065/12313 [9:01:59<10:48, 2.62s/it] 98%|█████████▊| 12066/12313 [9:02:02<10:48, 2.62s/it] {'loss': 0.4106, 'grad_norm': 3.7515462457345823, 'learning_rate': 5.275018823945044e-09, 'epoch': 0.98} 98%|█████████▊| 12066/12313 [9:02:02<10:48, 2.62s/it] 98%|█████████▊| 12067/12313 [9:02:05<10:47, 2.63s/it] {'loss': 0.3944, 'grad_norm': 4.423453741648708, 'learning_rate': 5.232407459322408e-09, 'epoch': 0.98} 98%|█████████▊| 12067/12313 [9:02:05<10:47, 2.63s/it] 98%|█████████▊| 12068/12313 [9:02:07<10:41, 2.62s/it] {'loss': 0.379, 'grad_norm': 5.1224553290352635, 'learning_rate': 5.189968719413741e-09, 'epoch': 
0.98} 98%|█████████▊| 12068/12313 [9:02:07<10:41, 2.62s/it] 98%|█████████▊| 12069/12313 [9:02:10<10:45, 2.65s/it] {'loss': 0.4096, 'grad_norm': 7.181016487036675, 'learning_rate': 5.14770260715558e-09, 'epoch': 0.98} 98%|█████████▊| 12069/12313 [9:02:10<10:45, 2.65s/it] 98%|█████████▊| 12070/12313 [9:02:13<10:47, 2.67s/it] {'loss': 0.5059, 'grad_norm': 5.486473028790214, 'learning_rate': 5.10560912547281e-09, 'epoch': 0.98} 98%|█████████▊| 12070/12313 [9:02:13<10:47, 2.67s/it] 98%|█████████▊| 12071/12313 [9:02:15<11:02, 2.74s/it] {'loss': 0.444, 'grad_norm': 4.843785469577583, 'learning_rate': 5.063688277277545e-09, 'epoch': 0.98} 98%|█████████▊| 12071/12313 [9:02:15<11:02, 2.74s/it] 98%|█████████▊| 12072/12313 [9:02:18<10:31, 2.62s/it] {'loss': 0.4866, 'grad_norm': 12.938046238864795, 'learning_rate': 5.021940065471076e-09, 'epoch': 0.98} 98%|█████████▊| 12072/12313 [9:02:18<10:31, 2.62s/it] 98%|█████████▊| 12073/12313 [9:02:20<10:26, 2.61s/it] {'loss': 0.4679, 'grad_norm': 10.093029172739994, 'learning_rate': 4.980364492941924e-09, 'epoch': 0.98} 98%|█████████▊| 12073/12313 [9:02:20<10:26, 2.61s/it] 98%|█████████▊| 12074/12313 [9:02:23<10:29, 2.63s/it] {'loss': 0.3914, 'grad_norm': 7.273329994281363, 'learning_rate': 4.938961562566402e-09, 'epoch': 0.98} 98%|█████████▊| 12074/12313 [9:02:23<10:29, 2.63s/it] 98%|█████████▊| 12075/12313 [9:02:26<10:40, 2.69s/it] {'loss': 0.5183, 'grad_norm': 3.3725698256307437, 'learning_rate': 4.8977312772102715e-09, 'epoch': 0.98} 98%|█████████▊| 12075/12313 [9:02:26<10:40, 2.69s/it] 98%|█████████▊| 12076/12313 [9:02:29<10:35, 2.68s/it] {'loss': 0.498, 'grad_norm': 5.1217541737470835, 'learning_rate': 4.856673639725695e-09, 'epoch': 0.98} 98%|█████████▊| 12076/12313 [9:02:29<10:35, 2.68s/it] 98%|█████████▊| 12077/12313 [9:02:31<10:29, 2.67s/it] {'loss': 0.4615, 'grad_norm': 3.8730107181355535, 'learning_rate': 4.815788652954012e-09, 'epoch': 0.98} 98%|█████████▊| 12077/12313 [9:02:31<10:29, 2.67s/it] 98%|█████████▊| 12078/12313 
[9:02:34<10:15, 2.62s/it] {'loss': 0.5011, 'grad_norm': 8.294253938971737, 'learning_rate': 4.775076319724348e-09, 'epoch': 0.98} 98%|█████████▊| 12078/12313 [9:02:34<10:15, 2.62s/it] 98%|█████████▊| 12079/12313 [9:02:36<10:15, 2.63s/it] {'loss': 0.5997, 'grad_norm': 3.478676476091108, 'learning_rate': 4.734536642853338e-09, 'epoch': 0.98} 98%|█████████▊| 12079/12313 [9:02:36<10:15, 2.63s/it] 98%|█████████▊| 12080/12313 [9:02:39<10:02, 2.59s/it] {'loss': 0.6451, 'grad_norm': 4.181120233195193, 'learning_rate': 4.6941696251465165e-09, 'epoch': 0.98} 98%|█████████▊| 12080/12313 [9:02:39<10:02, 2.59s/it] 98%|█████████▊| 12081/12313 [9:02:42<10:13, 2.65s/it] {'loss': 0.4854, 'grad_norm': 3.6134990519276466, 'learning_rate': 4.6539752693969265e-09, 'epoch': 0.98} 98%|█████████▊| 12081/12313 [9:02:42<10:13, 2.65s/it] 98%|█████████▊| 12082/12313 [9:02:45<10:30, 2.73s/it] {'loss': 0.3663, 'grad_norm': 11.700469427119463, 'learning_rate': 4.613953578385954e-09, 'epoch': 0.98} 98%|█████████▊| 12082/12313 [9:02:45<10:30, 2.73s/it] 98%|█████████▊| 12083/12313 [9:02:47<10:25, 2.72s/it] {'loss': 0.7481, 'grad_norm': 4.369290046988292, 'learning_rate': 4.574104554882497e-09, 'epoch': 0.98} 98%|█████████▊| 12083/12313 [9:02:47<10:25, 2.72s/it] 98%|█████████▊| 12084/12313 [9:02:50<10:10, 2.67s/it] {'loss': 0.423, 'grad_norm': 5.401987698332693, 'learning_rate': 4.534428201644348e-09, 'epoch': 0.98} 98%|█████████▊| 12084/12313 [9:02:50<10:10, 2.67s/it] 98%|█████████▊| 12085/12313 [9:02:52<10:05, 2.65s/it] {'loss': 0.4895, 'grad_norm': 9.430350487972829, 'learning_rate': 4.494924521416533e-09, 'epoch': 0.98} 98%|█████████▊| 12085/12313 [9:02:52<10:05, 2.65s/it] 98%|█████████▊| 12086/12313 [9:02:55<10:06, 2.67s/it] {'loss': 0.5178, 'grad_norm': 3.7707234936613565, 'learning_rate': 4.455593516932699e-09, 'epoch': 0.98} 98%|█████████▊| 12086/12313 [9:02:55<10:06, 2.67s/it] 98%|█████████▊| 12087/12313 [9:02:58<09:52, 2.62s/it] {'loss': 0.3872, 'grad_norm': 6.985985247371312, 
'learning_rate': 4.4164351909142815e-09, 'epoch': 0.98} 98%|█████████▊| 12087/12313 [9:02:58<09:52, 2.62s/it] 98%|█████████▊| 12088/12313 [9:03:00<09:48, 2.61s/it] {'loss': 0.4994, 'grad_norm': 5.1330117460195215, 'learning_rate': 4.377449546071055e-09, 'epoch': 0.98} 98%|█████████▊| 12088/12313 [9:03:00<09:48, 2.61s/it] 98%|█████████▊| 12089/12313 [9:03:03<10:03, 2.69s/it] {'loss': 0.5062, 'grad_norm': 6.234609568273921, 'learning_rate': 4.338636585100309e-09, 'epoch': 0.98} 98%|█████████▊| 12089/12313 [9:03:03<10:03, 2.69s/it] 98%|█████████▊| 12090/12313 [9:03:06<10:01, 2.70s/it] {'loss': 0.5828, 'grad_norm': 7.581218288677617, 'learning_rate': 4.299996310687671e-09, 'epoch': 0.98} 98%|█████████▊| 12090/12313 [9:03:06<10:01, 2.70s/it] 98%|█████████▊| 12091/12313 [9:03:08<09:57, 2.69s/it] {'loss': 0.472, 'grad_norm': 5.845752308546046, 'learning_rate': 4.261528725507113e-09, 'epoch': 0.98} 98%|█████████▊| 12091/12313 [9:03:08<09:57, 2.69s/it] 98%|█████████▊| 12092/12313 [9:03:11<09:48, 2.67s/it] {'loss': 0.4632, 'grad_norm': 7.146820683688464, 'learning_rate': 4.223233832220397e-09, 'epoch': 0.98} 98%|█████████▊| 12092/12313 [9:03:11<09:48, 2.67s/it] 98%|█████████▊| 12093/12313 [9:03:14<09:35, 2.62s/it] {'loss': 0.6537, 'grad_norm': 3.2381775735767935, 'learning_rate': 4.18511163347679e-09, 'epoch': 0.98} 98%|█████████▊| 12093/12313 [9:03:14<09:35, 2.62s/it] 98%|█████████▊| 12094/12313 [9:03:16<09:26, 2.59s/it] {'loss': 0.3759, 'grad_norm': 5.730027453232988, 'learning_rate': 4.147162131914739e-09, 'epoch': 0.98} 98%|█████████▊| 12094/12313 [9:03:16<09:26, 2.59s/it] 98%|█████████▊| 12095/12313 [9:03:19<09:23, 2.58s/it] {'loss': 0.5452, 'grad_norm': 3.5984616089033827, 'learning_rate': 4.109385330159921e-09, 'epoch': 0.98} 98%|█████████▊| 12095/12313 [9:03:19<09:23, 2.58s/it] 98%|█████████▊| 12096/12313 [9:03:21<09:11, 2.54s/it] {'loss': 0.4365, 'grad_norm': 6.769530777534317, 'learning_rate': 4.071781230826355e-09, 'epoch': 0.98} 98%|█████████▊| 12096/12313 
[9:03:21<09:11, 2.54s/it] 98%|█████████▊| 12097/12313 [9:03:24<09:16, 2.58s/it] {'loss': 0.4541, 'grad_norm': 5.819933789150477, 'learning_rate': 4.034349836516127e-09, 'epoch': 0.98} 98%|█████████▊| 12097/12313 [9:03:24<09:16, 2.58s/it] 98%|█████████▊| 12098/12313 [9:03:26<09:19, 2.60s/it] {'loss': 0.3554, 'grad_norm': 5.491397739648694, 'learning_rate': 3.99709114981911e-09, 'epoch': 0.98} 98%|█████████▊| 12098/12313 [9:03:26<09:19, 2.60s/it] 98%|█████████▊| 12099/12313 [9:03:29<09:17, 2.60s/it] {'loss': 0.3631, 'grad_norm': 5.29270938282257, 'learning_rate': 3.960005173313519e-09, 'epoch': 0.98} 98%|█████████▊| 12099/12313 [9:03:29<09:17, 2.60s/it] 98%|█████████▊| 12100/12313 [9:03:32<09:32, 2.69s/it] {'loss': 0.4277, 'grad_norm': 5.1898426034045615, 'learning_rate': 3.923091909565357e-09, 'epoch': 0.98} 98%|█████████▊| 12100/12313 [9:03:32<09:32, 2.69s/it] 98%|█████████▊| 12101/12313 [9:03:35<09:31, 2.70s/it] {'loss': 0.4089, 'grad_norm': 4.856901331420813, 'learning_rate': 3.88635136112897e-09, 'epoch': 0.98} 98%|█████████▊| 12101/12313 [9:03:35<09:31, 2.70s/it] 98%|█████████▊| 12102/12313 [9:03:37<09:32, 2.71s/it] {'loss': 0.6127, 'grad_norm': 3.4872568138932762, 'learning_rate': 3.8497835305464915e-09, 'epoch': 0.98} 98%|█████████▊| 12102/12313 [9:03:37<09:32, 2.71s/it] 98%|█████████▊| 12103/12313 [9:03:40<09:26, 2.70s/it] {'loss': 0.4843, 'grad_norm': 13.086059636580629, 'learning_rate': 3.813388420348396e-09, 'epoch': 0.98} 98%|█████████▊| 12103/12313 [9:03:40<09:26, 2.70s/it] 98%|█████████▊| 12104/12313 [9:03:43<09:31, 2.73s/it] {'loss': 0.4887, 'grad_norm': 5.395590802101605, 'learning_rate': 3.777166033052948e-09, 'epoch': 0.98} 98%|█████████▊| 12104/12313 [9:03:43<09:31, 2.73s/it] 98%|█████████▊| 12105/12313 [9:03:46<09:29, 2.74s/it] {'loss': 0.5797, 'grad_norm': 4.809183849119693, 'learning_rate': 3.741116371166476e-09, 'epoch': 0.98} 98%|█████████▊| 12105/12313 [9:03:46<09:29, 2.74s/it] 98%|█████████▊| 12106/12313 [9:03:48<09:33, 2.77s/it] {'loss': 
0.6074, 'grad_norm': 7.000518609996093, 'learning_rate': 3.705239437183372e-09, 'epoch': 0.98} 98%|█████████▊| 12106/12313 [9:03:48<09:33, 2.77s/it] 98%|█████████▊| 12107/12313 [9:03:51<09:13, 2.69s/it] {'loss': 0.3449, 'grad_norm': 7.749862840244079, 'learning_rate': 3.6695352335863745e-09, 'epoch': 0.98} 98%|█████████▊| 12107/12313 [9:03:51<09:13, 2.69s/it] 98%|█████████▊| 12108/12313 [9:03:54<09:24, 2.75s/it] {'loss': 0.4941, 'grad_norm': 5.2466883835337494, 'learning_rate': 3.6340037628460057e-09, 'epoch': 0.98} 98%|█████████▊| 12108/12313 [9:03:54<09:24, 2.75s/it] 98%|█████████▊| 12109/12313 [9:03:57<09:18, 2.74s/it] {'loss': 0.4583, 'grad_norm': 20.385619121552477, 'learning_rate': 3.5986450274205776e-09, 'epoch': 0.98} 98%|█████████▊| 12109/12313 [9:03:57<09:18, 2.74s/it] 98%|█████████▊| 12110/12313 [9:03:59<09:27, 2.79s/it] {'loss': 0.4225, 'grad_norm': 4.0593617565984035, 'learning_rate': 3.5634590297570215e-09, 'epoch': 0.98} 98%|█████████▊| 12110/12313 [9:03:59<09:27, 2.79s/it] 98%|█████████▊| 12111/12313 [9:04:02<09:17, 2.76s/it] {'loss': 0.5616, 'grad_norm': 6.465413464659027, 'learning_rate': 3.528445772289779e-09, 'epoch': 0.98} 98%|█████████▊| 12111/12313 [9:04:02<09:17, 2.76s/it] 98%|█████████▊| 12112/12313 [9:04:05<09:15, 2.76s/it] {'loss': 0.3596, 'grad_norm': 8.991522570276532, 'learning_rate': 3.4936052574416345e-09, 'epoch': 0.98} 98%|█████████▊| 12112/12313 [9:04:05<09:15, 2.76s/it] 98%|█████████▊| 12113/12313 [9:04:08<09:15, 2.78s/it] {'loss': 0.4936, 'grad_norm': 4.818729279084131, 'learning_rate': 3.458937487623437e-09, 'epoch': 0.98} 98%|█████████▊| 12113/12313 [9:04:08<09:15, 2.78s/it] 98%|█████████▊| 12114/12313 [9:04:11<09:10, 2.77s/it] {'loss': 0.4451, 'grad_norm': 6.112039073938186, 'learning_rate': 3.424442465234101e-09, 'epoch': 0.98} 98%|█████████▊| 12114/12313 [9:04:11<09:10, 2.77s/it] 98%|█████████▊| 12115/12313 [9:04:13<09:05, 2.76s/it] {'loss': 0.5395, 'grad_norm': 6.296027842842914, 'learning_rate': 3.3901201926606063e-09, 
'epoch': 0.98} 98%|█████████▊| 12115/12313 [9:04:13<09:05, 2.76s/it] 98%|█████████▊| 12116/12313 [9:04:16<08:39, 2.64s/it] {'loss': 0.6561, 'grad_norm': 5.253597703880921, 'learning_rate': 3.3559706722774423e-09, 'epoch': 0.98} 98%|█████████▊| 12116/12313 [9:04:16<08:39, 2.64s/it] 98%|█████████▊| 12117/12313 [9:04:19<08:53, 2.72s/it] {'loss': 0.3641, 'grad_norm': 5.551682606805065, 'learning_rate': 3.3219939064477182e-09, 'epoch': 0.98} 98%|█████████▊| 12117/12313 [9:04:19<08:53, 2.72s/it] 98%|█████████▊| 12118/12313 [9:04:21<08:43, 2.68s/it] {'loss': 0.4635, 'grad_norm': 7.465132500098328, 'learning_rate': 3.288189897522609e-09, 'epoch': 0.98} 98%|█████████▊| 12118/12313 [9:04:21<08:43, 2.68s/it] 98%|█████████▊| 12119/12313 [9:04:24<08:41, 2.69s/it] {'loss': 0.5209, 'grad_norm': 8.173920927315894, 'learning_rate': 3.254558647841077e-09, 'epoch': 0.98} 98%|█████████▊| 12119/12313 [9:04:24<08:41, 2.69s/it] 98%|█████████▊| 12120/12313 [9:04:26<08:31, 2.65s/it] {'loss': 0.5823, 'grad_norm': 4.788329870369596, 'learning_rate': 3.2211001597304283e-09, 'epoch': 0.98} 98%|█████████▊| 12120/12313 [9:04:26<08:31, 2.65s/it] 98%|█████████▊| 12121/12313 [9:04:29<08:24, 2.63s/it] {'loss': 0.4951, 'grad_norm': 6.8587617655903355, 'learning_rate': 3.187814435505199e-09, 'epoch': 0.98} 98%|█████████▊| 12121/12313 [9:04:29<08:24, 2.63s/it] 98%|█████████▊| 12122/12313 [9:04:32<08:19, 2.62s/it] {'loss': 0.3174, 'grad_norm': 5.481814887439901, 'learning_rate': 3.1547014774693797e-09, 'epoch': 0.98} 98%|█████████▊| 12122/12313 [9:04:32<08:19, 2.62s/it] 98%|█████████▊| 12123/12313 [9:04:35<08:52, 2.80s/it] {'loss': 0.4645, 'grad_norm': 8.885860015107669, 'learning_rate': 3.1217612879139158e-09, 'epoch': 0.98} 98%|█████████▊| 12123/12313 [9:04:35<08:52, 2.80s/it] 98%|█████████▊| 12124/12313 [9:04:37<08:41, 2.76s/it] {'loss': 0.4884, 'grad_norm': 4.284808548816115, 'learning_rate': 3.088993869117818e-09, 'epoch': 0.98} 98%|█████████▊| 12124/12313 [9:04:37<08:41, 2.76s/it] 98%|█████████▊| 
12125/12313 [9:04:40<08:30, 2.72s/it] {'loss': 0.3905, 'grad_norm': 4.896143196735318, 'learning_rate': 3.056399223348716e-09, 'epoch': 0.98} 98%|█████████▊| 12125/12313 [9:04:40<08:30, 2.72s/it] 98%|█████████▊| 12126/12313 [9:04:43<08:41, 2.79s/it] {'loss': 0.6363, 'grad_norm': 4.786981373349127, 'learning_rate': 3.023977352861751e-09, 'epoch': 0.98} 98%|█████████▊| 12126/12313 [9:04:43<08:41, 2.79s/it] 98%|█████████▊| 12127/12313 [9:04:46<08:24, 2.71s/it] {'loss': 0.4777, 'grad_norm': 5.440565139296447, 'learning_rate': 2.991728259900684e-09, 'epoch': 0.98} 98%|█████████▊| 12127/12313 [9:04:46<08:24, 2.71s/it] 98%|█████████▊| 12128/12313 [9:04:48<08:17, 2.69s/it] {'loss': 0.4696, 'grad_norm': 4.432486814000843, 'learning_rate': 2.959651946696507e-09, 'epoch': 0.98} 98%|█████████▊| 12128/12313 [9:04:48<08:17, 2.69s/it] 99%|█████████▊| 12129/12313 [9:04:51<08:15, 2.69s/it] {'loss': 0.4412, 'grad_norm': 4.593612284410864, 'learning_rate': 2.927748415469389e-09, 'epoch': 0.99} 99%|█████████▊| 12129/12313 [9:04:51<08:15, 2.69s/it] 99%|█████████▊| 12130/12313 [9:04:53<08:08, 2.67s/it] {'loss': 0.5133, 'grad_norm': 11.349530667232669, 'learning_rate': 2.8960176684261767e-09, 'epoch': 0.99} 99%|█████████▊| 12130/12313 [9:04:53<08:08, 2.67s/it] 99%|█████████▊| 12131/12313 [9:04:56<08:01, 2.65s/it] {'loss': 0.4928, 'grad_norm': 7.977641323262256, 'learning_rate': 2.86445970776289e-09, 'epoch': 0.99} 99%|█████████▊| 12131/12313 [9:04:56<08:01, 2.65s/it] 99%|█████████▊| 12132/12313 [9:04:59<07:58, 2.64s/it] {'loss': 0.4539, 'grad_norm': 4.0771155412598805, 'learning_rate': 2.833074535663338e-09, 'epoch': 0.99} 99%|█████████▊| 12132/12313 [9:04:59<07:58, 2.64s/it] 99%|█████████▊| 12133/12313 [9:05:01<07:52, 2.63s/it] {'loss': 0.4797, 'grad_norm': 4.941422589211022, 'learning_rate': 2.8018621542988402e-09, 'epoch': 0.99} 99%|█████████▊| 12133/12313 [9:05:01<07:52, 2.63s/it] 99%|█████████▊| 12134/12313 [9:05:04<07:49, 2.62s/it] {'loss': 0.5164, 'grad_norm': 8.421332670522345, 
'learning_rate': 2.7708225658290566e-09, 'epoch': 0.99} 99%|█████████▊| 12134/12313 [9:05:04<07:49, 2.62s/it] 99%|█████████▊| 12135/12313 [9:05:06<07:37, 2.57s/it] {'loss': 0.621, 'grad_norm': 7.591462565867755, 'learning_rate': 2.739955772401992e-09, 'epoch': 0.99} 99%|█████████▊| 12135/12313 [9:05:06<07:37, 2.57s/it] 99%|█████████▊| 12136/12313 [9:05:09<07:40, 2.60s/it] {'loss': 0.4788, 'grad_norm': 5.772643166679314, 'learning_rate': 2.709261776153438e-09, 'epoch': 0.99} 99%|█████████▊| 12136/12313 [9:05:09<07:40, 2.60s/it] 99%|█████████▊| 12137/12313 [9:05:12<07:55, 2.70s/it] {'loss': 0.4194, 'grad_norm': 6.132821992629694, 'learning_rate': 2.6787405792072507e-09, 'epoch': 0.99} 99%|█████████▊| 12137/12313 [9:05:12<07:55, 2.70s/it] 99%|█████████▊| 12138/12313 [9:05:15<08:12, 2.81s/it] {'loss': 0.4067, 'grad_norm': 5.663712178693497, 'learning_rate': 2.6483921836753525e-09, 'epoch': 0.99} 99%|█████████▊| 12138/12313 [9:05:15<08:12, 2.81s/it] 99%|█████████▊| 12139/12313 [9:05:18<08:00, 2.76s/it] {'loss': 0.547, 'grad_norm': 5.534184126963757, 'learning_rate': 2.6182165916577295e-09, 'epoch': 0.99} 99%|█████████▊| 12139/12313 [9:05:18<08:00, 2.76s/it] 99%|█████████▊| 12140/12313 [9:05:20<07:39, 2.66s/it] {'loss': 0.4595, 'grad_norm': 4.047838425287923, 'learning_rate': 2.5882138052421567e-09, 'epoch': 0.99} 99%|█████████▊| 12140/12313 [9:05:20<07:39, 2.66s/it] 99%|█████████▊| 12141/12313 [9:05:23<07:47, 2.72s/it] {'loss': 0.4067, 'grad_norm': 3.5712541301580667, 'learning_rate': 2.5583838265050286e-09, 'epoch': 0.99} 99%|█████████▊| 12141/12313 [9:05:23<07:47, 2.72s/it] 99%|█████████▊| 12142/12313 [9:05:26<07:38, 2.68s/it] {'loss': 0.5706, 'grad_norm': 5.1536702438204065, 'learning_rate': 2.52872665751025e-09, 'epoch': 0.99} 99%|█████████▊| 12142/12313 [9:05:26<07:38, 2.68s/it] 99%|█████████▊| 12143/12313 [9:05:28<07:48, 2.76s/it] {'loss': 0.5131, 'grad_norm': 16.616810366998326, 'learning_rate': 2.4992423003095124e-09, 'epoch': 0.99} 99%|█████████▊| 12143/12313 
[9:05:28<07:48, 2.76s/it] 99%|█████████▊| 12144/12313 [9:05:31<07:34, 2.69s/it] {'loss': 0.4475, 'grad_norm': 5.2196354867906685, 'learning_rate': 2.4699307569436835e-09, 'epoch': 0.99} 99%|█████████▊| 12144/12313 [9:05:31<07:34, 2.69s/it] 99%|█████████▊| 12145/12313 [9:05:34<07:31, 2.69s/it] {'loss': 0.381, 'grad_norm': 4.94188443326249, 'learning_rate': 2.4407920294405864e-09, 'epoch': 0.99} 99%|█████████▊| 12145/12313 [9:05:34<07:31, 2.69s/it] 99%|█████████▊| 12146/12313 [9:05:36<07:22, 2.65s/it] {'loss': 0.4683, 'grad_norm': 6.982635794914591, 'learning_rate': 2.4118261198166625e-09, 'epoch': 0.99} 99%|█████████▊| 12146/12313 [9:05:36<07:22, 2.65s/it] 99%|█████████▊| 12147/12313 [9:05:39<07:14, 2.62s/it] {'loss': 0.4486, 'grad_norm': 8.68074477826504, 'learning_rate': 2.383033030075865e-09, 'epoch': 0.99} 99%|█████████▊| 12147/12313 [9:05:39<07:14, 2.62s/it] 99%|█████████▊| 12148/12313 [9:05:41<07:12, 2.62s/it] {'loss': 0.5145, 'grad_norm': 4.488613369870494, 'learning_rate': 2.354412762210767e-09, 'epoch': 0.99} 99%|█████████▊| 12148/12313 [9:05:41<07:12, 2.62s/it] 99%|█████████▊| 12149/12313 [9:05:44<07:15, 2.66s/it] {'loss': 0.4742, 'grad_norm': 8.342369544568935, 'learning_rate': 2.325965318201728e-09, 'epoch': 0.99} 99%|█████████▊| 12149/12313 [9:05:44<07:15, 2.66s/it] 99%|█████████▊| 12150/12313 [9:05:47<07:10, 2.64s/it] {'loss': 0.4517, 'grad_norm': 4.473725413880338, 'learning_rate': 2.2976907000171743e-09, 'epoch': 0.99} 99%|█████████▊| 12150/12313 [9:05:47<07:10, 2.64s/it] 99%|█████████▊| 12151/12313 [9:05:49<07:03, 2.61s/it] {'loss': 0.5598, 'grad_norm': 5.732059155908079, 'learning_rate': 2.2695889096133184e-09, 'epoch': 0.99} 99%|█████████▊| 12151/12313 [9:05:49<07:03, 2.61s/it] 99%|█████████▊| 12152/12313 [9:05:52<06:58, 2.60s/it] {'loss': 0.4432, 'grad_norm': 6.627352082908765, 'learning_rate': 2.2416599489349933e-09, 'epoch': 0.99} 99%|█████████▊| 12152/12313 [9:05:52<06:58, 2.60s/it] 99%|█████████▊| 12153/12313 [9:05:54<06:50, 2.56s/it] 
{'loss': 0.641, 'grad_norm': 6.4462903139711525, 'learning_rate': 2.2139038199145424e-09, 'epoch': 0.99} 99%|█████████▊| 12153/12313 [9:05:54<06:50, 2.56s/it] 99%|█████████▊| 12154/12313 [9:05:57<06:55, 2.61s/it] {'loss': 0.3059, 'grad_norm': 4.235134074339075, 'learning_rate': 2.1863205244726514e-09, 'epoch': 0.99} 99%|█████████▊| 12154/12313 [9:05:57<06:55, 2.61s/it] 99%|█████████▊| 12155/12313 [9:06:00<06:51, 2.61s/it] {'loss': 0.4217, 'grad_norm': 10.34898874466498, 'learning_rate': 2.1589100645180715e-09, 'epoch': 0.99} 99%|█████████▊| 12155/12313 [9:06:00<06:51, 2.61s/it] 99%|█████████▊| 12156/12313 [9:06:02<06:49, 2.61s/it] {'loss': 0.4262, 'grad_norm': 7.083106400400486, 'learning_rate': 2.1316724419470637e-09, 'epoch': 0.99} 99%|█████████▊| 12156/12313 [9:06:02<06:49, 2.61s/it] 99%|█████████▊| 12157/12313 [9:06:05<06:42, 2.58s/it] {'loss': 0.4115, 'grad_norm': 4.880842030469988, 'learning_rate': 2.1046076586445084e-09, 'epoch': 0.99} 99%|█████████▊| 12157/12313 [9:06:05<06:42, 2.58s/it] 99%|█████████▊| 12158/12313 [9:06:08<06:51, 2.66s/it] {'loss': 0.4031, 'grad_norm': 5.428807688681806, 'learning_rate': 2.077715716483353e-09, 'epoch': 0.99} 99%|█████████▊| 12158/12313 [9:06:08<06:51, 2.66s/it] 99%|█████████▊| 12159/12313 [9:06:10<06:53, 2.68s/it] {'loss': 0.4478, 'grad_norm': 3.5615581342610594, 'learning_rate': 2.0509966173240524e-09, 'epoch': 0.99} 99%|█████████▊| 12159/12313 [9:06:10<06:53, 2.68s/it] 99%|█████████▉| 12160/12313 [9:06:13<06:48, 2.67s/it] {'loss': 0.566, 'grad_norm': 4.9802028956802245, 'learning_rate': 2.0244503630154066e-09, 'epoch': 0.99} 99%|█████████▉| 12160/12313 [9:06:13<06:48, 2.67s/it] 99%|█████████▉| 12161/12313 [9:06:16<06:39, 2.63s/it] {'loss': 0.3933, 'grad_norm': 3.88910810121843, 'learning_rate': 1.9980769553948344e-09, 'epoch': 0.99} 99%|█████████▉| 12161/12313 [9:06:16<06:39, 2.63s/it] 99%|█████████▉| 12162/12313 [9:06:18<06:41, 2.66s/it] {'loss': 0.4622, 'grad_norm': 4.551525974114428, 'learning_rate': 
1.9718763962867094e-09, 'epoch': 0.99} 99%|█████████▉| 12162/12313 [9:06:18<06:41, 2.66s/it] 99%|█████████▉| 12163/12313 [9:06:21<06:34, 2.63s/it] {'loss': 0.5204, 'grad_norm': 6.650520927884947, 'learning_rate': 1.945848687504026e-09, 'epoch': 0.99} 99%|█████████▉| 12163/12313 [9:06:21<06:34, 2.63s/it] 99%|█████████▉| 12164/12313 [9:06:24<06:42, 2.70s/it] {'loss': 0.3685, 'grad_norm': 5.106322645005586, 'learning_rate': 1.919993830847844e-09, 'epoch': 0.99} 99%|█████████▉| 12164/12313 [9:06:24<06:42, 2.70s/it] 99%|█████████▉| 12165/12313 [9:06:26<06:34, 2.66s/it] {'loss': 0.4464, 'grad_norm': 5.293912542968066, 'learning_rate': 1.8943118281070095e-09, 'epoch': 0.99} 99%|█████████▉| 12165/12313 [9:06:26<06:34, 2.66s/it] 99%|█████████▉| 12166/12313 [9:06:29<06:30, 2.66s/it] {'loss': 0.5068, 'grad_norm': 4.885311151399439, 'learning_rate': 1.86880268105899e-09, 'epoch': 0.99} 99%|█████████▉| 12166/12313 [9:06:29<06:30, 2.66s/it] 99%|█████████▉| 12167/12313 [9:06:31<06:23, 2.63s/it] {'loss': 0.4485, 'grad_norm': 5.1380448467955295, 'learning_rate': 1.8434663914687623e-09, 'epoch': 0.99} 99%|█████████▉| 12167/12313 [9:06:31<06:23, 2.63s/it] 99%|█████████▉| 12168/12313 [9:06:34<06:19, 2.62s/it] {'loss': 0.5346, 'grad_norm': 4.9192525339798925, 'learning_rate': 1.8183029610890912e-09, 'epoch': 0.99} 99%|█████████▉| 12168/12313 [9:06:34<06:19, 2.62s/it] 99%|█████████▉| 12169/12313 [9:06:37<06:27, 2.69s/it] {'loss': 0.4065, 'grad_norm': 4.9277595849293565, 'learning_rate': 1.7933123916613614e-09, 'epoch': 0.99} 99%|█████████▉| 12169/12313 [9:06:37<06:27, 2.69s/it] 99%|█████████▉| 12170/12313 [9:06:39<06:12, 2.61s/it] {'loss': 0.459, 'grad_norm': 5.8993559652023215, 'learning_rate': 1.7684946849150232e-09, 'epoch': 0.99} 99%|█████████▉| 12170/12313 [9:06:39<06:12, 2.61s/it] 99%|█████████▉| 12171/12313 [9:06:42<06:28, 2.74s/it] {'loss': 0.5248, 'grad_norm': 4.936973153328386, 'learning_rate': 1.7438498425673135e-09, 'epoch': 0.99} 99%|█████████▉| 12171/12313 [9:06:42<06:28, 
2.74s/it] 99%|█████████▉| 12172/12313 [9:06:45<06:36, 2.81s/it] {'loss': 0.5324, 'grad_norm': 4.4227762528236125, 'learning_rate': 1.7193778663229799e-09, 'epoch': 0.99} 99%|█████████▉| 12172/12313 [9:06:45<06:36, 2.81s/it] 99%|█████████▉| 12173/12313 [9:06:48<06:16, 2.69s/it] {'loss': 0.5095, 'grad_norm': 4.957741580907209, 'learning_rate': 1.6950787578759453e-09, 'epoch': 0.99} 99%|█████████▉| 12173/12313 [9:06:48<06:16, 2.69s/it] 99%|█████████▉| 12174/12313 [9:06:51<06:18, 2.72s/it] {'loss': 0.4332, 'grad_norm': 6.64145740938836, 'learning_rate': 1.6709525189073649e-09, 'epoch': 0.99} 99%|█████████▉| 12174/12313 [9:06:51<06:18, 2.72s/it] 99%|█████████▉| 12175/12313 [9:06:53<06:13, 2.71s/it] {'loss': 0.3923, 'grad_norm': 6.145921162498026, 'learning_rate': 1.646999151086459e-09, 'epoch': 0.99} 99%|█████████▉| 12175/12313 [9:06:53<06:13, 2.71s/it] 99%|█████████▉| 12176/12313 [9:06:56<06:08, 2.69s/it] {'loss': 0.4596, 'grad_norm': 5.254183282644911, 'learning_rate': 1.6232186560710684e-09, 'epoch': 0.99} 99%|█████████▉| 12176/12313 [9:06:56<06:08, 2.69s/it] 99%|█████████▉| 12177/12313 [9:06:59<06:05, 2.69s/it] {'loss': 0.6023, 'grad_norm': 17.99187941281074, 'learning_rate': 1.599611035506543e-09, 'epoch': 0.99} 99%|█████████▉| 12177/12313 [9:06:59<06:05, 2.69s/it] 99%|█████████▉| 12178/12313 [9:07:01<05:47, 2.58s/it] {'loss': 0.5383, 'grad_norm': 10.290258462437267, 'learning_rate': 1.5761762910260214e-09, 'epoch': 0.99} 99%|█████████▉| 12178/12313 [9:07:01<05:47, 2.58s/it] 99%|█████████▉| 12179/12313 [9:07:04<05:46, 2.59s/it] {'loss': 0.5216, 'grad_norm': 6.646644372837433, 'learning_rate': 1.5529144242518167e-09, 'epoch': 0.99} 99%|█████████▉| 12179/12313 [9:07:04<05:46, 2.59s/it] 99%|█████████▉| 12180/12313 [9:07:06<05:46, 2.60s/it] {'loss': 0.4035, 'grad_norm': 4.4766333642428675, 'learning_rate': 1.5298254367926424e-09, 'epoch': 0.99} 99%|█████████▉| 12180/12313 [9:07:06<05:46, 2.60s/it] 99%|█████████▉| 12181/12313 [9:07:09<05:54, 2.68s/it] {'loss': 0.4324, 
'grad_norm': 9.914986537758677, 'learning_rate': 1.5069093302469418e-09, 'epoch': 0.99} 99%|█████████▉| 12181/12313 [9:07:09<05:54, 2.68s/it] 99%|█████████▉| 12182/12313 [9:07:12<06:03, 2.77s/it] {'loss': 0.4502, 'grad_norm': 4.176667675043893, 'learning_rate': 1.4841661061998358e-09, 'epoch': 0.99} 99%|█████████▉| 12182/12313 [9:07:12<06:03, 2.77s/it] 99%|█████████▉| 12183/12313 [9:07:15<06:01, 2.78s/it] {'loss': 0.4701, 'grad_norm': 3.795368320920534, 'learning_rate': 1.4615957662250657e-09, 'epoch': 0.99} 99%|█████████▉| 12183/12313 [9:07:15<06:01, 2.78s/it] 99%|█████████▉| 12184/12313 [9:07:17<05:47, 2.69s/it] {'loss': 0.605, 'grad_norm': 5.32972800464418, 'learning_rate': 1.4391983118847152e-09, 'epoch': 0.99} 99%|█████████▉| 12184/12313 [9:07:17<05:47, 2.69s/it] 99%|█████████▉| 12185/12313 [9:07:20<05:40, 2.66s/it] {'loss': 0.3262, 'grad_norm': 6.008043856391167, 'learning_rate': 1.4169737447283782e-09, 'epoch': 0.99} 99%|█████████▉| 12185/12313 [9:07:20<05:40, 2.66s/it] 99%|█████████▉| 12186/12313 [9:07:22<05:30, 2.60s/it] {'loss': 0.5486, 'grad_norm': 7.84003745927447, 'learning_rate': 1.394922066293991e-09, 'epoch': 0.99} 99%|█████████▉| 12186/12313 [9:07:22<05:30, 2.60s/it] 99%|█████████▉| 12187/12313 [9:07:25<05:34, 2.65s/it] {'loss': 0.4331, 'grad_norm': 6.066959827307744, 'learning_rate': 1.3730432781070002e-09, 'epoch': 0.99} 99%|█████████▉| 12187/12313 [9:07:25<05:34, 2.65s/it] 99%|█████████▉| 12188/12313 [9:07:28<05:24, 2.59s/it] {'loss': 0.642, 'grad_norm': 17.157061321750632, 'learning_rate': 1.3513373816820274e-09, 'epoch': 0.99} 99%|█████████▉| 12188/12313 [9:07:28<05:24, 2.59s/it] 99%|█████████▉| 12189/12313 [9:07:30<05:19, 2.57s/it] {'loss': 0.4469, 'grad_norm': 3.932173196085788, 'learning_rate': 1.3298043785203718e-09, 'epoch': 0.99} 99%|█████████▉| 12189/12313 [9:07:30<05:19, 2.57s/it] 99%|█████████▉| 12190/12313 [9:07:33<05:19, 2.60s/it] {'loss': 0.485, 'grad_norm': 8.805868997589002, 'learning_rate': 1.30844427011223e-09, 'epoch': 0.99} 
99%|█████████▉| 12190/12313 [9:07:33<05:19, 2.60s/it] 99%|█████████▉| 12191/12313 [9:07:36<05:24, 2.66s/it] {'loss': 0.4275, 'grad_norm': 4.027618081932802, 'learning_rate': 1.287257057935587e-09, 'epoch': 0.99} 99%|█████████▉| 12191/12313 [9:07:36<05:24, 2.66s/it] 99%|█████████▉| 12192/12313 [9:07:38<05:23, 2.68s/it] {'loss': 0.5487, 'grad_norm': 8.085780436510987, 'learning_rate': 1.2662427434564916e-09, 'epoch': 0.99} 99%|█████████▉| 12192/12313 [9:07:38<05:23, 2.68s/it] 99%|█████████▉| 12193/12313 [9:07:41<05:27, 2.73s/it] {'loss': 0.4155, 'grad_norm': 6.6055275665447875, 'learning_rate': 1.2454013281290589e-09, 'epoch': 0.99} 99%|█████████▉| 12193/12313 [9:07:41<05:27, 2.73s/it] 99%|█████████▉| 12194/12313 [9:07:44<05:20, 2.69s/it] {'loss': 0.4356, 'grad_norm': 5.7796904505260365, 'learning_rate': 1.2247328133954683e-09, 'epoch': 0.99} 99%|█████████▉| 12194/12313 [9:07:44<05:20, 2.69s/it] 99%|█████████▉| 12195/12313 [9:07:47<05:33, 2.83s/it] {'loss': 0.4525, 'grad_norm': 4.4243001082018445, 'learning_rate': 1.2042372006856873e-09, 'epoch': 0.99} 99%|█████████▉| 12195/12313 [9:07:47<05:33, 2.83s/it] 99%|█████████▉| 12196/12313 [9:07:49<05:21, 2.75s/it] {'loss': 0.3391, 'grad_norm': 5.706811651819559, 'learning_rate': 1.1839144914180256e-09, 'epoch': 0.99} 99%|█████████▉| 12196/12313 [9:07:49<05:21, 2.75s/it] 99%|█████████▉| 12197/12313 [9:07:52<05:17, 2.74s/it] {'loss': 0.4905, 'grad_norm': 5.337953154660965, 'learning_rate': 1.1637646869985809e-09, 'epoch': 0.99} 99%|█████████▉| 12197/12313 [9:07:52<05:17, 2.74s/it] 99%|█████████▉| 12198/12313 [9:07:55<05:04, 2.65s/it] {'loss': 0.4736, 'grad_norm': 7.132197947930379, 'learning_rate': 1.143787788821793e-09, 'epoch': 0.99} 99%|█████████▉| 12198/12313 [9:07:55<05:04, 2.65s/it] 99%|█████████▉| 12199/12313 [9:07:57<04:56, 2.60s/it] {'loss': 0.4538, 'grad_norm': 8.488570779233747, 'learning_rate': 1.1239837982698898e-09, 'epoch': 0.99} 99%|█████████▉| 12199/12313 [9:07:57<04:56, 2.60s/it] 99%|█████████▉| 12200/12313 
[9:08:00<04:52, 2.59s/it] {'loss': 0.5289, 'grad_norm': 8.566390662400726, 'learning_rate': 1.104352716713164e-09, 'epoch': 0.99} 99%|█████████▉| 12200/12313 [9:08:00<04:52, 2.59s/it] 99%|█████████▉| 12201/12313 [9:08:02<04:52, 2.61s/it] {'loss': 0.4059, 'grad_norm': 7.713592201697891, 'learning_rate': 1.0848945455099734e-09, 'epoch': 0.99} 99%|█████████▉| 12201/12313 [9:08:02<04:52, 2.61s/it] 99%|█████████▉| 12202/12313 [9:08:05<04:47, 2.59s/it] {'loss': 0.367, 'grad_norm': 8.262715346834725, 'learning_rate': 1.0656092860067413e-09, 'epoch': 0.99} 99%|█████████▉| 12202/12313 [9:08:05<04:47, 2.59s/it] 99%|█████████▉| 12203/12313 [9:08:08<04:48, 2.62s/it] {'loss': 0.3928, 'grad_norm': 5.432388233712127, 'learning_rate': 1.046496939538233e-09, 'epoch': 0.99} 99%|█████████▉| 12203/12313 [9:08:08<04:48, 2.62s/it] 99%|█████████▉| 12204/12313 [9:08:10<04:41, 2.58s/it] {'loss': 0.5214, 'grad_norm': 5.147447230251187, 'learning_rate': 1.027557507426169e-09, 'epoch': 0.99} 99%|█████████▉| 12204/12313 [9:08:10<04:41, 2.58s/it] 99%|█████████▉| 12205/12313 [9:08:13<04:38, 2.58s/it] {'loss': 0.3809, 'grad_norm': 5.0721034321084515, 'learning_rate': 1.0087909909817228e-09, 'epoch': 0.99} 99%|█████████▉| 12205/12313 [9:08:13<04:38, 2.58s/it] 99%|█████████▉| 12206/12313 [9:08:15<04:38, 2.60s/it] {'loss': 0.5828, 'grad_norm': 3.1443588440586963, 'learning_rate': 9.901973915033004e-10, 'epoch': 0.99} 99%|█████████▉| 12206/12313 [9:08:15<04:38, 2.60s/it] 99%|█████████▉| 12207/12313 [9:08:18<04:36, 2.61s/it] {'loss': 0.3964, 'grad_norm': 7.70458840955126, 'learning_rate': 9.717767102770947e-10, 'epoch': 0.99} 99%|█████████▉| 12207/12313 [9:08:18<04:36, 2.61s/it] 99%|█████████▉| 12208/12313 [9:08:20<04:28, 2.56s/it] {'loss': 0.5808, 'grad_norm': 5.013950694674269, 'learning_rate': 9.535289485781973e-10, 'epoch': 0.99} 99%|█████████▉| 12208/12313 [9:08:20<04:28, 2.56s/it] 99%|█████████▉| 12209/12313 [9:08:23<04:29, 2.59s/it] {'loss': 0.3382, 'grad_norm': 5.467259982799362, 
'learning_rate': 9.354541076692092e-10, 'epoch': 0.99} 99%|█████████▉| 12209/12313 [9:08:23<04:29, 2.59s/it] 99%|█████████▉| 12210/12313 [9:08:26<04:38, 2.70s/it] {'loss': 0.4828, 'grad_norm': 3.502450749974113, 'learning_rate': 9.17552188800519e-10, 'epoch': 0.99} 99%|█████████▉| 12210/12313 [9:08:26<04:38, 2.70s/it] 99%|█████████▉| 12211/12313 [9:08:29<04:43, 2.78s/it] {'loss': 0.4266, 'grad_norm': 4.810040707477097, 'learning_rate': 8.998231932108581e-10, 'epoch': 0.99} 99%|█████████▉| 12211/12313 [9:08:29<04:43, 2.78s/it] 99%|█████████▉| 12212/12313 [9:08:31<04:34, 2.71s/it] {'loss': 0.5797, 'grad_norm': 4.375638079472969, 'learning_rate': 8.822671221273005e-10, 'epoch': 0.99} 99%|█████████▉| 12212/12313 [9:08:31<04:34, 2.71s/it] 99%|█████████▉| 12213/12313 [9:08:34<04:36, 2.77s/it] {'loss': 0.621, 'grad_norm': 9.293670755342289, 'learning_rate': 8.648839767644302e-10, 'epoch': 0.99} 99%|█████████▉| 12213/12313 [9:08:34<04:36, 2.77s/it] 99%|█████████▉| 12214/12313 [9:08:37<04:28, 2.71s/it] {'loss': 0.4257, 'grad_norm': 6.6261486011288335, 'learning_rate': 8.476737583251737e-10, 'epoch': 0.99} 99%|█████████▉| 12214/12313 [9:08:37<04:28, 2.71s/it] 99%|█████████▉| 12215/12313 [9:08:40<04:25, 2.71s/it] {'loss': 0.522, 'grad_norm': 4.4991871759075375, 'learning_rate': 8.306364680002454e-10, 'epoch': 0.99} 99%|█████████▉| 12215/12313 [9:08:40<04:25, 2.71s/it] 99%|█████████▉| 12216/12313 [9:08:42<04:21, 2.70s/it] {'loss': 0.526, 'grad_norm': 4.881228997280039, 'learning_rate': 8.137721069687021e-10, 'epoch': 0.99} 99%|█████████▉| 12216/12313 [9:08:42<04:21, 2.70s/it] 99%|█████████▉| 12217/12313 [9:08:45<04:20, 2.71s/it] {'loss': 0.4575, 'grad_norm': 3.6297898118275964, 'learning_rate': 7.970806763973882e-10, 'epoch': 0.99} 99%|█████████▉| 12217/12313 [9:08:45<04:20, 2.71s/it] 99%|█████████▉| 12218/12313 [9:08:48<04:17, 2.71s/it] {'loss': 0.3662, 'grad_norm': 6.246461103137779, 'learning_rate': 7.805621774409356e-10, 'epoch': 0.99} 99%|█████████▉| 12218/12313 
[9:08:48<04:17, 2.71s/it] 99%|█████████▉| 12219/12313 [9:08:50<04:12, 2.69s/it] {'loss': 0.5867, 'grad_norm': 5.111388107129699, 'learning_rate': 7.642166112428739e-10, 'epoch': 0.99} 99%|█████████▉| 12219/12313 [9:08:50<04:12, 2.69s/it] 99%|█████████▉| 12220/12313 [9:08:53<04:08, 2.67s/it] {'loss': 0.489, 'grad_norm': 6.103417103656217, 'learning_rate': 7.480439789339655e-10, 'epoch': 0.99} 99%|█████████▉| 12220/12313 [9:08:53<04:08, 2.67s/it] 99%|█████████▉| 12221/12313 [9:08:56<04:06, 2.68s/it] {'loss': 0.4852, 'grad_norm': 5.872345642365211, 'learning_rate': 7.320442816333151e-10, 'epoch': 0.99} 99%|█████████▉| 12221/12313 [9:08:56<04:06, 2.68s/it] 99%|█████████▉| 12222/12313 [9:08:59<04:09, 2.74s/it] {'loss': 0.415, 'grad_norm': 4.896384077479555, 'learning_rate': 7.162175204480926e-10, 'epoch': 0.99} 99%|█████████▉| 12222/12313 [9:08:59<04:09, 2.74s/it] 99%|█████████▉| 12223/12313 [9:09:01<04:05, 2.73s/it] {'loss': 0.5892, 'grad_norm': 11.24619397959855, 'learning_rate': 7.005636964732554e-10, 'epoch': 0.99} 99%|█████████▉| 12223/12313 [9:09:01<04:05, 2.73s/it] 99%|█████████▉| 12224/12313 [9:09:04<03:55, 2.65s/it] {'loss': 0.4542, 'grad_norm': 6.417486574394487, 'learning_rate': 6.850828107921037e-10, 'epoch': 0.99} 99%|█████████▉| 12224/12313 [9:09:04<03:55, 2.65s/it] 99%|█████████▉| 12225/12313 [9:09:06<03:47, 2.59s/it] {'loss': 0.4736, 'grad_norm': 5.669079347074385, 'learning_rate': 6.697748644757252e-10, 'epoch': 0.99} 99%|█████████▉| 12225/12313 [9:09:06<03:47, 2.59s/it] 99%|█████████▉| 12226/12313 [9:09:09<03:44, 2.58s/it] {'loss': 0.3881, 'grad_norm': 5.1508249817000475, 'learning_rate': 6.546398585832725e-10, 'epoch': 0.99} 99%|█████████▉| 12226/12313 [9:09:09<03:44, 2.58s/it] 99%|█████████▉| 12227/12313 [9:09:12<03:50, 2.68s/it] {'loss': 0.5204, 'grad_norm': 6.284781187211057, 'learning_rate': 6.396777941622412e-10, 'epoch': 0.99} 99%|█████████▉| 12227/12313 [9:09:12<03:50, 2.68s/it] 99%|█████████▉| 12228/12313 [9:09:16<04:23, 3.10s/it] {'loss': 
0.3868, 'grad_norm': 5.867023818166328, 'learning_rate': 6.248886722479142e-10, 'epoch': 0.99} 99%|█████████▉| 12228/12313 [9:09:16<04:23, 3.10s/it] 99%|█████████▉| 12229/12313 [9:09:18<04:10, 2.98s/it] {'loss': 0.4652, 'grad_norm': 7.865759003313058, 'learning_rate': 6.10272493863362e-10, 'epoch': 0.99} 99%|█████████▉| 12229/12313 [9:09:18<04:10, 2.98s/it] 99%|█████████▉| 12230/12313 [9:09:21<03:57, 2.87s/it] {'loss': 0.4368, 'grad_norm': 5.771846652668555, 'learning_rate': 5.958292600202753e-10, 'epoch': 0.99} 99%|█████████▉| 12230/12313 [9:09:21<03:57, 2.87s/it] 99%|█████████▉| 12231/12313 [9:09:24<03:46, 2.76s/it] {'loss': 0.4511, 'grad_norm': 5.555877566179887, 'learning_rate': 5.81558971717855e-10, 'epoch': 0.99} 99%|█████████▉| 12231/12313 [9:09:24<03:46, 2.76s/it] 99%|█████████▉| 12232/12313 [9:09:26<03:45, 2.78s/it] {'loss': 0.3559, 'grad_norm': 7.602271947993169, 'learning_rate': 5.674616299436441e-10, 'epoch': 0.99} 99%|█████████▉| 12232/12313 [9:09:26<03:45, 2.78s/it] 99%|█████████▉| 12233/12313 [9:09:29<03:38, 2.73s/it] {'loss': 0.6782, 'grad_norm': 3.168714257439046, 'learning_rate': 5.53537235672974e-10, 'epoch': 0.99} 99%|█████████▉| 12233/12313 [9:09:29<03:38, 2.73s/it] 99%|█████████▉| 12234/12313 [9:09:32<03:34, 2.72s/it] {'loss': 0.4928, 'grad_norm': 8.0565315209248, 'learning_rate': 5.397857898692404e-10, 'epoch': 0.99} 99%|█████████▉| 12234/12313 [9:09:32<03:34, 2.72s/it] 99%|█████████▉| 12235/12313 [9:09:34<03:31, 2.71s/it] {'loss': 0.4097, 'grad_norm': 6.7281592050291525, 'learning_rate': 5.262072934841822e-10, 'epoch': 0.99} 99%|█████████▉| 12235/12313 [9:09:34<03:31, 2.71s/it] 99%|█████████▉| 12236/12313 [9:09:37<03:26, 2.68s/it] {'loss': 0.5035, 'grad_norm': 5.03523479370938, 'learning_rate': 5.128017474573254e-10, 'epoch': 0.99} 99%|█████████▉| 12236/12313 [9:09:37<03:26, 2.68s/it] 99%|█████████▉| 12237/12313 [9:09:40<03:23, 2.68s/it] {'loss': 0.6748, 'grad_norm': 5.581578649372743, 'learning_rate': 4.995691527162616e-10, 'epoch': 0.99} 
99%|█████████▉| 12237/12313 [9:09:40<03:23, 2.68s/it] 99%|█████████▉| 12238/12313 [9:09:42<03:21, 2.69s/it] {'loss': 0.325, 'grad_norm': 5.1959863608312755, 'learning_rate': 4.86509510176647e-10, 'epoch': 0.99} 99%|█████████▉| 12238/12313 [9:09:42<03:21, 2.69s/it] 99%|█████████▉| 12239/12313 [9:09:45<03:23, 2.74s/it] {'loss': 0.5546, 'grad_norm': 5.218818979766321, 'learning_rate': 4.736228207419258e-10, 'epoch': 0.99} 99%|█████████▉| 12239/12313 [9:09:45<03:23, 2.74s/it] 99%|█████████▉| 12240/12313 [9:09:48<03:18, 2.71s/it] {'loss': 0.4902, 'grad_norm': 6.565937064015396, 'learning_rate': 4.60909085304162e-10, 'epoch': 0.99} 99%|█████████▉| 12240/12313 [9:09:48<03:18, 2.71s/it] 99%|█████████▉| 12241/12313 [9:09:50<03:09, 2.64s/it] {'loss': 0.4279, 'grad_norm': 4.111169296382956, 'learning_rate': 4.4836830474265235e-10, 'epoch': 0.99} 99%|█████████▉| 12241/12313 [9:09:50<03:09, 2.64s/it] 99%|█████████▉| 12242/12313 [9:09:53<03:12, 2.71s/it] {'loss': 0.5878, 'grad_norm': 4.28308564761494, 'learning_rate': 4.3600047992559124e-10, 'epoch': 0.99} 99%|█████████▉| 12242/12313 [9:09:53<03:12, 2.71s/it] 99%|█████████▉| 12243/12313 [9:09:56<03:08, 2.69s/it] {'loss': 0.7041, 'grad_norm': 7.57733637051035, 'learning_rate': 4.2380561170840553e-10, 'epoch': 0.99} 99%|█████████▉| 12243/12313 [9:09:56<03:08, 2.69s/it] 99%|█████████▉| 12244/12313 [9:09:58<03:03, 2.65s/it] {'loss': 0.4418, 'grad_norm': 6.372067043265081, 'learning_rate': 4.1178370093486463e-10, 'epoch': 0.99} 99%|█████████▉| 12244/12313 [9:09:58<03:03, 2.65s/it] 99%|█████████▉| 12245/12313 [9:10:01<03:05, 2.73s/it] {'loss': 0.4731, 'grad_norm': 3.884746049499681, 'learning_rate': 3.9993474843735837e-10, 'epoch': 0.99} 99%|█████████▉| 12245/12313 [9:10:01<03:05, 2.73s/it] 99%|█████████▉| 12246/12313 [9:10:04<02:58, 2.66s/it] {'loss': 0.4845, 'grad_norm': 8.66268673867393, 'learning_rate': 3.882587550349537e-10, 'epoch': 0.99} 99%|█████████▉| 12246/12313 [9:10:04<02:58, 2.66s/it] 99%|█████████▉| 12247/12313 
[9:10:07<02:56, 2.67s/it] {'loss': 0.4564, 'grad_norm': 4.113892528273869, 'learning_rate': 3.7675572153644814e-10, 'epoch': 0.99} 99%|█████████▉| 12247/12313 [9:10:07<02:56, 2.67s/it] 99%|█████████▉| 12248/12313 [9:10:09<02:52, 2.65s/it] {'loss': 0.4016, 'grad_norm': 4.958163424799454, 'learning_rate': 3.6542564873731645e-10, 'epoch': 0.99} 99%|█████████▉| 12248/12313 [9:10:09<02:52, 2.65s/it] 99%|█████████▉| 12249/12313 [9:10:12<02:48, 2.63s/it] {'loss': 0.6589, 'grad_norm': 3.6688498786954162, 'learning_rate': 3.5426853742137613e-10, 'epoch': 0.99} 99%|█████████▉| 12249/12313 [9:10:12<02:48, 2.63s/it] 99%|█████████▉| 12250/12313 [9:10:14<02:43, 2.59s/it] {'loss': 0.4469, 'grad_norm': 10.717003613916313, 'learning_rate': 3.432843883610648e-10, 'epoch': 0.99} 99%|█████████▉| 12250/12313 [9:10:14<02:43, 2.59s/it] 99%|█████████▉| 12251/12313 [9:10:17<02:37, 2.54s/it] {'loss': 0.5045, 'grad_norm': 4.462573986139782, 'learning_rate': 3.3247320231605265e-10, 'epoch': 0.99} 99%|█████████▉| 12251/12313 [9:10:17<02:37, 2.54s/it] 100%|█████████▉| 12252/12313 [9:10:19<02:31, 2.49s/it] {'loss': 0.4665, 'grad_norm': 5.007934033669896, 'learning_rate': 3.218349800346299e-10, 'epoch': 1.0} 100%|█████████▉| 12252/12313 [9:10:19<02:31, 2.49s/it] 100%|█████████▉| 12253/12313 [9:10:22<02:29, 2.49s/it] {'loss': 0.5173, 'grad_norm': 3.873104868996485, 'learning_rate': 3.1136972225315197e-10, 'epoch': 1.0} 100%|█████████▉| 12253/12313 [9:10:22<02:29, 2.49s/it] 100%|█████████▉| 12254/12313 [9:10:24<02:34, 2.63s/it] {'loss': 0.5485, 'grad_norm': 3.3876902753690796, 'learning_rate': 3.0107742969520683e-10, 'epoch': 1.0} 100%|█████████▉| 12254/12313 [9:10:25<02:34, 2.63s/it] 100%|█████████▉| 12255/12313 [9:10:27<02:35, 2.68s/it] {'loss': 0.367, 'grad_norm': 6.699091681394484, 'learning_rate': 2.9095810307328e-10, 'epoch': 1.0} 100%|█████████▉| 12255/12313 [9:10:27<02:35, 2.68s/it] 100%|█████████▉| 12256/12313 [9:10:30<02:31, 2.65s/it] {'loss': 0.3962, 'grad_norm': 6.171401454308493, 
'learning_rate': 2.810117430873671e-10, 'epoch': 1.0} 100%|█████████▉| 12256/12313 [9:10:30<02:31, 2.65s/it] 100%|█████████▉| 12257/12313 [9:10:33<02:28, 2.65s/it] {'loss': 0.4938, 'grad_norm': 4.923317857251655, 'learning_rate': 2.71238350426084e-10, 'epoch': 1.0} 100%|█████████▉| 12257/12313 [9:10:33<02:28, 2.65s/it] 100%|█████████▉| 12258/12313 [9:10:35<02:26, 2.67s/it] {'loss': 0.3092, 'grad_norm': 5.816170040992201, 'learning_rate': 2.61637925765279e-10, 'epoch': 1.0} 100%|█████████▉| 12258/12313 [9:10:35<02:26, 2.67s/it] 100%|█████████▉| 12259/12313 [9:10:38<02:26, 2.72s/it] {'loss': 0.4861, 'grad_norm': 4.783682438235011, 'learning_rate': 2.522104697696981e-10, 'epoch': 1.0} 100%|█████████▉| 12259/12313 [9:10:38<02:26, 2.72s/it] 100%|█████████▉| 12260/12313 [9:10:41<02:31, 2.87s/it] {'loss': 0.4059, 'grad_norm': 6.542965672565677, 'learning_rate': 2.4295598309131973e-10, 'epoch': 1.0} 100%|█████████▉| 12260/12313 [9:10:41<02:31, 2.87s/it] 100%|█████████▉| 12261/12313 [9:10:44<02:28, 2.86s/it] {'loss': 0.433, 'grad_norm': 5.290685230558423, 'learning_rate': 2.3387446637046506e-10, 'epoch': 1.0} 100%|█████████▉| 12261/12313 [9:10:44<02:28, 2.86s/it] 100%|█████████▉| 12262/12313 [9:10:47<02:19, 2.73s/it] {'loss': 0.5698, 'grad_norm': 4.569512260893426, 'learning_rate': 2.2496592023579789e-10, 'epoch': 1.0} 100%|█████████▉| 12262/12313 [9:10:47<02:19, 2.73s/it] 100%|█████████▉| 12263/12313 [9:10:49<02:13, 2.67s/it] {'loss': 0.4349, 'grad_norm': 13.1705343773864, 'learning_rate': 2.1623034530349197e-10, 'epoch': 1.0} 100%|█████████▉| 12263/12313 [9:10:49<02:13, 2.67s/it] 100%|█████████▉| 12264/12313 [9:10:52<02:13, 2.72s/it] {'loss': 0.3677, 'grad_norm': 11.027548037308877, 'learning_rate': 2.076677421783413e-10, 'epoch': 1.0} 100%|█████████▉| 12264/12313 [9:10:52<02:13, 2.72s/it] 100%|█████████▉| 12265/12313 [9:10:55<02:10, 2.71s/it] {'loss': 0.4086, 'grad_norm': 7.761548829661015, 'learning_rate': 1.992781114523723e-10, 'epoch': 1.0} 100%|█████████▉| 
12265/12313 [9:10:55<02:10, 2.71s/it] 100%|█████████▉| 12266/12313 [9:10:57<02:06, 2.68s/it] {'loss': 0.523, 'grad_norm': 4.496137975811519, 'learning_rate': 1.910614537065092e-10, 'epoch': 1.0} 100%|█████████▉| 12266/12313 [9:10:57<02:06, 2.68s/it] 100%|█████████▉| 12267/12313 [9:11:00<02:10, 2.84s/it] {'loss': 0.4518, 'grad_norm': 4.01367922274939, 'learning_rate': 1.8301776950918615e-10, 'epoch': 1.0} 100%|█████████▉| 12267/12313 [9:11:00<02:10, 2.84s/it] 100%|█████████▉| 12268/12313 [9:11:03<02:05, 2.79s/it] {'loss': 0.4174, 'grad_norm': 11.271093585704282, 'learning_rate': 1.7514705941690247e-10, 'epoch': 1.0} 100%|█████████▉| 12268/12313 [9:11:03<02:05, 2.79s/it] 100%|█████████▉| 12269/12313 [9:11:06<02:01, 2.75s/it] {'loss': 0.5133, 'grad_norm': 5.748127242285962, 'learning_rate': 1.6744932397422254e-10, 'epoch': 1.0} 100%|█████████▉| 12269/12313 [9:11:06<02:01, 2.75s/it] 100%|█████████▉| 12270/12313 [9:11:09<01:57, 2.74s/it] {'loss': 0.3742, 'grad_norm': 5.1019116955334916, 'learning_rate': 1.5992456371377584e-10, 'epoch': 1.0} 100%|█████████▉| 12270/12313 [9:11:09<01:57, 2.74s/it] 100%|█████████▉| 12271/12313 [9:11:11<01:54, 2.74s/it] {'loss': 0.4672, 'grad_norm': 6.408098676761947, 'learning_rate': 1.5257277915653458e-10, 'epoch': 1.0} 100%|█████████▉| 12271/12313 [9:11:11<01:54, 2.74s/it] 100%|█████████▉| 12272/12313 [9:11:14<01:51, 2.71s/it] {'loss': 0.3491, 'grad_norm': 6.484656082022969, 'learning_rate': 1.4539397081070328e-10, 'epoch': 1.0} 100%|█████████▉| 12272/12313 [9:11:14<01:51, 2.71s/it] 100%|█████████▉| 12273/12313 [9:11:17<01:53, 2.83s/it] {'loss': 0.5493, 'grad_norm': 3.999338555705399, 'learning_rate': 1.3838813917366188e-10, 'epoch': 1.0} 100%|█████████▉| 12273/12313 [9:11:17<01:53, 2.83s/it] 100%|█████████▉| 12274/12313 [9:11:20<01:48, 2.77s/it] {'loss': 0.3952, 'grad_norm': 5.4475266405766565, 'learning_rate': 1.3155528472974523e-10, 'epoch': 1.0} 100%|█████████▉| 12274/12313 [9:11:20<01:48, 2.77s/it] 100%|█████████▉| 12275/12313 
[9:11:22<01:44, 2.76s/it] {'loss': 0.5474, 'grad_norm': 4.872599475892686, 'learning_rate': 1.2489540795163068e-10, 'epoch': 1.0} 100%|█████████▉| 12275/12313 [9:11:22<01:44, 2.76s/it] 100%|█████████▉| 12276/12313 [9:11:25<01:39, 2.68s/it] {'loss': 0.5644, 'grad_norm': 3.670731259776475, 'learning_rate': 1.18408509300616e-10, 'epoch': 1.0} 100%|█████████▉| 12276/12313 [9:11:25<01:39, 2.68s/it] 100%|█████████▉| 12277/12313 [9:11:28<01:36, 2.67s/it] {'loss': 0.5351, 'grad_norm': 6.2602727752630045, 'learning_rate': 1.1209458922495365e-10, 'epoch': 1.0} 100%|█████████▉| 12277/12313 [9:11:28<01:36, 2.67s/it] 100%|█████████▉| 12278/12313 [9:11:30<01:31, 2.62s/it] {'loss': 0.5048, 'grad_norm': 4.875410969826889, 'learning_rate': 1.0595364816207155e-10, 'epoch': 1.0} 100%|█████████▉| 12278/12313 [9:11:30<01:31, 2.62s/it] 100%|█████████▉| 12279/12313 [9:11:33<01:27, 2.58s/it] {'loss': 0.5401, 'grad_norm': 4.9678315360142395, 'learning_rate': 9.998568653690754e-11, 'epoch': 1.0} 100%|█████████▉| 12279/12313 [9:11:33<01:27, 2.58s/it] 100%|█████████▉| 12280/12313 [9:11:35<01:25, 2.59s/it] {'loss': 0.516, 'grad_norm': 4.51671927225658, 'learning_rate': 9.41907047619095e-11, 'epoch': 1.0} 100%|█████████▉| 12280/12313 [9:11:35<01:25, 2.59s/it] 100%|█████████▉| 12281/12313 [9:11:38<01:24, 2.64s/it] {'loss': 0.5275, 'grad_norm': 4.930864987720356, 'learning_rate': 8.856870323842304e-11, 'epoch': 1.0} 100%|█████████▉| 12281/12313 [9:11:38<01:24, 2.64s/it] 100%|█████████▉| 12282/12313 [9:11:41<01:24, 2.73s/it] {'loss': 0.4833, 'grad_norm': 6.047707965039309, 'learning_rate': 8.311968235530376e-11, 'epoch': 1.0} 100%|█████████▉| 12282/12313 [9:11:41<01:24, 2.73s/it] 100%|█████████▉| 12283/12313 [9:11:43<01:20, 2.68s/it] {'loss': 0.5413, 'grad_norm': 7.174138199102842, 'learning_rate': 7.784364248974996e-11, 'epoch': 1.0} 100%|█████████▉| 12283/12313 [9:11:43<01:20, 2.68s/it] 100%|█████████▉| 12284/12313 [9:11:46<01:17, 2.69s/it] {'loss': 0.4525, 'grad_norm': 6.751919593589131, 
'learning_rate': 7.274058400674744e-11, 'epoch': 1.0} 100%|█████████▉| 12284/12313 [9:11:46<01:17, 2.69s/it] 100%|█████████▉| 12285/12313 [9:11:49<01:15, 2.69s/it] {'loss': 0.4845, 'grad_norm': 5.318816576138242, 'learning_rate': 6.781050725962468e-11, 'epoch': 1.0} 100%|█████████▉| 12285/12313 [9:11:49<01:15, 2.69s/it] 100%|█████████▉| 12286/12313 [9:11:51<01:11, 2.64s/it] {'loss': 0.5208, 'grad_norm': 4.612324143619252, 'learning_rate': 6.30534125889426e-11, 'epoch': 1.0} 100%|█████████▉| 12286/12313 [9:11:51<01:11, 2.64s/it] 100%|█████████▉| 12287/12313 [9:11:54<01:09, 2.67s/it] {'loss': 0.4859, 'grad_norm': 6.7018444490349065, 'learning_rate': 5.846930032443743e-11, 'epoch': 1.0} 100%|█████████▉| 12287/12313 [9:11:54<01:09, 2.67s/it] 100%|█████████▉| 12288/12313 [9:11:57<01:08, 2.75s/it] {'loss': 0.3708, 'grad_norm': 7.423300865438458, 'learning_rate': 5.4058170783077845e-11, 'epoch': 1.0} 100%|█████████▉| 12288/12313 [9:11:57<01:08, 2.75s/it] 100%|█████████▉| 12289/12313 [9:12:00<01:05, 2.74s/it] {'loss': 0.5712, 'grad_norm': 6.668113758835644, 'learning_rate': 4.982002427017518e-11, 'epoch': 1.0} 100%|█████████▉| 12289/12313 [9:12:00<01:05, 2.74s/it] 100%|█████████▉| 12290/12313 [9:12:02<01:00, 2.63s/it] {'loss': 0.4483, 'grad_norm': 7.202885642026081, 'learning_rate': 4.5754861078828314e-11, 'epoch': 1.0} 100%|█████████▉| 12290/12313 [9:12:02<01:00, 2.63s/it] 100%|█████████▉| 12291/12313 [9:12:05<00:58, 2.64s/it] {'loss': 0.4036, 'grad_norm': 8.215415290405625, 'learning_rate': 4.186268149047879e-11, 'epoch': 1.0} 100%|█████████▉| 12291/12313 [9:12:05<00:58, 2.64s/it] 100%|█████████▉| 12292/12313 [9:12:08<00:56, 2.69s/it] {'loss': 0.4738, 'grad_norm': 3.4403988676406008, 'learning_rate': 3.814348577435567e-11, 'epoch': 1.0} 100%|█████████▉| 12292/12313 [9:12:08<00:56, 2.69s/it] 100%|█████████▉| 12293/12313 [9:12:10<00:53, 2.69s/it] {'loss': 0.5479, 'grad_norm': 4.536469686726864, 'learning_rate': 3.4597274187753163e-11, 'epoch': 1.0} 100%|█████████▉| 
12293/12313 [9:12:10<00:53, 2.69s/it] 100%|█████████▉| 12294/12313 [9:12:13<00:53, 2.79s/it] {'loss': 0.4988, 'grad_norm': 4.166942290927568, 'learning_rate': 3.122404697603054e-11, 'epoch': 1.0} 100%|█████████▉| 12294/12313 [9:12:13<00:53, 2.79s/it] 100%|█████████▉| 12295/12313 [9:12:16<00:48, 2.69s/it] {'loss': 0.6298, 'grad_norm': 3.821490393417459, 'learning_rate': 2.8023804372889762e-11, 'epoch': 1.0} 100%|█████████▉| 12295/12313 [9:12:16<00:48, 2.69s/it] 100%|█████████▉| 12296/12313 [9:12:18<00:45, 2.68s/it] {'loss': 0.4335, 'grad_norm': 4.579385987949685, 'learning_rate': 2.499654659954276e-11, 'epoch': 1.0} 100%|█████████▉| 12296/12313 [9:12:18<00:45, 2.68s/it] 100%|█████████▉| 12297/12313 [9:12:21<00:43, 2.69s/it] {'loss': 0.4252, 'grad_norm': 3.5660288116414067, 'learning_rate': 2.214227386554413e-11, 'epoch': 1.0} 100%|█████████▉| 12297/12313 [9:12:21<00:43, 2.69s/it] 100%|█████████▉| 12298/12313 [9:12:24<00:39, 2.66s/it] {'loss': 0.5141, 'grad_norm': 3.8887534314797567, 'learning_rate': 1.9460986368513568e-11, 'epoch': 1.0} 100%|█████████▉| 12298/12313 [9:12:24<00:39, 2.66s/it] 100%|█████████▉| 12299/12313 [9:12:26<00:37, 2.68s/it] {'loss': 0.4097, 'grad_norm': 9.174267597596703, 'learning_rate': 1.6952684293580767e-11, 'epoch': 1.0} 100%|█████████▉| 12299/12313 [9:12:26<00:37, 2.68s/it] 100%|█████████▉| 12300/12313 [9:12:29<00:34, 2.68s/it] {'loss': 0.498, 'grad_norm': 4.342933188190943, 'learning_rate': 1.4617367814495632e-11, 'epoch': 1.0} 100%|█████████▉| 12300/12313 [9:12:29<00:34, 2.68s/it] 100%|█████████▉| 12301/12313 [9:12:32<00:32, 2.69s/it] {'loss': 0.5183, 'grad_norm': 5.079424217042842, 'learning_rate': 1.2455037093073163e-11, 'epoch': 1.0} 100%|█████████▉| 12301/12313 [9:12:32<00:32, 2.69s/it] 100%|█████████▉| 12302/12313 [9:12:34<00:28, 2.61s/it] {'loss': 0.5642, 'grad_norm': 8.135085893159895, 'learning_rate': 1.0465692278638361e-11, 'epoch': 1.0} 100%|█████████▉| 12302/12313 [9:12:34<00:28, 2.61s/it] 100%|█████████▉| 12303/12313 
[9:12:37<00:27, 2.70s/it] {'loss': 0.4073, 'grad_norm': 30.1448448129456, 'learning_rate': 8.649333509136438e-12, 'epoch': 1.0} 100%|█████████▉| 12303/12313 [9:12:37<00:27, 2.70s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( 100%|█████████▉| 12304/12313 [9:13:16<02:02, 13.57s/it] {'loss': 0.4412, 'grad_norm': 3.5750066680144217, 'learning_rate': 7.005960910022591e-12, 'epoch': 1.0} 100%|█████████▉| 12304/12313 [9:13:16<02:02, 13.57s/it] 100%|█████████▉| 12305/12313 [9:13:19<01:22, 10.26s/it] {'loss': 0.3997, 'grad_norm': 13.396727140828338, 'learning_rate': 5.535574594817128e-12, 'epoch': 1.0} 100%|█████████▉| 12305/12313 [9:13:19<01:22, 10.26s/it] 100%|█████████▉| 12306/12313 [9:13:21<00:55, 7.99s/it] {'loss': 0.3945, 'grad_norm': 6.163825951561884, 'learning_rate': 4.238174665938122e-12, 'epoch': 1.0} 100%|█████████▉| 12306/12313 [9:13:21<00:55, 7.99s/it] 100%|█████████▉| 12307/12313 [9:13:24<00:38, 6.38s/it] {'loss': 0.4673, 'grad_norm': 5.7285348007196975, 'learning_rate': 3.11376121248097e-12, 'epoch': 1.0} 100%|█████████▉| 12307/12313 [9:13:24<00:38, 6.38s/it] 100%|█████████▉| 12308/12313 [9:13:26<00:26, 5.23s/it] {'loss': 0.4399, 'grad_norm': 4.103396802719272, 'learning_rate': 2.1623343124388405e-12, 'epoch': 1.0} 100%|█████████▉| 12308/12313 [9:13:26<00:26, 5.23s/it] 100%|█████████▉| 12309/12313 [9:13:29<00:17, 4.48s/it] {'loss': 0.4208, 'grad_norm': 4.630608713223542, 'learning_rate': 1.3838940318700034e-12, 'epoch': 1.0} 100%|█████████▉| 12309/12313 [9:13:29<00:17, 4.48s/it] 100%|█████████▉| 12310/12313 [9:13:32<00:11, 3.94s/it] {'loss': 0.5584, 
'grad_norm': 5.594155269906072, 'learning_rate': 7.784404243427191e-13, 'epoch': 1.0} 100%|█████████▉| 12310/12313 [9:13:32<00:11, 3.94s/it] 100%|█████████▉| 12311/12313 [9:13:35<00:07, 3.57s/it] {'loss': 0.4048, 'grad_norm': 7.544818372955951, 'learning_rate': 3.4597353176790696e-13, 'epoch': 1.0} 100%|█████████▉| 12311/12313 [9:13:35<00:07, 3.57s/it] 100%|█████████▉| 12312/12313 [9:13:37<00:03, 3.33s/it] {'loss': 0.5781, 'grad_norm': 4.9742349717120895, 'learning_rate': 8.649338439914445e-14, 'epoch': 1.0} 100%|█████████▉| 12312/12313 [9:13:37<00:03, 3.33s/it] 100%|██████████| 12313/12313 [9:13:41<00:00, 3.33s/it] {'loss': 0.4966, 'grad_norm': 4.419415245809011, 'learning_rate': 0.0, 'epoch': 1.0} 100%|██████████| 12313/12313 [9:13:41<00:00, 3.33s/it] {'train_runtime': 33230.6995, 'train_samples_per_second': 11.857, 'train_steps_per_second': 0.371, 'train_loss': 0.5386334688529102, 'epoch': 1.0} 100%|██████████| 12313/12313 [9:13:41<00:00, 3.33s/it] 100%|██████████| 12313/12313 [9:13:41<00:00, 2.70s/it] wandb: Waiting for W&B process to finish... (success). wandb: - 0.042 MB of 0.042 MB uploaded (0.000 MB deduped) wandb: wandb: Run history: wandb: train/epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███ wandb: train/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███ wandb: train/grad_norm ▄▂▃▄▃▄▂▃▂▂▁▃▆▄█▄▇▅▂▂▃▁▄▂▁▂▃▃▄▂▆▁▃▃▂▄▅▃▃▃ wandb: train/learning_rate ▃███████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁ wandb: train/loss █▆▅▄▄▆▅▄▅▃▄▄▃▅▃▆▄▃▄▃▄▅▆▃▄▃▄▃▄▃▁▆▃▃▃▃▅▄▂▄ wandb: wandb: Run summary: wandb: total_flos 582090100039680.0 wandb: train/epoch 1.0 wandb: train/global_step 12313 wandb: train/grad_norm 4.41942 wandb: train/learning_rate 0.0 wandb: train/loss 0.4966 wandb: train_loss 0.53863 wandb: train_runtime 33230.6995 wandb: train_samples_per_second 11.857 wandb: train_steps_per_second 0.371 wandb: wandb: Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s) wandb: Find logs at: ./wandb/run-20241205_123315-run_20241205_7a25086f/logs